Anthropic’s Claude Opus 4.5 Sets Record 50% Time Horizon of 4 Hours, 49 Minutes

Anthropic’s Claude Opus 4.5 sets a record 50% time horizon of 4 hours and 49 minutes on METR’s benchmark, though its much shorter 80% horizon points to persistent performance gaps.

Staff

Published 21 December, 2025

AI research organization METR has released benchmark results for Claude Opus 4.5, a model developed by Anthropic. The model achieved a 50 percent time horizon of approximately 4 hours and 49 minutes, the highest METR has recorded on this metric. The time horizon measures the length of task, gauged by how long it takes human experts to complete, that an AI model can finish at a specified success rate; in this case, 50 percent.

The results also reveal a large gap between success thresholds. At an 80 percent success rate, the time horizon drops sharply to just 27 minutes, comparable to that of older models. In other words, Opus 4.5 can sometimes complete very long tasks, but the task length it handles reliably has not advanced substantially. METR also notes that the upper bound of its estimate exceeds 20 hours, a figure the organization says likely reflects limited test data.
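Why can the 50 percent and 80 percent horizons be so far apart? The sketch below is a simplified illustration, not METR’s code. It assumes that success probability follows a logistic curve in log2 of task length, which is how METR’s time-horizon method is generally described. The function names and the `slope` parameter (log-odds lost per doubling of task length) are hypothetical. The only inputs taken from the results above are the reported 289-minute (4 h 49 min) and 27-minute horizons.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def horizon_minutes(h50_min: float, slope: float, p: float) -> float:
    """Task length (minutes) at which the assumed curve predicts success rate p.

    Assumed (hypothetical) model:
        P(success | t) = sigmoid(slope * (log2(h50) - log2(t)))
    so P = 0.5 exactly at t = h50, and P falls as tasks get longer.
    """
    return h50_min * 2.0 ** (-logit(p) / slope)


def implied_slope(h50_min: float, hp_min: float, p: float) -> float:
    """Slope needed for one curve to pass through both the 50% horizon
    and the horizon measured at success rate p."""
    return logit(p) / math.log2(h50_min / hp_min)


# Reported figures for Claude Opus 4.5.
H50 = 4 * 60 + 49  # 50% horizon: 289 minutes
H80 = 27           # 80% horizon: 27 minutes

slope = implied_slope(H50, H80, 0.8)
print(f"implied slope: {slope:.3f} log-odds per doubling")
print(f"80% horizon reproduced: {horizon_minutes(H50, slope, 0.8):.1f} min")

# A steeper (hypothetical) curve with the same 50% horizon would keep
# the 80% horizon much closer to it.
print(f"80% horizon at slope 1.0: {horizon_minutes(H50, 1.0, 0.8):.1f} min")
```

Under this assumed model, the two reported horizons imply a fairly shallow curve of roughly 0.4 log-odds per doubling of task length. That is why the 80 percent horizon sits more than three doublings below the 50 percent one. A shallow curve is also harder to pin down from few samples, which fits the wide uncertainty METR reports.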

METR’s benchmark has limitations, chiefly its narrow scope: the assessment examined only 14 samples. Shashwat Goel conducted a detailed analysis of the model’s weaknesses, offering further insight into its performance and areas for improvement.

As AI plays an increasingly critical role across sectors, results like those of Claude Opus 4.5 underscore the ongoing competition among technology firms to improve their models. The ability to handle longer tasks reliably has significant implications for industries that depend on AI-driven solutions, from customer service to complex data analysis.

Anthropic’s achievement with Claude Opus 4.5 may also stimulate further research and development within the AI landscape, prompting other organizations to enhance their models and benchmarks. As the demand for sophisticated AI technologies grows, understanding performance metrics will be essential for both developers and users in making informed decisions about deploying these systems.

The release of these results comes as AI technology faces scrutiny over both its capabilities and its ethical implications. As organizations work to ensure that AI operates effectively and responsibly, continued progress will likely require balancing performance gains against the ethical considerations surrounding AI applications.

In summary, while Claude Opus 4.5’s record-breaking time horizon underscores its potential for tackling longer tasks, the gap between its 50 percent and 80 percent horizons shows where further work is needed. The field of AI remains dynamic, and results like these will continue to shape how models are built and evaluated.

AIPRESSA.COM

Top Stories

Anthropic’s Claude Opus 4.5 Achieves Record 4-Hour, 49-Minute Task Success Rate

Trending

Top Stories

Albania Appoints AI Bot Minister Diella Amid Corruption Concerns and EU Membership Goals

AI Government

BigBear.ai Launches Biometric Platform at O’Hare, Acquires Generative AI Ask Sage for $250M

AI Cybersecurity

Endpoint Security Market to Reach $23.9B by 2030 with 7.2% CAGR Amid Rising Cyber Threats

AI Business

Enterprise Architecture Shifts to Strategic Enabler in AI-Driven Business Models

AI Research

Amazon Awards 63 Research Grants to 41 Universities Across 8 Countries for AI Innovation

You May Also Like

AI Cybersecurity

Anthropic’s Mythos Reveals Thousands of Vulnerabilities, Banks Prepare for AI Cyberattacks

AI Business

Jensen Huang Critiques AI Doom Predictions, Calls for Fact-Based Discussions

AI Government

Anthropic Accuses Moonshot AI of 3.4M Unauthorized Claude Exchanges Amid US State Response

AI Cybersecurity

Anthropic Launches Beta of Claude Security AI Tools to Combat Cyber Threats

Top Stories

DeepSeek Launches V4 Open-Source Model, Underpricing GPT-5.5 and Claude Opus 4.7

AI Cybersecurity

Anthropic Launches Claude Security for AI Vulnerability Scanning in Public Beta

AI Technology

Amazon and Anthropic Expand AI Partnership with $100B Investment in AWS Technologies

Top Stories

Anthropic Plans Japan Expansion for Claude Mythos AI Amid U.S. Opposition