AI Cybersecurity

OpenAI’s GPT-5.5 Matches Claude Mythos in Cyberattack Tests, Reveals AISI Findings

OpenAI’s GPT-5.5 outperformed Claude Mythos Preview in cyberattack simulations, achieving a 71.4% success rate in expert-level tasks, raising cybersecurity concerns.

Rachel Torres

Published

47 minutes ago

OpenAI’s GPT-5.5 has demonstrated capabilities on par with Anthropic’s Claude Mythos Preview in recent cyber evaluations conducted by the UK AI Security Institute (AISI), highlighting a significant trend in AI-driven attack capabilities. In a series of cyberattack simulations, GPT-5.5 became the second AI model, following Claude Mythos Preview, to successfully complete a multi-stage enterprise attack simulation. Notably, in isolated expert-level security tasks, GPT-5.5 outperformed its counterpart.

The AISI evaluated GPT-5.5 using a comprehensive suite of 95 capture-the-flag tasks spanning four difficulty levels. These advanced tasks, developed in partnership with cybersecurity firms Crystal Peak Security and Irregular, encompassed areas such as reverse engineering, exploit development, and cryptographic attacks. At the highest “Expert” difficulty, GPT-5.5 achieved an average success rate of 71.4 percent, while Claude Mythos Preview followed closely at 68.6 percent. Although the gap falls within the statistical margin of error, the data suggests GPT-5.5 may be the most capable model tested to date, significantly ahead of GPT-5.4 and Claude Opus 4.7, which scored 52.4 percent and 48.6 percent, respectively.

For AISI, the performance of these models indicates that the previously observed capabilities of Claude Mythos in April are not isolated incidents but reflective of broader advancements in AI autonomy, reasoning, and coding skills.

Beyond isolated tasks, AISI examined the ability of these models to navigate complex attack simulations. The simulation titled “The Last Ones” involves 32 steps across various network segments and approximately 20 hosts. In this scenario, the AI agent initiates the process without any credentials, seeking vulnerabilities, stealing credentials, and moving laterally through the network to access a secured database—a task estimated to require about 20 hours for a human expert. GPT-5.5 successfully completed this full attack simulation in 2 out of 10 attempts, while Claude Mythos Preview achieved a success rate of 3 out of 10. AISI noted that as inference compute scales, so does the performance, suggesting that additional processing tokens improve the likelihood of successful outcomes.

However, the AISI tests were conducted without active defenders or security monitoring, raising questions about how these models would fare in real-world scenarios with well-defended systems. While the current evaluations indicate strong capabilities against inadequately protected networks, they do not confirm effectiveness against robust security measures.

In another simulation, “Cooling Tower,” designed to model an attack on an industrial control system, both GPT-5.5 and Claude Mythos Preview struggled. This 7-step scenario remains unsolved by any model, with both failing primarily on upstream IT steps rather than the control system itself.

AISI also evaluated GPT-5.5’s safety protocols for public deployment, revealing a universal jailbreak that circumvented all safeguards OpenAI had implemented against malicious cyber requests, including multi-step scenarios. The jailbreak was developed in just six hours. Although OpenAI has since made updates to its safety measures, the AISI could not verify the effectiveness of the updated configuration due to a configuration issue in the deployed version. This incident underscores the ongoing vulnerability of large language models (LLMs) to security breaches.

One notable contrast between GPT-5.5 and Claude Mythos Preview is availability; GPT-5.5 is accessible through ChatGPT and its API, while Anthropic has restricted Claude Mythos to a limited user group. AISI’s findings suggest that Anthropic’s cautious rollout may not be solely driven by safety concerns but could also relate to computational limitations.

The evaluation of GPT-5.5 and Claude Mythos Preview emphasizes the burgeoning potential of AI in cyberattack scenarios, raising critical questions about security in digital environments. As these models continue to evolve, the implications for cybersecurity practices and defenses will be profound, necessitating ongoing scrutiny as AI technology advances.

Anthropic Launches BioMysteryBench to Evaluate AI in Complex Bioinformatics Tasks

Anthropic unveils BioMysteryBench, a benchmark that reveals Claude's 30% success on human-unsolvable bioinformatics questions, advancing AI's role in complex research tasks

Staff10 hours ago

AI Research

OpenAI Reveals Explainable AI for Detecting Machine-Generated Music in New Study

OpenAI introduces explainable AI techniques to detect machine-generated music, enhancing authenticity measures amid rising public concerns over AI's creative impact.

Staff11 hours ago

AI Generative

OpenAI’s GPT Image2 Generates MS Paint-Style Drawings, Sparking Viral Trend

OpenAI's GPT Image2 spurs a viral trend as users prompt AI to create MS Paint-style drawings from photos, challenging artistic norms and expectations.

Staff14 hours ago

AI Regulation

US Designates Anthropic as Supply Chain Risk Amid NSA’s Use of Mythos AI Model

US designates Anthropic as a supply chain risk, prohibiting federal use of its AI, while the NSA actively employs its Mythos model for cybersecurity.

Staff15 hours ago

AI Generative

Pinterest Reduces AI Budget While Adopting Hybrid Model with OpenAI and Alibaba

Pinterest slashes its AI budget by 90% while adopting a hybrid model with OpenAI and Alibaba, enhancing user experience and cost efficiency.

Staff19 hours ago

AI Cybersecurity

Anthropic’s Claude Model Discovers 1,000+ Vulnerabilities in Major Software Systems

Anthropic's Claude model has identified over 1,000 zero-day vulnerabilities in major software systems, revolutionizing cybersecurity and defense strategies.

Rachel Torres21 hours ago

AI Cybersecurity

Asian Banks Tighten Cybersecurity as Anthropic’s Mythos Reveals Major Vulnerabilities

Asian banks heighten cybersecurity measures as Anthropic’s Mythos tool uncovers thousands of vulnerabilities, prompting major institutions to reassess AI risks.

Rachel Torres1 day ago

Anthropic Shares Surge as Investors Offer Millions to Acquire Stakes Amid Gold Rush

Anthropic shares soar amid a frenzy of offers exceeding $1 trillion, as investors compete aggressively for stakes in the AI powerhouse.

Staff1 day ago

AIPRESSA.COM

AI Cybersecurity

OpenAI’s GPT-5.5 Matches Claude Mythos in Cyberattack Tests, Reveals AISI Findings

Trending

Top Stories

Albania Appoints AI Bot Minister Diella Amid Corruption Concerns and EU Membership Goals

AI Government

BigBear.ai Launches Biometric Platform at O’Hare, Acquires Generative AI Ask Sage for $250M

AI Cybersecurity

Endpoint Security Market to Reach $23.9B by 2030 with 7.2% CAGR Amid Rising Cyber Threats

AI Business

Enterprise Architecture Shifts to Strategic Enabler in AI-Driven Business Models

AI Research

Amazon Awards 63 Research Grants to 41 Universities Across 8 Countries for AI Innovation

You May Also Like

Top Stories

Anthropic Launches BioMysteryBench to Evaluate AI in Complex Bioinformatics Tasks

AI Research

OpenAI Reveals Explainable AI for Detecting Machine-Generated Music in New Study

AI Generative

OpenAI’s GPT Image2 Generates MS Paint-Style Drawings, Sparking Viral Trend

AI Regulation

US Designates Anthropic as Supply Chain Risk Amid NSA’s Use of Mythos AI Model

AI Generative

Pinterest Reduces AI Budget While Adopting Hybrid Model with OpenAI and Alibaba

AI Cybersecurity

Anthropic’s Claude Model Discovers 1,000+ Vulnerabilities in Major Software Systems

AI Cybersecurity

Asian Banks Tighten Cybersecurity as Anthropic’s Mythos Reveals Major Vulnerabilities

Top Stories

Anthropic Shares Surge as Investors Offer Millions to Acquire Stakes Amid Gold Rush