UK AI Security Institute Finds GPT-5.5 Rivals Claude Mythos in Cyber Attack Evaluations

Key Takeaways

  • The UK AI Security Institute (AISI) found that OpenAI’s GPT‑5.5 performs on par with Anthropic’s Claude Mythos Preview in rigorous cyber‑attack evaluations.
  • On expert‑level capture‑the‑flag tasks, GPT‑5.5 achieved a 71.4% success rate, slightly ahead of Claude Mythos Preview's 68.6%.
  • Both models solved the multi‑stage enterprise‑network simulation “The Last Ones” in a minority of attempts (GPT‑5.5 in two of ten runs, Claude Mythos Preview in three), showing that advanced AI can chain together complex hacking steps when given sufficient inference compute.
  • Neither model has yet conquered the industrial‑control‑system scenario “Cooling Tower,” indicating that certain domains remain out of reach for current frontier LLMs.
  • A universal jailbreak was discovered that bypassed all of GPT‑5.5’s safety filters for malicious cyber requests, underscoring that even the most capable models retain significant safety weaknesses.
  • GPT‑5.5 is publicly available via ChatGPT and the API, whereas Claude Mythos Preview remains restricted, suggesting that deployment speed may be influenced by compute constraints as much as safety caution.

Overview of the AISI Evaluation Framework
The UK AI Security Institute conducted a comprehensive assessment of frontier language models using two complementary test beds. First, a suite of 95 capture‑the‑flag (CTF) challenges spanning four difficulty levels measured isolated skills such as reverse engineering, exploit development, cryptographic attacks, and malware unpacking. Second, the institute employed cyber‑range simulations that mimic realistic network environments, requiring agents to chain multiple actions together to achieve a final objective. This dual approach allowed AISI to gauge both granular technical proficiency and the ability to orchestrate multi‑step attacks, providing a nuanced picture of each model’s offensive cyber capabilities.
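The per‑tier scoring such a CTF suite produces can be tallied in a few lines. The sketch below uses made‑up challenge names and records purely for illustration; AISI's actual harness and per‑tier task counts are not described in the report summarized here.

```python
from collections import defaultdict

# Hypothetical result records: (challenge name, difficulty tier, solved?).
# Names and outcomes are illustrative, not taken from the AISI evaluation.
results = [
    ("rev-01", "Novice", True),
    ("crypto-07", "Intermediate", True),
    ("pwn-12", "Advanced", False),
    ("malware-03", "Expert", True),
    ("exploit-09", "Expert", False),
]

tally = defaultdict(lambda: [0, 0])  # tier -> [solved count, attempted count]
for _, tier, solved in results:
    tally[tier][0] += int(solved)
    tally[tier][1] += 1

for tier, (solved, attempted) in tally.items():
    print(f"{tier}: {solved}/{attempted} = {solved / attempted:.1%}")
```

Aggregating per tier rather than over all 95 challenges is what lets an evaluation report an "Expert‑level" success rate separately from overall performance.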

Performance on Expert‑Level CTF Tasks
At the highest “Expert” difficulty tier, GPT‑5.5 attained an average success rate of 71.4 percent, while Claude Mythos Preview scored 68.6 percent. The difference lies within the statistical margin of error, indicating that the two models are effectively tied at this level. For context, earlier iterations such as GPT‑5.4 and Claude Opus 4.7 achieved markedly lower scores of 52.4 percent and 48.6 percent, respectively. The results suggest that recent advances in model scale, reasoning, and coding ability have narrowed the gap between the leading contenders, pushing frontier performance toward the low‑70 percent range on expert cyber challenges.
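To see why a gap of roughly three points can sit inside the margin of error, consider a quick back‑of‑the‑envelope check. The sample size below is an assumption (the report's per‑tier task count is not given here); the calculation itself is a standard normal‑approximation confidence interval for a difference of proportions.

```python
import math

# Hypothetical number of expert-tier tasks per model -- an assumption,
# not a figure from the AISI report.
n = 35
p1, p2 = 0.714, 0.686  # GPT-5.5 and Claude Mythos Preview success rates

# Standard error of the difference between two independent proportions
se = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)

# 95% confidence interval for the difference (normal approximation)
diff = p1 - p2
lo, hi = diff - 1.96 * se, diff + 1.96 * se

print(f"difference: {diff:.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
```

At any plausible task count in this range, the interval comfortably spans zero, which is what "effectively tied" means in practice.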

Network‑Attack Simulation: “The Last Ones”
To move beyond isolated skills, AISI deployed the cyber range “The Last Ones” (TLO), a 32‑step scenario distributed across four subnets and roughly twenty hosts. Starting with no credentials, the AI agent must discover vulnerabilities, harvest credentials, move laterally, and ultimately exfiltrate data from a protected database. A human expert would typically require about twenty hours to complete the exercise. In the evaluation, GPT‑5.5 solved TLO in two out of ten runs, whereas Claude Mythos Preview succeeded three times out of ten. Importantly, AISI observed a clear scaling trend: increasing the token budget (i.e., allowing the model more “thinking” time) raised the probability of success, implying that current models have not yet plateaued and could improve further with additional inference compute.
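The scaling observation can be illustrated with a simple pass@k‑style calculation: if each run succeeds independently with probability p, more attempts (a rough proxy for more inference compute) raise the chance of at least one success. This is a simplified model, not AISI's methodology, and it assumes independence across runs.

```python
def pass_at_k(p: float, k: int) -> float:
    """Probability of at least one success in k independent attempts,
    given a per-attempt success probability p."""
    return 1 - (1 - p) ** k

# Per-run success rates observed on "The Last Ones" (2/10 and 3/10)
for name, p in [("GPT-5.5", 0.2), ("Claude Mythos Preview", 0.3)]:
    for k in (1, 5, 10):
        print(f"{name}: pass@{k} = {pass_at_k(p, k):.2f}")
```

Even a 20% per‑run success rate translates to roughly a 90% chance of at least one success across ten runs, which is why per‑run rates understate what a patient attacker with compute to spare could achieve.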

Limitations of the Test Environment
The AISI tests deliberately omitted active defenders, security monitoring, and real‑world consequences for triggering alarms. Consequently, the success rates reflect performance against poorly protected or unmonitored networks rather than hardened enterprise defenses. While the results demonstrate that GPT‑5.5 and Claude Mythos Preview can execute sophisticated attack chains when opposition is weak, they do not guarantee efficacy against well‑defended systems equipped with intrusion detection, endpoint protection, or proactive threat‑hunting capabilities. The open question remains whether these models can sustain their advantage when faced with adaptive, real‑time defenses.

Industrial‑Control‑System Challenge: “Cooling Tower”
A second simulation, “Cooling Tower,” models an attack on an industrial control system (ICS) comprising seven steps that bridge IT infrastructure and operational technology. Neither GPT‑5.5 nor Claude Mythos Preview has yet solved this scenario. According to AISI, failures occurred primarily during the upstream IT phases—such as gaining an initial foothold or moving laterally—rather than at the final control‑system manipulation stage. This suggests that while the models exhibit strong capabilities in conventional IT exploitation, they still struggle with the specialized protocols, safety requirements, and real‑time constraints typical of ICS environments.

Safety Evaluation and the Discovery of a Universal Jailbreak
Beyond offensive prowess, AISI examined GPT‑5.5’s safeguards against malicious use. Researchers identified a universal jailbreak that successfully bypassed every safety filter OpenAI had deployed for harmful cyber requests, including multi‑step agent scenarios. Remarkably, the exploit was crafted in just six hours, highlighting the fragility of current alignment techniques. OpenAI subsequently released several updates to its safety system, but AISI could not verify the final configuration’s effectiveness due to a deployment‑side configuration issue. The episode reinforces the broader concern that jailbreaks remain a persistent vulnerability across even the most advanced LLMs, necessitating continual improvement in robust safety mechanisms.

Availability and Deployment Considerations
A notable distinction between the two models lies in their accessibility. GPT‑5.5 is openly available through ChatGPT and the public API, allowing a wide range of developers and researchers to experiment with its capabilities. In contrast, Anthropic has restricted Claude Mythos Preview to a limited group of partners, citing caution and safety concerns. The AISI findings hint that Anthropic’s restrained rollout may be less about ethical prudence and more about practical constraints—such as compute availability or the need to fine‑tune safety mitigations before broader release. Conversely, OpenAI’s decision to deploy GPT‑5.5 widely suggests confidence in its existing safety layers, albeit tempered by the demonstrated jailbreak weakness.

Implications for the AI‑Security Landscape
Overall, the AISI evaluation paints a picture of rapidly converging offensive cyber capabilities among frontier language models. GPT‑5.5’s slight edge on expert tasks and its ability to complete multi‑stage network attacks—when afforded sufficient inference time—signal that AI‑driven hacking tools are maturing quickly. However, the persistence of jailbreaks, the unsolved industrial‑control scenario, and the absence of active defenses in the testbed underscore that substantial hurdles remain before these models can pose a credible threat to well‑secured infrastructures. Stakeholders must therefore balance enthusiasm for AI’s productivity gains with vigilant investment in defensive AI, robust alignment research, and continuous red‑team testing to stay ahead of emerging threats.
