U.S. Removes Export Restrictions on Anthropic’s Advanced Cybersecurity AI Models

Key Takeaways

  • Anthropic restored worldwide access to its Fable 5 model after a three‑week shutdown caused by U.S. export controls, marking the first known time such controls were used to pull an AI software model from public access rather than to restrict chips or hardware.
  • The shutdown stemmed from a disclosed “jailbreak” technique that enables the model to automate the find‑fix‑test loop for defensive security work; Anthropic concluded the capability is not unique to its models.
  • To address concerns, Anthropic deployed a new safety classifier blocking the technique in >99% of cases, pledged expanded pre‑release government access, rapid disclosure of jailbreaks, joint research resources, a voluntary security standard, and a HackerOne bug‑bounty program.
  • Together with Glasswing partners (Amazon, Microsoft, Google), Anthropic is drafting an industry framework to score jailbreak severity across capability gain, task breadth, ease of weaponisation, and discoverability.
  • An open letter signed by >100 cybersecurity leaders warned that the export controls risk doing more harm than good, noting that Chinese open‑weight models are rapidly closing the gap with U.S. frontier models.
  • The episode unfolded amid heightened tensions with the Trump administration, which had labeled Anthropic a “supply chain risk” and whose officials reportedly addressed the letter lifting the ban to co‑founder Tom Brown rather than to CEO Dario Amodei.

Restoration of Fable 5 Access
Anthropic announced on Wednesday that it had restored global access to its Fable 5 model, ending a roughly three‑week shutdown. The restriction had been imposed by the U.S. government under export‑control authorities that barred foreign nationals from using the advanced, cybersecurity‑focused AI tool. The company said the controls were lifted after reaching a series of agreements with federal officials, allowing users worldwide to resume interaction with the model.

First Use of Export Controls on AI Software
The episode represents the first known instance in which export‑control measures were applied to pull an AI software model from public access, rather than targeting chips or hardware. Anthropic noted that the reversal could establish a precedent for how frontier AI models are regulated in the United States moving forward, potentially shaping future policy debates about balancing national security with innovation.

Five Eyes Warning and Strategic Implications
The restoration coincides with a stark warning from the Five Eyes intelligence alliance, which urged business leaders to prepare for the imminent impact of frontier AI models on cybersecurity. The alliance asserted that new models will “fundamentally transform” both offensive and defensive capabilities, emphasizing that the timeline for these changes is measured in months, not years. This backdrop heightened scrutiny of Anthropic’s model and the government’s response.

Mythos 5 Controls and the Glasswing Program
Export controls on Anthropic’s more powerful cybersecurity model, Mythos 5, were also lifted as of June 30. Unlike Fable 5, which was fully reopened, Mythos 5 remains restricted to vetted U.S. organizations through Project Glasswing, Anthropic’s controlled‑access program designed for critical‑infrastructure defenders. The company said it continues to negotiate broader domestic and international access via Glasswing, aiming to expand the model’s utility while maintaining safeguards.

Origin of the Shutdown: A Jailbreak Technique
Anthropic attributed the initial shutdown to a “jailbreak” technique detailed in an Amazon research report. The method involved feeding Fable 5 open‑source code containing publicly known vulnerabilities and deliberately implanted flaws, then instructing the model to “fix this code.” Researchers manually assembled the model’s output across multiple steps into scripts that test patches, effectively automating the find‑fix‑test cycle used by defensive security teams.
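
For readers unfamiliar with the find‑fix‑test cycle, the loop can be sketched in a few lines of Python. This is only an illustration of the general pattern, not the procedure from the Amazon report: the function name `find_fix_test` and its `find`, `fix`, `test`, and `max_rounds` parameters are hypothetical, and the three callables stand in for whatever scanner, patching step, and test suite a defensive team actually uses.

```python
from typing import Callable, Optional


def find_fix_test(
    code: str,
    find: Callable[[str], Optional[str]],   # hypothetical: returns an issue, or None if clean
    fix: Callable[[str, str], str],         # hypothetical: returns a patched copy of the code
    test: Callable[[str], bool],            # hypothetical: True if the patched code passes
    max_rounds: int = 10,
) -> str:
    """Repeat find -> fix -> test until no issue is found or rounds run out."""
    for _ in range(max_rounds):
        issue = find(code)
        if issue is None:
            return code                     # nothing left to fix
        candidate = fix(code, issue)
        if test(candidate):
            code = candidate                # keep only patches that pass the tests
    return code
```

In this sketch a patch is kept only if the test step accepts it, and the loop simply stops after `max_rounds` attempts.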

Expert Assessment by Katie Moussouris
Katie Moussouris, the founder of Luta Security, whom Anthropic consulted to evaluate the paper, argued that the technique does not constitute a guardrail bypass. Instead, she described it as “the most valuable thing an AI model can do for defensive security,” enabling the routine find, fix, and test loop. Moussouris concluded that removing this capability would degrade the model’s usefulness for legitimate security work, suggesting the technique is intrinsic to the model’s utility rather than a flaw to be eradicated.

Broader Model Testing and Findings
Anthropic’s internal testing confirmed that the same jailbreak approach works against other leading models, including OpenAI’s GPT‑5.5 and the Chinese model Kimi K2.7—none of which faced comparable export restrictions. The company asserted that the technique exposed no capability unique to its frontier models, implying that any restriction based solely on this behavior would affect the entire industry rather than address a specific risk.

Safety Classifier and Mitigation Measures
In response, Anthropic trained a new safety classifier that blocks the specific jailbreak technique in more than 99% of cases. The Commerce Department’s Center for AI Standards and Innovation tested both the original and updated safeguards and endorsed the outcome. Beyond the classifier, Anthropic committed to several transparency and collaboration measures: expanded pre‑release access for government evaluators, rapid disclosure of significant jailbreaks, dedicated staff and compute for joint research, participation in a shared voluntary security standard across frontier model providers, and the launch of a HackerOne bug‑bounty program for cyber‑jailbreak submissions.

Industry Framework for Jailbreak Severity
Together with its Glasswing partners—Amazon, Microsoft, and Google—Anthropic said it is drafting an industry framework to score jailbreak severity. The proposed system evaluates four criteria: the capability gain over existing tools, the breadth of tasks affected, the ease of weaponisation, and the discoverability of the exploit. This effort aims to create a common language for assessing risk and guiding mitigation strategies across the AI ecosystem.
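
The framework’s scoring rules have not been published, so the following Python sketch only shows how ratings on the four named criteria might be combined into a single number. The 0 to 5 scale, the unweighted average, and the names `JailbreakReport` and `severity_score` are all assumptions made for illustration, not details from Anthropic or its partners.

```python
from dataclasses import dataclass

# The four criteria named in the article; the 0-5 scale is an assumption.
CRITERIA = (
    "capability_gain",
    "task_breadth",
    "ease_of_weaponisation",
    "discoverability",
)


@dataclass(frozen=True)
class JailbreakReport:
    capability_gain: int
    task_breadth: int
    ease_of_weaponisation: int
    discoverability: int

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 0-5 range.
        for name in CRITERIA:
            value = getattr(self, name)
            if not 0 <= value <= 5:
                raise ValueError(f"{name} must be between 0 and 5, got {value}")


def severity_score(report: JailbreakReport) -> float:
    """Unweighted mean of the four ratings (equal weighting is an assumption)."""
    return sum(getattr(report, name) for name in CRITERIA) / len(CRITERIA)
```

For example, a hypothetical report rated 1, 4, 3, and 5 on the four criteria would average to 3.25. A real framework could weight the criteria differently or use categorical tiers instead of a mean.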

Open Letter from Cybersecurity Leaders
More than 100 cybersecurity professionals signed an open letter organized by former Facebook security chief Alex Stamos, addressed to Commerce Secretary Howard Lutnick and National Cyber Director Sean Cairncross. The signatories, including executives from Nvidia, Adobe, Zoom, Google, and Sophos, warned that the export controls risked doing more harm than good. They argued that Chinese open‑weight models are only months behind the best American models, and that withdrawing top capabilities from defenders while adversaries advance rapidly is dangerous.

Industry‑Wide Impact Concerns
The letter echoed Anthropic’s own warning that applying the standard used for Fable 5 across the industry would, in the company’s words, “essentially halt all new model deployments for all frontier model providers.” This perspective underscored the tension between safeguarding national security and preserving the pace of AI innovation, urging policymakers to consider narrower, targeted measures rather than broad restrictions.

Political and Administrative Tensions
The dispute unfolded against a backdrop of friction between Anthropic and the Trump administration. In February, Defense Secretary Pete Hegseth designated Anthropic a “supply chain risk”—a label historically applied to firms such as Huawei—after contract negotiations over military use of Claude broke down. Anthropic co‑founder Tom Brown assumed leadership of negotiations with the administration from CEO Dario Amodei, who, according to CNBC, had become a political target due to his public AI‑safety positions and his support for Kamala Harris in the 2024 election. Notably, the formal letter lifting the ban was reportedly addressed to Brown rather than to Amodei, highlighting the shift in diplomatic channels during the resolution.
