Key Takeaways
- Anthropic’s Project Glasswing – a highly capable AI model for discovering software vulnerabilities – has been withheld from public release to allow major tech firms to patch findings first.
- The underlying Mythos model uncovered decades‑old bugs, chained multiple flaws into working exploit chains, and achieved a 72.4% success rate in autonomous exploit development.
- Despite the flood of discoveries, fewer than 1% of the vulnerabilities identified by Mythos have been patched, exposing a stark mismatch between discovery speed and remediation capacity.
- Defenders still operate on “calendar‑speed” processes (intelligence gathering, campaign building, periodic testing), while attackers leveraging LLMs now move at “machine‑speed,” weaponizing exploits within hours of disclosure.
- Real‑world autonomous attacks—such as an LLM‑driven MCP server targeting FortiGate appliances—have already compromised thousands of organizations with minimal human involvement.
- Traditional vulnerability management, which relies on CVSS scores and manual handoffs, collapses under the volume of AI‑generated findings; context‑aware, continuous validation is essential.
- A Mythos‑ready security program must prioritize signal‑driven validation, environment‑specific risk context, and closed‑loop remediation without manual ticket shuffling.
- Platforms like Picus Swarm compress the traditional four‑day validation cycle into ~3 minutes by coordinating AI agents that ingest threat intel, build safe attack playbooks, execute simulations, and trigger remediation in a closed loop.
- In the post‑Glasswing era, the decisive metric is not how many vulnerabilities are found but how many are validated and patched before adversaries can exploit them—making continuous exposure validation the linchpin of effective defense.
Project Glasswing: Holding Back a Powerful Vulnerability‑Discovery AI
Last week Anthropic unveiled Project Glasswing, an AI model so adept at uncovering software flaws that the company postponed its public release. Instead, access was granted to Apple, Microsoft, Google, Amazon, and a coalition of other firms so they could identify and patch bugs before malicious actors could weaponize them. The precaution mirrors the cautious rollout strategies seen with earlier large‑language models, but the stakes here are markedly higher because the model’s output directly translates into exploitable code.
Mythos: The Predecessor That Set the Stage
Project Glasswing grew from Mythos Preview, the model that first demonstrated the scale of AI‑driven vulnerability discovery. Mythos revealed flaws in every major operating system and browser, including bugs that had survived decades of human audits, aggressive fuzzing, and open‑source scrutiny. One vulnerability had lingered for 27 years in OpenBSD, an OS widely regarded as among the most secure. Beyond isolated CVEs, Mythos chained four independent weaknesses into a full exploit chain that bypassed both the browser’s rendering sandbox and OS‑level protections, achieved Linux privilege escalation via race conditions, and constructed a 20‑gadget ROP chain, spread across network packets, against FreeBSD’s NFS server. Claude Opus 4.6, Anthropic’s previous frontier model, struggled with autonomous exploit creation, whereas Mythos achieved a 72.4% success rate in the Firefox JavaScript shell—proof that the technology is already operational, not speculative.
The Alarming Patch‑Rate Gap
The most sobering statistic emerging from Glasswing’s tests is that fewer than 1% of the vulnerabilities Mythos uncovered have been patched. The model produced the world’s most potent vulnerability‑discovery engine, yet the defensive ecosystem could not absorb its output. Glasswing solved the “finding” problem; the industry has yet to solve the “fixing” problem. This mismatch is not a temporary hiccup but a structural deficiency that will widen as AI‑driven discovery scales.
Why Defenders Lag: Calendar Speed Versus Machine Speed
The root cause lies in the mismatch between defender timelines and attacker timelines. Security teams traditionally follow a cadence: gather intelligence, build a campaign, simulate threats, mitigate, then repeat—a cycle that often spans four days even under optimal conditions. Adversaries, particularly those now embedding LLMs at every stage of their attack chain, operate at machine speed, compressing reconnaissance, exploitation, and exfiltration into minutes or hours. As a result, the periodic, human‑initiated processes that once sufficed are now obsolete in the face of relentless, AI‑accelerated threats.
AI‑Powered Attacks Are Already Autonomous
A concrete illustration appeared earlier this year when a threat actor deployed a custom MCP server hosting an LLM as part of an attack on FortiGate appliances. The AI handled the entire operation: generated backdoors, mapped internal infrastructure, performed autonomous vulnerability assessments, and prioritized offensive tools to gain domain‑admin access. The outcome was the compromise of 2,516 organizations across 106 countries, with the full kill chain—from initial access through credential dumping to data exfiltration—running autonomously. Human involvement was limited to post‑mortem review, underscoring how quickly AI can turn a vulnerability discovery into a large‑scale breach without manual intervention.
Discovery Outpacing Remediation: The Data Trend
The gap between discovery and remediation is not new, but AI has turned a modest disparity into a chasm. Autonomous systems like AISLE recently uncovered 13 of 14 OpenSSL CVEs in coordinated releases—bugs that had endured years of human review. XBOW rose to the top of HackerOne’s rankings in 2025, surpassing all human contributors. The median time from public disclosure to a weaponized exploit plummeted from 771 days in 2018 to single‑digit hours by 2024, and projections indicate that by 2025 most exploits will be weaponized before they are ever announced. Adding Mythos‑class discovery to this landscape means organizations will face a tsunami of legitimate findings that still require verification, prioritization, business‑continuity review, and patch cycles that have remained essentially unchanged for a decade.
Building a Mythos‑Ready Security Program
The instinctive reaction to Glasswing—“let’s find more bugs”—misdiagnoses the problem. The real question is: when thousands of exploitable vulnerabilities land on your desk tomorrow morning, can your program actually process them? For most organizations the answer is no, not because of a lack of tools or talent, but because existing defenses rely on periodic, human‑driven workflows designed for a trickle of flaws, not a flood.
A Mythos‑ready program must pivot on three pillars:
- Signal‑Driven Validation Over Scheduled Testing – Defenses should be tested the instant a threat emerges, an asset changes, or a configuration drifts, rather than waiting for the next quarterly pentest. Scheduled validation assumes a stable threat landscape, and that assumption is no longer viable.
- Environment‑Specific Context Over Generic CVSS Scores – Glasswing will generate an avalanche of CVEs, yet prioritizing them by CVSS alone tells you only how bad a bug could be in theory, not whether it is exploitable in your particular stack given your controls and business risk. When findings scale from hundreds to thousands, context‑free prioritization collapses under its own weight.
- Closed‑Loop Remediation Without Manual Handoffs – The current model—scanner finds a bug → analyst triages → ticket moves to another team → patch applied weeks later → no re‑validation—introduces multiple points of failure. If the cycle from discovery to fix to re‑validation cannot run autonomously at machine speed, it will inevitably break under the volume of AI‑generated findings.
By leveraging the defender’s asymmetric advantage—deep knowledge of one’s own topology—organizations can close the gap, but only if they can act on that knowledge instantly.
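The second pillar can be made concrete with a toy scoring function. This is a minimal sketch, not a prescribed formula: the field names, weights, and multipliers below are illustrative assumptions about what "environment‑specific context" might include.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    cve_id: str
    cvss_base: float          # 0.0-10.0 generic severity score
    internet_exposed: bool    # is the affected asset reachable from outside?
    control_covers: bool      # does an existing control already block the path?
    asset_criticality: float  # 0.0-1.0 business-impact weight, set by the org

def contextual_priority(f: Finding) -> float:
    """Blend the generic CVSS score with environment-specific signals.

    A critical CVSS bug on an isolated, well-controlled host can rank
    below a medium CVSS bug on an exposed, business-critical one.
    """
    score = f.cvss_base / 10.0              # normalise to 0-1
    score *= 1.5 if f.internet_exposed else 0.5
    score *= 0.3 if f.control_covers else 1.0
    score *= 0.5 + f.asset_criticality      # 0.5-1.5 multiplier
    return round(min(score, 1.0), 3)

findings = [
    Finding("CVE-A", 9.8, internet_exposed=False, control_covers=True,  asset_criticality=0.2),
    Finding("CVE-B", 6.5, internet_exposed=True,  control_covers=False, asset_criticality=0.9),
]
ranked = sorted(findings, key=contextual_priority, reverse=True)
```

In this sketch the 6.5‑scored CVE‑B outranks the 9.8‑scored CVE‑A, illustrating why a CVSS‑only queue inverts real risk once exposure and compensating controls are taken into account.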
Autonomous Exposure Validation: How Picus Swarm Closes the Loop
At Picus Security we have built a platform for Autonomous Exposure Validation, and the principles above directly informed our design. Picus Swarm consists of cooperating AI agents that compress the traditional four‑day validation cycle into roughly three minutes.
- Researcher Agent – ingests and vets fresh threat intelligence (e.g., CISA alerts, threat feeds).
- Red‑Teamer Agent – maps that intelligence against the customer’s environment to produce a safety‑checked attacker playbook tailored to the organization’s specific assets and defenses.
- Simulator Agent – executes the playbook on real endpoints and cloud workloads, gathering telemetry, proof‑of‑exploit data, and impact metrics.
- Coordinator Agent – translates findings into action: opens tickets, triggers SOAR playbooks, pushes IOCs to EDR/XDR systems, and automatically re‑validates after a patch is applied.
Every step is traceable, auditable, and operates within guardrails defined by the security team. When a Mythos‑class model drops thousands of findings on an organization, Swarm can instantly tell which of those are actually exploitable in the given environment, which existing controls would hold or fail, and what vendor‑specific remediation is required. The result is a continuous, machine‑speed feedback loop that turns raw discovery into proven risk reduction.
The Uncomfortable Truth: Validation Is the Decisive Metric
Project Glasswing will ultimately be judged not by how many vulnerabilities it finds, nor by the elegance of the exploit chains it reveals, but by how many of those issues are patched before attackers can exploit them. Visibility alone has never been enough—studies show that 83% of cybersecurity programs still fail to demonstrate measurable outcomes. What changes the equation is closing the gap between seeing a flaw and proving whether it would actually compromise your specific environment. That proof is validation. In a post‑Glasswing world, continuous, automated exposure validation stands as the sole barrier between a deluge of AI‑generated discoveries and a corresponding deluge of breaches.
We invite security leaders to explore these concepts further at the Autonomous Validation Summit on May 12 & 14, co‑hosted with Frost & Sullivan and featuring practitioners from Kraft Heinz, Glow Financial Services, and Picus CTO Volkan Erturk. Register to learn how to turn AI‑driven vulnerability discovery from a liability into a controllable, actionable risk‑management process.
Note: This summary was authored by Sıla Özeren Hacıoğlu, Security Research Engineer at Picus Security.