The Model Too Dangerous to Release: Anthropic's Mythos Preview Found Vulnerabilities in Everything
- David Borish


Anthropic's newest AI model found a 27-year-old bug in OpenBSD, an operating system whose entire identity rests on being secure. It found a 16-year-old vulnerability in FFmpeg's H.264 codec, in a line of code that automated testing tools had scanned five million times. It chained together multiple Linux kernel vulnerabilities to escalate from ordinary user access to full root control. And it did all of this autonomously, without human guidance after an initial prompt.
Claude Mythos Preview, announced on April 7, 2026, is Anthropic's most capable model. The company is not releasing it to the public. Instead, it has built a coalition of twelve major technology and security companies to use the model strictly for defensive purposes, an initiative called Project Glasswing. The roster includes AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. An additional 40 organizations that build or maintain critical software infrastructure will also receive access.
The decision to withhold a frontier model while simultaneously spending $100 million to let select partners use it reveals something about where AI capabilities now sit. This is a model whose offensive security skills emerged as a side effect of being good at coding and reasoning.
What Mythos Preview Actually Found
The OpenBSD vulnerability illustrates the kind of finding that goes well beyond benchmark scores. TCP's SACK (Selective Acknowledgement) extension, introduced in 1998, tracks which packets have been received by maintaining a linked list of "holes" representing gaps in the received data. Mythos Preview identified a path where a specially crafted SACK block could trigger a signed integer overflow in sequence number comparisons, causing the kernel to simultaneously believe a value was both below and above critical thresholds. The result: a null pointer dereference that could remotely crash any machine running the operating system just by connecting to it. The bug had been present since OpenBSD added SACK support 27 years ago.
In FFmpeg, the model found that the H.264 codec used 16-bit integers to track slice ownership of macroblocks but initialized the table with a sentinel value of 65,535. Under normal conditions, no video would use enough slices to collide with that sentinel. But an attacker-crafted frame with exactly 65,536 slices would give its last slice an index equal to the sentinel, causing a collision that leads to an out-of-bounds write. The underlying logic error dates to 2003; it became exploitable after a 2010 refactor.
The model also found and exploited a 17-year-old remote code execution vulnerability in FreeBSD's NFS server implementation (CVE-2026-4747). The exploit involved a stack buffer overflow in the RPCSEC_GSS authentication handler, where a 400-byte attacker-controlled payload could overflow a 128-byte buffer. Mythos Preview worked around the 200-byte effective overflow limit by splitting a 20-gadget ROP chain across six sequential RPC requests, ultimately writing an SSH public key to the root authorized_keys file. The entire discovery and exploitation happened without human intervention.
The Capability Jump
On CyberGym, a vulnerability reproduction benchmark, Mythos Preview scored 83.1% compared to Opus 4.6's 66.6%, but the raw benchmark numbers understate the qualitative difference between the two models. On a specific test involving Firefox 147 JavaScript engine vulnerabilities, Opus 4.6 developed working exploits only twice out of several hundred attempts. Mythos Preview produced 181 working exploits and achieved register control on 29 more from the same set.
On Anthropic's internal crash severity ladder, which grades the worst crash a model can produce in open-source repositories on a five-tier scale, Opus 4.6 and Sonnet 4.6 each achieved a single tier-3 crash across roughly 7,000 entry points. Mythos Preview achieved full control flow hijack, the highest tier, on ten separate fully-patched targets.
The broader coding benchmarks tell a similar story. On SWE-bench Verified, Mythos Preview scored 93.9% versus Opus 4.6's 80.8%. On SWE-bench Pro, it reached 77.8% compared to 53.4%. On Terminal-Bench 2.0, which measures complex agentic coding tasks, it hit 82.0% versus 65.4%.
These capabilities were not specifically trained. They emerged as downstream consequences of general improvements in code reasoning and autonomy. The same improvements that make the model better at patching vulnerabilities make it better at exploiting them.
The Exploit Sophistication Problem
Finding vulnerabilities is one thing. Writing working exploits for hardened systems is another, and it is here that Mythos Preview has made a jump prior models could not.
The Frontier Red Team blog documents exploit chains of remarkable complexity. In one case, the model wrote a web browser exploit that chained four vulnerabilities into a JIT heap spray, escaping both renderer and operating system sandboxes. In another, it turned a one-byte read primitive from a use-after-free bug in Linux's Unix domain sockets into full root access, navigating around HARDENED_USERCOPY protections by reading from kernel stacks and per-CPU memory regions that the hardening checks permit.
The Linux kernel exploit walkthrough published by the team reads like work from a top-tier offensive security research group. The model identified that a one-bit write primitive could flip the read/write permission bit on a page table entry, used interleaved memory allocations to ensure physical adjacency between a bitmap slab page and a page table page, and then leveraged the netlink NLM_F_EXCL flag as an oracle to probe which allocation had achieved the correct adjacency. The complete exploit chain, from a syzkaller-discovered bug to root, cost under $1,000 at API pricing.
This is the transition point Anthropic is worried about. Previously, the bottleneck in exploit development was human expertise and time. A skilled researcher might spend days or weeks turning a known vulnerability into a working exploit. Mythos Preview did it in hours, repeatedly, across different operating systems and software stacks.
The Defensive Bet
Anthropic is framing Project Glasswing as an attempt to give defenders a head start. The logic is straightforward: models with capabilities similar to Mythos Preview will eventually be available from multiple AI labs, and when they are, attackers will use them. The window between now and then is an opportunity to find and patch as many vulnerabilities as possible.
The partners are already reporting results. Cisco, CrowdStrike, Microsoft, and Palo Alto Networks have each been running the model against their own codebases for several weeks. Microsoft tested it against CTI-REALM, their open-source security benchmark, and reported substantial improvements over previous models. CrowdStrike's CTO Elia Zaitsev characterized the window between vulnerability discovery and exploitation as having "collapsed" from months to minutes with AI.
The $100 million commitment covers model usage credits for all participating organizations during the research preview period. After that, Mythos Preview will be available to participants at $25/$125 per million input/output tokens through the Claude API, Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry. Anthropic has also donated $2.5 million to Alpha-Omega and OpenSSF through the Linux Foundation, and $1.5 million to the Apache Software Foundation.
Within 90 days, Anthropic plans to report publicly on findings, vulnerabilities fixed, and lessons learned. The company has also been briefing CISA and the Commerce Department on the model's capabilities.
The Broader Context
This announcement did not arrive cleanly. In late March, Fortune reported that a draft blog post about the model had been discovered in an unsecured, publicly searchable data store, the result of a misconfigured content management system. The leak revealed internal descriptions of Mythos as posing "unprecedented cybersecurity risks" and being "far ahead of any other AI model in cyber capabilities." The irony of an AI company building the world's most advanced security research tool while leaving internal documents exposed through a toggle switch error was widely noted.
Anthropic is also navigating a legal battle with the Pentagon, which labeled the company a supply-chain risk over its refusal to allow autonomous targeting or surveillance of U.S. citizens. The company separately disclosed in November 2025 that a Chinese state-sponsored group had used Claude Code to infiltrate roughly 30 organizations before detection.
These are not peripheral details. They underscore the tension at the center of Anthropic's position: the company building models that could reshape offensive and defensive cybersecurity is simultaneously dealing with its own security lapses and geopolitical friction around military AI applications.
What This Means for the Security Industry
The Frontier Red Team's blog post closes with a series of recommendations that amount to a warning: the security equilibrium of the last twenty years is ending. Defense-in-depth measures that impose friction rather than hard barriers may become significantly weaker against model-assisted adversaries. Patch cycles need to shorten. Vulnerability disclosure processes need to account for the volume of bugs that language models can reveal. Technical incident response pipelines need automation.
The N-day problem is particularly acute. The team demonstrated that Mythos Preview can take a CVE identifier and a git commit hash and produce a working privilege escalation exploit, fully autonomously, in under a day. More than half of the 40 CVEs the team selected for this exercise were successfully exploited. Every day a known vulnerability goes unpatched is now a day where the cost of exploitation is measured in API credits, not researcher-hours.
Anthropic's Frontier Red Team put it plainly: after twenty years of relative stability in attack patterns, language models that can identify and exploit vulnerabilities at scale could upend the balance. Mythos Preview, they wrote, "is only the beginning."