On March 26, 2026, security researchers Roy Paz (LayerX Security) and Alexandre Pauwels (University of Cambridge) discovered approximately 3,000 unpublished assets in an unsecured, publicly accessible data cache belonging to Anthropic. Among those assets: draft blog posts and internal documents describing Claude Mythos - a new AI model that Anthropic calls "the most capable we've built to date" and "a step change" in AI performance.
On April 7, 2026, Anthropic officially acknowledged Mythos by launching Project Glasswing - a cybersecurity consortium giving Mythos Preview access to Amazon, Apple, Microsoft, Google, NVIDIA, and 40+ other organizations for defensive security work. This is not a typical model launch. Anthropic is so concerned about Mythos's capabilities that it is withholding public release until safeguards are in place.
Below, we break down every confirmed detail: the benchmarks, the cybersecurity capabilities, the consortium, and what this means for the AI model leaderboard.
How the Leak Happened
The leak was the result of a configuration error in Anthropic's content management system. An unsecured data cache containing draft blog posts, internal documents, and other unpublished content was left publicly accessible on the internet.
Security researchers Paz and Pauwels found approximately 3,000 unpublished assets. The materials described a new model representing "a step change" in capability, operating in a new tier above the current Opus models - internally designated "Capybara".
After Fortune contacted Anthropic on Thursday evening, March 26, the data cache was removed from public access. Anthropic acknowledged the leak, describing the disclosed materials as "early drafts of content considered for publication" resulting from configuration errors. The company confirmed the model's existence and that it was being trialed by "early access customers."
Benchmark Performance: The Numbers
The leaked documents and subsequent reporting reveal benchmark scores that, if they hold at public release, would place Claude Mythos significantly ahead of every publicly available model. The improvements over Claude Opus 4.6 are not incremental - they are generational.
The standout result is SWE-bench Pro - the hardest tier of the SWE-bench evaluation suite, designed to test real-world software engineering ability. Mythos scores 77.8% against Opus 4.6's 53.4%, a jump of +24.4 points and one of the largest single-generation improvements reported on this benchmark.
SWE-bench Multimodal shows the biggest relative gain: Mythos more than doubles Opus 4.6's score (59.0% vs 27.1%), suggesting a major leap in the model's ability to reason about visual context alongside code - diagrams, screenshots, UI mockups.
On BrowseComp (multi-step web research), Mythos achieves 86.9% while using 4.9x fewer tokens than Opus 4.6 to reach its score - indicating not just better results, but dramatically more efficient reasoning.
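The three comparisons above are simple arithmetic on the leaked figures. As a quick sanity check, the deltas can be recomputed directly; the helper names below are ours, and all numbers come from this report:

```python
# Sanity-check of the leaked benchmark deltas quoted above.
# All figures come from the reporting; helper names are illustrative.

def delta_points(old: float, new: float) -> float:
    """Absolute improvement in percentage points."""
    return round(new - old, 1)

def ratio(old: float, new: float) -> float:
    """Relative improvement as a multiplier."""
    return round(new / old, 2)

# SWE-bench Pro: 53.4% -> 77.8%
assert delta_points(53.4, 77.8) == 24.4   # the "+24.4 points" quoted above

# SWE-bench Multimodal: 27.1% -> 59.0%
assert ratio(27.1, 59.0) > 2.0            # "more than doubles" Opus 4.6

print(delta_points(53.4, 77.8), ratio(27.1, 59.0))
```

Note that the BrowseComp claim is a different kind of comparison: 4.9x is a token-efficiency ratio at a given score, not a score delta.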
The Capybara Tier: A New Model Class
Leaked documents describe Mythos as belonging to a new tier called "Capybara", sitting above the current Opus tier in Anthropic's model hierarchy. This is not simply "Opus 5" or "Claude 5" - though the community has used those labels. The internal documents suggest Capybara is a distinct capability class.
What we do not know (as of April 7, 2026):
Anthropic described the model as "very expensive for us to serve, and will be very expensive for our customers to use." This confirms Capybara will sit at a premium price point. For current Anthropic pricing, see our Anthropic API pricing page.
Cybersecurity: Why Anthropic Is Withholding Public Release
The most consequential aspect of Claude Mythos is not its benchmark scores - it is what the model can do with code in adversarial contexts. Anthropic CEO Dario Amodei stated in a video released alongside the Project Glasswing announcement:
"Claude Mythos Preview is a particularly big jump. We haven't trained it specifically to be good at cyber. We trained it to be good at code, but as a side effect of being good at code, it's also good at cyber." - Dario Amodei, CEO of Anthropic
Anthropic claims that over the past few weeks of internal testing, Mythos Preview has identified thousands of zero-day vulnerabilities, many of them critical - including bugs that are one to two decades old in heavily scrutinized codebases. Unlike previous models that could find vulnerabilities, Mythos can also write the exploits to accompany them.
Logan Graham, Anthropic's frontier red team lead, described the model's capabilities: "We've seen Mythos Preview accomplish things that a senior security researcher would be able to accomplish."
This is why Anthropic is withholding public release. The same capabilities that make Mythos exceptionally useful for defensive security also make it a potent offensive tool. Amodei added: "More powerful models are going to come from us and from others, and so we do need a plan to respond to this."
Project Glasswing: The Defensive Consortium
Rather than release Mythos publicly, Anthropic launched Project Glasswing on April 7, 2026 - a consortium of technology, cybersecurity, critical infrastructure, and financial organizations that will use Mythos Preview exclusively for defensive security work.
Anthropic's financial commitments to the program:
The program operates on a coordinated vulnerability disclosure model: Glasswing partners use Mythos Preview to scan both first-party and open-source software systems, and developers are given time to patch discovered vulnerabilities before any public disclosure. This mirrors the responsible disclosure framework that has governed the security research community for decades.
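The embargo logic behind coordinated disclosure can be sketched as a tiny state model. Everything below is illustrative - the class, the placeholder identifier, and the 90-day default window mirror common industry practice (e.g. typical vendor disclosure deadlines), not any detail Anthropic has published about Glasswing:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative sketch of a coordinated-disclosure window: maintainers
# get a patch window before any public disclosure. The 90-day default
# is an assumption borrowed from common practice, not from Glasswing.

@dataclass
class Finding:
    identifier: str          # hypothetical tracking ID
    reported_on: date
    patched: bool = False
    embargo_days: int = 90

    def disclosure_date(self) -> date:
        """Earliest date the finding may go public if left unpatched."""
        return self.reported_on + timedelta(days=self.embargo_days)

    def may_publish(self, today: date) -> bool:
        # Publish early once patched; otherwise wait out the embargo.
        return self.patched or today >= self.disclosure_date()

f = Finding("FINDING-0001", reported_on=date(2026, 4, 7))
print(f.may_publish(date(2026, 5, 1)))   # unpatched, inside embargo
f.patched = True
print(f.may_publish(date(2026, 5, 1)))   # patched, may disclose
```

The design choice worth noting is that patching, not the calendar, is the preferred trigger for disclosure; the deadline exists only to keep pressure on unresponsive maintainers.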
Anthropic has been privately warning top government officials that Mythos makes large-scale cyberattacks significantly more likely as similar models proliferate. The Glasswing program is Anthropic's attempt to give defenders a head start.
Anthropic on the Leaderboard Today
Mythos is not yet available through public APIs or OpenRouter, so it does not appear in our live rankings. Here is where Anthropic's current models stand:
If Mythos's leaked benchmarks hold, it would likely claim the #1 position on our coding leaderboard by a significant margin. The current Opus 4.6 sits at rank #6 with a composite score of 90/100. We will add Mythos to the leaderboard the moment it becomes available through a public API. See the full provider breakdown at Anthropic provider page.
Frontier Model Landscape
To put the Mythos benchmarks in context, here are the current top models in our coding rankings. The SWE-bench Pro scores reported for Mythos (77.8%) would surpass every model in this list:
The competitive landscape for frontier models has intensified in 2026. OpenAI released GPT-5.3-Codex in February. Google's Gemini models continue to improve. But Mythos's leaked numbers suggest Anthropic may have opened a meaningful capability gap - particularly in coding and agentic tasks. For full rankings across all providers, visit our best coding models leaderboard.
Timeline of Events
What Happens Next
Anthropic has not confirmed a public release date for Claude Mythos. Based on the available information, the staged rollout is expected to follow this sequence:
We will update our live rankings and this report the moment new information becomes available. Follow our AI news feed for real-time updates, or check the release timeline for the latest model launches.
Key Takeaways
Sources
This report is compiled from the following verified sources. All benchmark numbers and quotes are attributed to their original sources. We have not independently verified benchmark claims.
Related Reports
The competitive landscape Mythos is entering - who leads, what separates them.
Token pricing trends - and why Capybara tier pricing matters for the market.
Context window trends - a key unknown for Mythos that will affect its use cases.
Current pricing for all Claude models - the baseline that Capybara tier will exceed.