Head-to-head comparison of Anthropic's two most powerful models across 17 benchmarks from the official system card. Mythos (Capybara tier) represents a generational leap over Opus 4.6 in coding, reasoning, math, and cybersecurity.
Claude Mythos Preview is restricted to Project Glasswing cybersecurity partners. Benchmarks are from Anthropic's system card and leaked documents. Opus 4.6 is available on the public API at $5 per million input tokens and $25 per million output tokens.
| Category | Benchmark | Mythos | Opus 4.6 | Delta (pts) |
|---|---|---|---|---|
| Coding | SWE-bench Verified | 93.9% | 80.8% | +13.1 |
| Coding | SWE-bench Pro | 77.8% | 53.4% | +24.4 |
| Coding | SWE-bench Multilingual | 87.3% | 77.8% | +9.5 |
| Coding | SWE-bench Multimodal | 59% | 27.1% | +31.9 |
| Coding | Terminal-Bench 2.0 | 82% | 65.4% | +16.6 |
| Reasoning & knowledge | GPQA Diamond | 94.6% | 91.3% | +3.3 |
| Reasoning & knowledge | MMMLU | 92.7% | 91.1% | +1.6 |
| Reasoning & knowledge | Humanity's Last Exam (no tools) | 56.8% | 40% | +16.8 |
| Reasoning & knowledge | Humanity's Last Exam (with tools) | 64.7% | 53.1% | +11.6 |
| Math | USAMO 2026 | 97.6% | 42.3% | +55.3 |
| Long context | GraphWalks BFS (256K-1M) | 80% | 38.7% | +41.3 |
| Vision | CharXiv Reasoning (no tools) | 86.1% | 61.5% | +24.6 |
| Vision | CharXiv Reasoning (with tools) | 93.2% | 78.9% | +14.3 |
| Agentic | OSWorld | 79.6% | 72.7% | +6.9 |
| Agentic | BrowseComp | 86.9% | 83.7% | +3.2 |
| Cybersecurity | CyberGym | 83.1% | 66.6% | +16.5 |
| Cybersecurity | Cybench (35 CTF challenges) | 100% | -- | -- |
Source: Anthropic system card (red.anthropic.com) and leaked documents; figures are not independently verified. Delta is the Mythos score minus the Opus 4.6 score, in percentage points; "--" means no Opus 4.6 score is listed.
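The Delta column is plain subtraction. The short Python sketch below recomputes it from the benchmark scores above and lists the benchmarks where Mythos leads by 15 points or more (the rows the source page highlighted). The dictionary and function names are illustrative only, not part of any Anthropic tool, and Cybench is left out because no Opus 4.6 score is listed for it.

```python
# Scores in percent, copied from the benchmark comparison above: (Mythos, Opus 4.6).
# Cybench is omitted because there is no Opus 4.6 score to compare against.
SCORES = {
    "SWE-bench Verified": (93.9, 80.8),
    "SWE-bench Pro": (77.8, 53.4),
    "SWE-bench Multilingual": (87.3, 77.8),
    "SWE-bench Multimodal": (59.0, 27.1),
    "Terminal-Bench 2.0": (82.0, 65.4),
    "GPQA Diamond": (94.6, 91.3),
    "MMMLU": (92.7, 91.1),
    "Humanity's Last Exam (no tools)": (56.8, 40.0),
    "Humanity's Last Exam (with tools)": (64.7, 53.1),
    "USAMO 2026": (97.6, 42.3),
    "GraphWalks BFS (256K-1M)": (80.0, 38.7),
    "CharXiv Reasoning (no tools)": (86.1, 61.5),
    "CharXiv Reasoning (with tools)": (93.2, 78.9),
    "OSWorld": (79.6, 72.7),
    "BrowseComp": (86.9, 83.7),
    "CyberGym": (83.1, 66.6),
}


def deltas(scores):
    """Percentage-point gap (Mythos minus Opus 4.6), rounded to one decimal
    to absorb floating-point noise from the subtraction."""
    return {name: round(m - o, 1) for name, (m, o) in scores.items()}


def big_gains(scores, threshold=15.0):
    """Benchmarks where Mythos leads by at least `threshold` points, largest gap first."""
    d = deltas(scores)
    return sorted((name for name, gap in d.items() if gap >= threshold), key=lambda n: -d[n])


if __name__ == "__main__":
    d = deltas(SCORES)
    for name in big_gains(SCORES):
        print(f"{name}: +{d[name]}")
```

Run as a script, it prints eight benchmarks, starting with USAMO 2026 (+55.3) and ending with CyberGym (+16.5). Rounding to one decimal keeps the output matching the table, since values such as 56.8 - 40.0 are not exact in binary floating point.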
| | Mythos | Opus 4.6 |
|---|---|---|
| Input ($ per 1M tokens) | $25.00\* | $5.00 |
| Output ($ per 1M tokens) | $125.00\* | $25.00 |
| Model tier | Capybara | Opus |
| Public API | Restricted | Available |
| OpenRouter | Not yet | Available |

\* Glasswing partner pricing. Public API pricing has not been announced.
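To see what the 5x price gap means per request, here is a minimal Python sketch that applies the per-million-token prices from the table. It assumes simple linear per-token billing and ignores any caching or batch discounts; the 20,000-input / 2,000-output workload and the `Pricing` / `request_cost` names are hypothetical, and the Mythos figures are the Glasswing partner prices, not announced public pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per one million tokens."""
    input_per_m: float
    output_per_m: float


# Prices from the table above. Mythos uses the Glasswing partner pricing
# (assumption: public Mythos pricing may differ once announced).
MYTHOS = Pricing(input_per_m=25.00, output_per_m=125.00)
OPUS_4_6 = Pricing(input_per_m=5.00, output_per_m=25.00)


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, assuming plain linear per-token billing."""
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
    for name, p in (("Mythos", MYTHOS), ("Opus 4.6", OPUS_4_6)):
        print(f"{name}: ${request_cost(p, 20_000, 2_000):.2f}")
```

For this workload the sketch prints $0.75 for Mythos and $0.15 for Opus 4.6. Because both the input and output prices differ by the same factor, the ratio stays at 5x regardless of the input/output mix.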
Latency and throughput data for Mythos have not been published. The system card focuses on capability benchmarks rather than speed. Since Mythos is described as "very expensive to serve," it may trade speed for capability. Opus 4.6 currently delivers around 60 tokens per second on most providers.
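As a rough illustration of what ~60 tokens per second means in practice, this Python sketch estimates streaming time for a given output length. It is a back-of-the-envelope model only: it assumes constant throughput and ignores time to first token, and the 60 tokens/second figure is the approximate Opus 4.6 number quoted above. No equivalent estimate is possible for Mythos until throughput data is published.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to stream `output_tokens` at a constant rate.

    Ignores time to first token and provider-side queuing (simplifying assumptions).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Approximate Opus 4.6 throughput quoted above; actual speed varies by provider.
OPUS_4_6_TPS = 60.0

if __name__ == "__main__":
    # Hypothetical 2,000-token response.
    print(f"{generation_seconds(2_000, OPUS_4_6_TPS):.1f} s")
```

At 60 tokens per second, a 2,000-token response takes about 33.3 seconds to stream.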
Mythos does not replace Opus. It sits in a new "Capybara" tier above Opus in Anthropic's model hierarchy, and Anthropic has not announced plans to replace Opus with Mythos. Opus 4.6 remains the top publicly available Claude model, while Mythos is currently restricted to Project Glasswing cybersecurity partners.
Mythos is not yet available for general use. Mythos Preview is only available to vetted Project Glasswing partners for defensive cybersecurity research. Although the benchmarks show it also excels at general coding and reasoning, public API access has not been announced. For production use, Claude Opus 4.6 remains the best publicly available Anthropic model.