Anthropic's Capybara-tier model versus Google DeepMind's flagship Gemini 3.1 Pro. Mythos leads on 7 of the 8 benchmarks where both have scores; Gemini 3.1 Pro leads on MMMLU and trails by only 0.3 points on GPQA Diamond.
Mythos benchmarks come from Anthropic's system card. Gemini 3.1 Pro scores are from Anthropic's comparative evaluations, not Google's. Gemini 3.1 Pro is publicly available; Mythos is restricted to Project Glasswing partners.
| Benchmark | Mythos | Gemini 3.1 Pro | Winner (margin in points) |
|---|---|---|---|
| SWE-bench Verified | 93.9% | 80.6% | Mythos +13.3 |
| SWE-bench Pro | 77.8% | 54.2% | Mythos +23.6 |
| Terminal-Bench 2.0 | 82.0% | 68.5% | Mythos +13.5 |
| GPQA Diamond | 94.6% | 94.3% | Mythos +0.3 |
| MMMLU | 92.7% | 93.6% | Gemini +0.9 |
| Humanity's Last Exam (no tools) | 56.8% | 44.4% | Mythos +12.4 |
| Humanity's Last Exam (with tools) | 64.7% | 51.4% | Mythos +13.3 |
| USAMO 2026 | 97.6% | 74.4% | Mythos +23.2 |
Mythos scores from Anthropic system card. Gemini 3.1 Pro scores from Anthropic's comparative evaluations. Google's own reported scores may differ.
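The margins and the 7-of-8 tally in the table can be re-derived from the two score columns. The snippet below is a minimal sketch of that arithmetic; the names `SCORES`, `margins`, and `win_count` are illustrative and not taken from either vendor, and the numbers are copied from the table as reported.

```python
# Scores copied from the benchmark table: (Mythos, Gemini 3.1 Pro), in percent.
# Both columns are as reported by Anthropic; Google's own figures may differ.
SCORES = {
    "SWE-bench Verified": (93.9, 80.6),
    "SWE-bench Pro": (77.8, 54.2),
    "Terminal-Bench 2.0": (82.0, 68.5),
    "GPQA Diamond": (94.6, 94.3),
    "MMMLU": (92.7, 93.6),
    "Humanity's Last Exam (no tools)": (56.8, 44.4),
    "Humanity's Last Exam (with tools)": (64.7, 51.4),
    "USAMO 2026": (97.6, 74.4),
}


def margins(scores):
    """Return {benchmark: (winner, margin)} with the margin in percentage points."""
    result = {}
    for name, (mythos, gemini) in scores.items():
        # Round to one decimal to hide floating-point noise in the subtraction.
        diff = round(mythos - gemini, 1)
        if diff > 0:
            winner = "Mythos"
        elif diff < 0:
            winner = "Gemini"
        else:
            winner = "Tie"
        result[name] = (winner, abs(diff))
    return result


def win_count(scores):
    """Count how many benchmarks each model leads on."""
    counts = {"Mythos": 0, "Gemini": 0, "Tie": 0}
    for winner, _ in margins(scores).values():
        counts[winner] += 1
    return counts


if __name__ == "__main__":
    for name, (winner, margin) in margins(SCORES).items():
        print(f"{name}: {winner} +{margin}")
    print(win_count(SCORES))
```

A margin is a difference in percentage points, not a relative improvement, so the +23.6 on SWE-bench Pro and the +0.3 on GPQA Diamond are on the same scale.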
| | Mythos | Gemini 3.1 Pro |
|---|---|---|
| Input $/1M tokens | $25.00 * | ~$1.25 |
| Output $/1M tokens | $125.00 * | ~$10.00 |
| Free tier | No | Yes (rate limited) |
| Public API | Restricted | Available |
| Provider | Anthropic | Google DeepMind |
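* Glasswing partner pricing. At these list prices, Mythos costs about 20x as much as Gemini 3.1 Pro per input token ($25.00 vs ~$1.25) and about 12.5x as much per output token ($125.00 vs ~$10.00).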
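The price multiples follow directly from the per-million-token rates in the pricing table. The sketch below reproduces that arithmetic and estimates the cost of a single request. The Gemini 3.1 Pro rates are the approximate figures from the table, and `PRICES`, `price_ratio`, and `request_cost` are hypothetical names used only for illustration.

```python
# USD per 1M tokens, copied from the pricing table.
# Mythos: Glasswing partner pricing. Gemini 3.1 Pro: approximate ("~") rates.
PRICES = {
    "Mythos": {"input": 25.00, "output": 125.00},
    "Gemini 3.1 Pro": {"input": 1.25, "output": 10.00},
}


def price_ratio(kind):
    """How many times more Mythos costs than Gemini 3.1 Pro for 'input' or 'output' tokens."""
    return PRICES["Mythos"][kind] / PRICES["Gemini 3.1 Pro"][kind]


def request_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of one request at the table's per-1M-token rates."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    print("input ratio:", price_ratio("input"))
    print("output ratio:", price_ratio("output"))
    # Example workload (an assumption): 20k input tokens, 2k output tokens.
    for model in PRICES:
        print(model, round(request_cost(model, 20_000, 2_000), 4))
```

The effective multiple for a real workload falls between the input and output ratios and depends on the mix of input and output tokens.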
Gemini 3.1 Pro beats Mythos on exactly one benchmark in this comparison: MMMLU, where it scores 93.6% to Mythos's 92.7%. The margin is small (0.9 points). Gemini also comes very close on GPQA Diamond (94.3% vs 94.6%). On coding and math benchmarks, Mythos leads by much larger margins.
Based on the system card benchmarks, Mythos leads decisively in coding tasks: SWE-bench Pro 77.8% vs 54.2% (+23.6 points), SWE-bench Verified 93.9% vs 80.6% (+13.3 points), and Terminal-Bench 2.0 82.0% vs 68.5% (+13.5 points). However, Mythos is not publicly available, so Gemini 3.1 Pro remains one of the best coding models you can actually access today.
Anthropic has not announced a public release date for Claude Mythos. It is currently restricted to Project Glasswing cybersecurity partners. For users choosing between publicly available models, the comparison between Claude Opus 4.6 and Gemini 3.1 Pro is more directly relevant.