Claude Mythos vs Gemini 3.1 Pro

Anthropic's Capybara-tier model versus Google DeepMind's flagship Gemini 3.1 Pro. Mythos leads on 7 of the 8 benchmarks where both models have scores; Gemini leads on MMMLU and stays within 0.3 points on GPQA Diamond.

Important context

Mythos benchmarks come from Anthropic's system card. Gemini 3.1 Pro scores are from Anthropic's comparative evaluations, not from Google. Gemini 3.1 Pro is publicly available; Mythos is restricted to Glasswing partners.

Mythos Wins: 7

Gemini 3.1 Pro Wins: 1

Full Benchmark Comparison

Benchmark	Mythos	Gemini 3.1 Pro	Winner (margin in points)
SWE-bench Verified	93.9%	80.6%	Mythos +13.3
SWE-bench Pro	77.8%	54.2%	Mythos +23.6
Terminal-Bench 2.0	82%	68.5%	Mythos +13.5
GPQA Diamond	94.6%	94.3%	Mythos +0.3
MMMLU	92.7%	93.6%	Gemini +0.9
Humanity's Last Exam (no tools)	56.8%	44.4%	Mythos +12.4
Humanity's Last Exam (with tools)	64.7%	51.4%	Mythos +13.3
USAMO 2026	97.6%	74.4%	Mythos +23.2

Mythos scores from Anthropic system card. Gemini 3.1 Pro scores from Anthropic's comparative evaluations. Google's own reported scores may differ.
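
The margins and win counts in the table can be re-derived from the raw scores. The following is a minimal sketch, not part of the original comparison: the scores are copied from the table above, and the names `SCORES`, `margins`, and `win_counts` are hypothetical helpers introduced only for illustration.

```python
# Scores in percent, copied from the comparison table: (Mythos, Gemini 3.1 Pro).
SCORES = {
    "SWE-bench Verified": (93.9, 80.6),
    "SWE-bench Pro": (77.8, 54.2),
    "Terminal-Bench 2.0": (82.0, 68.5),
    "GPQA Diamond": (94.6, 94.3),
    "MMMLU": (92.7, 93.6),
    "Humanity's Last Exam (no tools)": (56.8, 44.4),
    "Humanity's Last Exam (with tools)": (64.7, 51.4),
    "USAMO 2026": (97.6, 74.4),
}


def margins(scores):
    """Return {benchmark: (winner, margin)} with the margin in percentage points."""
    result = {}
    for name, (mythos, gemini) in scores.items():
        # Round to one decimal to avoid floating-point noise such as 13.300000000000011.
        diff = round(mythos - gemini, 1)
        winner = "Mythos" if diff > 0 else "Gemini" if diff < 0 else "Tie"
        result[name] = (winner, abs(diff))
    return result


def win_counts(scores):
    """Count how many benchmarks each model leads."""
    counts = {"Mythos": 0, "Gemini": 0, "Tie": 0}
    for winner, _ in margins(scores).values():
        counts[winner] += 1
    return counts


if __name__ == "__main__":
    for name, (winner, margin) in margins(SCORES).items():
        print(f"{name}: {winner} +{margin}")
    print(win_counts(SCORES))  # {'Mythos': 7, 'Gemini': 1, 'Tie': 0}
```

Margins here are differences in percentage points, not relative improvements, which matches how the Winner column is reported.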

Key Observations

Coding dominance: Mythos leads by 23.6 points on SWE-bench Pro (77.8% vs 54.2%), the widest gap in this comparison. It also leads by 13.3 points on SWE-bench Verified (93.9% vs 80.6%) and 13.5 points on Terminal-Bench 2.0 (82% vs 68.5%).

Math breakthrough: On USAMO 2026, Mythos scores 97.6% versus Gemini's 74.4%, a 23.2-point gap and the second widest in the table. This suggests a substantial advantage in competition-level mathematical reasoning.

Gemini's strength: Gemini 3.1 Pro wins on MMMLU (93.6% vs 92.7%) and is within 0.3 points on GPQA Diamond (94.3% vs 94.6%). On broad knowledge benchmarks, the two models are nearly matched.

Practical reality: Gemini 3.1 Pro is publicly available with a rate-limited free tier. Mythos is restricted to 40+ Glasswing partners. For most users, the comparison between Gemini 3.1 Pro and Claude Opus 4.6 is more actionable.

Pricing & Availability

	Mythos	Gemini 3.1 Pro
Input $/1M tokens	$25.00 *	~$1.25
Output $/1M tokens	$125.00 *	~$10.00
Free tier	No	Yes (rate limited)
Public API	Restricted	Available
Provider	Anthropic	Google DeepMind

* Glasswing partner pricing. At these rates Mythos costs about 20x as much as Gemini 3.1 Pro per input token and about 12.5x as much per output token.
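
The price multiples follow directly from the table. The sketch below shows the arithmetic and applies it to one workload; the prices are taken from the table (the Gemini figures are approximate), while the workload size and the names `PRICES`, `price_ratio`, and `workload_cost` are assumptions made up for illustration.

```python
# USD per 1M tokens, from the pricing table. Gemini figures are approximate ("~").
PRICES = {
    "Mythos": {"input": 25.00, "output": 125.00},
    "Gemini 3.1 Pro": {"input": 1.25, "output": 10.00},
}


def price_ratio(kind):
    """How many times more Mythos costs than Gemini 3.1 Pro for 'input' or 'output'."""
    return PRICES["Mythos"][kind] / PRICES["Gemini 3.1 Pro"][kind]


def workload_cost(model, input_tokens, output_tokens):
    """Cost in USD for a workload with the given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    print(price_ratio("input"))   # 20.0
    print(price_ratio("output"))  # 12.5
    # Hypothetical workload: 2M input tokens and 500k output tokens.
    print(workload_cost("Mythos", 2_000_000, 500_000))          # 112.5
    print(workload_cost("Gemini 3.1 Pro", 2_000_000, 500_000))  # 7.5
```

The effective multiple for a real workload falls between 12.5x and 20x depending on the input/output mix; for the hypothetical workload above it is 15x.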

Frequently Asked Questions

Does Gemini 3.1 Pro beat Claude Mythos on any benchmark?

Yes. Gemini 3.1 Pro scores 93.6% on MMMLU compared to Mythos's 92.7%, making it the only benchmark where Gemini leads in this comparison. The margin is small (0.9 points), and Gemini also comes very close on GPQA Diamond (94.3% vs 94.6%). On coding and math benchmarks, Mythos leads by much larger margins.

Which model is better for coding?

Based on the system card benchmarks, Mythos leads decisively in coding tasks: SWE-bench Pro 77.8% vs 54.2% (+23.6 points), SWE-bench Verified 93.9% vs 80.6% (+13.3 points), and Terminal-Bench 2.0 82% vs 68.5% (+13.5 points). However, Mythos is not publicly available, so Gemini 3.1 Pro remains one of the best coding models you can actually access today.

When will Claude Mythos be publicly available?

Anthropic has not announced a public release date for Claude Mythos. It is currently restricted to Project Glasswing cybersecurity partners. For users choosing between publicly available models, the comparison between Claude Opus 4.6 and Gemini 3.1 Pro is more directly relevant.