Anthropic's restricted, Capybara-tier Claude Mythos versus OpenAI's publicly available GPT-5.4. Benchmark data from Anthropic's 244-page system card shows Mythos leading on every evaluated task, but GPT-5.4 remains the only one of the two you can actually use today.
Mythos benchmarks come from Anthropic's own system card evaluations, not independent testing. GPT-5.4 is publicly available; Mythos is restricted to Glasswing partners. These may not be apples-to-apples comparisons due to different evaluation configurations.
| Benchmark | Mythos | GPT-5.4 | Winner |
|---|---|---|---|
| SWE-bench Verified | 93.9% | 80% | Mythos +13.9 |
| SWE-bench Pro | 77.8% | 57.7% | Mythos +20.1 |
| Terminal-Bench 2.0 | 82% | 75.1% | Mythos +6.9 |
| Terminal-Bench 2.1 (4hr) | 92.1% | 75.3% | Mythos +16.8 |
| GPQA Diamond | 94.6% | 92.8% | Mythos +1.8 |
| MMMLU | 92.7% | -- | -- |
| Humanity's Last Exam (no tools) | 56.8% | 39.8% | Mythos +17.0 |
| Humanity's Last Exam (with tools) | 64.7% | 52.1% | Mythos +12.6 |
| USAMO 2026 | 97.6% | 95.2% | Mythos +2.4 |
| OSWorld | 79.6% | 75% | Mythos +4.6 |
| GraphWalks BFS (256K-1M) | 80% | 21.4% | Mythos +58.6 |
Mythos scores from Anthropic system card. GPT-5.4 scores from Anthropic's comparative evaluations (not OpenAI's own reporting). Different evaluation harnesses may produce different results.
Mythos posts its largest margins in four areas:
- Long-context retrieval at 256K-1M tokens (GraphWalks BFS)
- Real-world software engineering tasks (SWE-bench Verified and Pro)
- Extended agentic coding with a 4-hour timeout (Terminal-Bench 2.1)
- Expert-level knowledge with no tools (GPQA Diamond, Humanity's Last Exam)
| | Mythos | GPT-5.4 |
|---|---|---|
| Input $/1M tokens | $25.00 * | ~$2.50 |
| Output $/1M tokens | $125.00 * | ~$10.00 |
| Public API | Restricted | Available |
| Provider | Anthropic | OpenAI |
| Cost ratio | Mythos is ~10x (input) to ~12.5x (output) more expensive than GPT-5.4 | |
* Glasswing partner pricing. Public API pricing may differ.
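To put the pricing gap in concrete terms, here is a minimal cost-estimation sketch in Python. The per-million-token prices come from the table above; the workload size is a hypothetical assumption for illustration, not a figure from either provider.

```python
# Cost comparison sketch using the prices quoted in the table above.
# Mythos prices are Glasswing partner pricing; GPT-5.4 prices are the
# approximate public API rates cited in this article. The workload
# numbers further down are made-up assumptions for illustration only.

PRICES_PER_MTOK = {
    "mythos": {"input": 25.00, "output": 125.00},
    "gpt-5.4": {"input": 2.50, "output": 10.00},
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one workload at the quoted per-million-token rates."""
    p = PRICES_PER_MTOK[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

if __name__ == "__main__":
    # Hypothetical agentic-coding month: 40M input tokens, 8M output tokens.
    in_tok, out_tok = 40_000_000, 8_000_000
    for model in PRICES_PER_MTOK:
        print(f"{model}: ${workload_cost(model, in_tok, out_tok):,.2f}")
    # Ratios implied by the quoted prices: 10x on input, 12.5x on output.
    print("input ratio:", 25.00 / 2.50, "output ratio:", 125.00 / 10.00)
```

On this assumed workload the gap works out to roughly 11x overall, since real bills blend input and output tokens somewhere between the 10x and 12.5x extremes.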
On paper, Claude Mythos dominates GPT-5.4 across every benchmark Anthropic tested. The long-context gap (80% vs 21.4% on GraphWalks BFS) and coding gap (77.8% vs 57.7% on SWE-bench Pro) are particularly striking. However, Mythos costs ~10x more, isn't publicly available, and these benchmarks come from Anthropic's own evaluations. For users who need a frontier model today, GPT-5.4 remains one of the best options alongside Claude Opus 4.6.
Based on available benchmarks from Anthropic's system card, Mythos leads GPT-5.4 on every evaluated benchmark where both have scores, including SWE-bench Pro (77.8% vs 57.7%), USAMO 2026 (97.6% vs 95.2%), and GraphWalks BFS (80.0% vs 21.4%). However, these are Anthropic's own evaluations and have not been independently verified. GPT-5.4 may perform differently on benchmarks not covered by the system card.
No. GPT-5.4 is publicly available through OpenAI's API, while Claude Mythos is restricted to Project Glasswing cybersecurity partners. For production use, GPT-5.4 and Claude Opus 4.6 are the top publicly available options from their respective providers.
GPT-5.4 is available at approximately $2.50/$10 per million input/output tokens through OpenAI's API. Mythos is priced at $25/$125 for Glasswing partners, roughly 10x more for input and 12.5x more for output. Mythos is designed as a premium capability tier, not a price-competitive offering.