How do reasoning models stack up against standard LLMs? This benchmark compares 197 reasoning models against 149 standard models on composite score, pricing, and capabilities - helping you decide when chain-of-thought reasoning is worth the trade-off.
Reasoning vs Standard - Head-to-Head

                     Models   Top Score   Avg Score   Avg $/1M Out
Reasoning Models     197      92          63          $12.56
Standard Models      149      87          51          $4.06
Reasoning models come from 35 providers and score 12 points higher on average than standard models.
Chain-of-thought (CoT) prompting enables AI models to break down complex problems into intermediate steps before producing a final answer. Models like OpenAI o1 and DeepSeek R1 internalize this process, generating hidden reasoning traces that dramatically improve accuracy on math, logic, and multi-step tasks compared to direct answering.
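As a concrete illustration, the sketch below contrasts direct answering with an explicit chain-of-thought prompt using the OpenAI Python SDK. The model name and the exact wording of the step-by-step instruction are assumptions for the example, not part of this benchmark.

```python
# Minimal sketch: direct answering vs. explicit chain-of-thought prompting.
# The model name is a placeholder; any chat-completions model would work.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "A train travels 120 km in 90 minutes. What is its average speed in km/h?"

# Direct answering: the model replies immediately.
direct = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder standard model
    messages=[{"role": "user", "content": question}],
)

# Explicit chain-of-thought: ask the model to reason through intermediate
# steps before committing to a final answer.
cot = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": question + "\nThink step by step, then state the final answer on its own line.",
    }],
)

print(direct.choices[0].message.content)
print(cot.choices[0].message.content)
```

Dedicated reasoning models like o1 and DeepSeek R1 generate this kind of intermediate trace internally, so no special prompt is needed to trigger it.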
When Reasoning Helps
Reasoning models shine on tasks that require multiple logical steps: mathematical proofs, complex coding challenges, scientific analysis, strategic planning, and any problem where standard models tend to hallucinate or skip steps. For simple Q&A or creative writing, standard models are often faster and equally effective.
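One way to act on this guidance is a simple task router that sends multi-step work to a reasoning model and everything else to a cheaper, faster standard model. The sketch below is a hypothetical illustration; the model names and task categories are assumptions, not recommendations from this benchmark.

```python
# Hypothetical task router: use a reasoning model only when the task
# benefits from multi-step thinking. Model names are placeholders.
REASONING_MODEL = "deepseek-reasoner"  # assumed reasoning-capable model
STANDARD_MODEL = "gpt-4o-mini"         # assumed standard model

# Task types where reasoning models tend to pay off.
REASONING_TASKS = {"math", "coding", "scientific_analysis", "planning"}

def pick_model(task_type: str) -> str:
    """Route multi-step tasks to the reasoning model; send simple Q&A
    and creative work to the standard model."""
    return REASONING_MODEL if task_type in REASONING_TASKS else STANDARD_MODEL

assert pick_model("math") == REASONING_MODEL
assert pick_model("creative_writing") == STANDARD_MODEL
```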
Speed vs Accuracy
Reasoning models consume more tokens and take longer to respond because they generate internal thinking traces. This trade-off is worthwhile when correctness matters more than latency - for example in code generation, financial analysis, or exam-style problems. For real-time chat, standard models remain the better choice.
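The average prices above make the trade-off concrete. Here is a back-of-the-envelope comparison assuming the reasoning model emits 2,000 hidden thinking tokens on top of a 500-token answer; the token counts are illustrative, not measured values.

```python
# Rough per-request cost comparison using the average output prices above.
# Token counts are illustrative assumptions.
REASONING_PRICE = 12.56 / 1_000_000  # avg $ per output token (reasoning)
STANDARD_PRICE = 4.06 / 1_000_000    # avg $ per output token (standard)

answer_tokens = 500
thinking_tokens = 2_000  # hidden reasoning trace, billed as output

reasoning_cost = (answer_tokens + thinking_tokens) * REASONING_PRICE
standard_cost = answer_tokens * STANDARD_PRICE

print(f"Reasoning: ${reasoning_cost:.4f} per request")  # ~$0.0314
print(f"Standard:  ${standard_cost:.4f} per request")   # ~$0.0020
print(f"Ratio: {reasoning_cost / standard_cost:.0f}x")  # ~15x
```

Under these assumptions a reasoning-model request costs roughly 15x more, which is easy to justify for a one-shot financial analysis and hard to justify for high-volume chat.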
Emerging Reasoning Models
The reasoning model landscape is evolving rapidly. OpenAI's o1 and o3 series led the way, followed by DeepSeek R1 bringing open-source reasoning. Google, Anthropic, and other providers have since introduced their own reasoning-capable models, driving down costs and expanding access to chain-of-thought capabilities.
AI reasoning benchmarks test a model's ability to solve complex problems requiring logical thinking, mathematical reasoning, scientific analysis, and multi-step problem solving - tasks that go beyond simple pattern matching.
DeepSeek R1, OpenAI o3, and Claude with extended thinking lead on reasoning benchmarks. These models use chain-of-thought processing to break down complex problems into steps, achieving significantly higher accuracy.
Key reasoning benchmarks include GPQA Diamond (graduate-level science), MATH-500 (mathematical reasoning), AIME (competition math), ARC Challenge (science questions), and GSM8K (grade-school math). Each tests different aspects of reasoning ability.
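To make concrete how such benchmarks score a model, here is a minimal, assumed evaluation loop for a GSM8K-style item: ask the question, extract the model's final numeric answer, and compare it to the reference. Real harnesses use more robust answer extraction and the official test sets; everything here is a simplified sketch.

```python
# Simplified GSM8K-style scoring loop. Model name and prompt wording
# are assumptions; real benchmark harnesses are more careful.
import re

from openai import OpenAI

client = OpenAI()

def last_number(text: str) -> str | None:
    """Naive answer extraction: take the last number in the response."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

def score_item(model: str, question: str, reference: str) -> bool:
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": question + "\nGive the final numeric answer on the last line.",
        }],
    )
    return last_number(response.choices[0].message.content) == reference

# GSM8K-style item: 48 clips in April plus half as many in May is 72.
correct = score_item(
    "gpt-4o-mini",  # placeholder model name
    "Natalia sold clips to 48 friends in April, and half as many in May. "
    "How many clips did she sell altogether?",
    "72",
)
print("correct:", correct)
```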