This page is built for queries like “LLM arena”, “Arena Elo leaderboard”, and “AI arena rankings”, bringing the most relevant head-to-head and fast-moving evaluation surfaces into one landing page.
- 15 Arena Elo models
- 10 LiveBench models
- 4 providers in arena top 10
- 0 open-source arena leaders
Human preference rating from 6M+ crowdsourced blind head-to-head comparisons. Users chat with two anonymous models and pick the better response.
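For readers curious how a single blind vote can move these scores, here is a minimal sketch of a logistic Elo update from one pairwise comparison. It is illustrative only: the K-factor, the starting ratings, and the function name are assumptions, not the arena's actual implementation (live leaderboards typically fit ratings over all votes at once, e.g. with a Bradley-Terry-style model).

```python
# A minimal sketch of a logistic Elo update after one blind head-to-head vote.
# K-factor and ratings are illustrative assumptions, not the arena's real values.
def elo_update(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0) -> tuple[float, float]:
    # Expected score of A under the standard logistic Elo curve (400-point scale).
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Example: the 1480-rated model beats the 1500-rated model in one comparison.
print(elo_update(1500.0, 1480.0, a_wins=False))  # A drops a few points, B gains them
```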
| # | Model | Provider | Score |
|---|---|---|---|
| #1 | Claude Opus 4.6 | Anthropic | 1503 |
| #2 | Gemini 3 Pro | Google | 1486 |
| #3 | GPT-5.4 | OpenAI | 1485 |
| #4 | GPT-5.2 | OpenAI | 1481 |
| #5 | Gemini 3 Flash | Google | 1474 |
| #6 | GPT-5 | OpenAI | 1465 |
| #7 | Grok 4 | xAI | 1462 |
| #8 | Claude Sonnet 4.6 | Anthropic | 1460 |
| #9 | Claude Sonnet 4.5 | Anthropic | 1452 |
| #10 | Gemini 2.5 Pro | Google | 1444 |
| #11 | Claude Opus 4.5 | Anthropic | 1430 |
| #12 | Claude Opus 4 | Anthropic | 1420 |
| #13 | o3 | OpenAI | 1415 |
| #14 | Gemini 2.5 Flash | Google | 1395 |
| #15 | Claude Sonnet 4 | Anthropic | 1387 |
Comprehensive benchmark across 6 categories (math, coding, reasoning, data analysis, instruction following, language) using contamination-resistant, regularly updated questions.
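As a rough illustration of how a per-category benchmark like this rolls up into one headline number, the sketch below averages six category scores. The category names follow the list above, but the scores and the equal-weight average are assumptions for illustration, not actual LiveBench data or methodology.

```python
# Illustrative roll-up of six category scores into one headline number.
# Scores and the equal-weight average are assumptions, not LiveBench data.
CATEGORIES = ["math", "coding", "reasoning", "data_analysis", "instruction_following", "language"]

def overall_score(per_category: dict[str, float]) -> float:
    """Average the six category scores (0-100) into a single overall score."""
    return sum(per_category[c] for c in CATEGORIES) / len(CATEGORIES)

print(overall_score({
    "math": 71.2, "coding": 68.4, "reasoning": 74.0,
    "data_analysis": 65.8, "instruction_following": 80.1, "language": 62.5,
}))
```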
Providers in the arena top 10:

- Anthropic: 3
- Google: 3
- OpenAI: 3
- xAI: 1

Arena-style rankings emphasize live head-to-head or fast-moving evaluation surfaces, while traditional benchmarks are more like fixed test suites. Together they give a better picture of actual competition.
Yes. This page reads directly from the aggregated real Arena Elo and LiveBench scores and only shows models that exist in the current local benchmark dataset.
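For anyone curious what "only shows models that exist in the current local benchmark dataset" means in practice, here is a hypothetical sketch of that filter. The variable names and inline data are made up for illustration; the page's real data loading is not shown here.

```python
# Hypothetical sketch of the filtering described above: keep only models that
# appear both in the arena scores and in the local benchmark dataset.
# Names and values below are illustrative, not the page's real data source.
arena_scores = {"Claude Opus 4.6": 1503, "Gemini 3 Pro": 1486, "GPT-5.4": 1485}
local_dataset_models = {"Claude Opus 4.6", "GPT-5.4"}  # models present in the local dataset

visible = {model: score for model, score in arena_scores.items() if model in local_dataset_models}
print(visible)  # -> {'Claude Opus 4.6': 1503, 'GPT-5.4': 1485}
```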
Because queries like “llm arena” and “arena elo leaderboard” carry a different intent than generic benchmark searches. Users want the competitive arena surfaces quickly, not the full benchmark matrix.