This page is built for queries like “LLM arena”, “Arena Elo leaderboard”, and “AI arena rankings”, bringing the most relevant head-to-head and fast-moving evaluation surfaces into one landing page.
- 15 Arena Elo models
- 10 LiveBench models
- 4 providers in arena top 10
- 0 open-source arena leaders
Human preference rating from 6M+ crowdsourced blind head-to-head comparisons. Users chat with two anonymous models and pick the better response.
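For readers curious how a single blind vote can move these scores, here is a minimal sketch of a logistic Elo update from one pairwise comparison. It is illustrative only: the K-factor, the starting ratings, and the function name are assumptions, not the arena's actual implementation (live leaderboards typically fit ratings over all votes at once, e.g. with a Bradley-Terry-style model).

```python
# A minimal sketch of a logistic Elo update after one blind head-to-head vote.
# K-factor and ratings are illustrative assumptions, not the arena's real values.
def elo_update(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0) -> tuple[float, float]:
    # Expected score of A under the standard logistic Elo curve (400-point scale).
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Example: the 1480-rated model beats the 1500-rated model in one comparison.
print(elo_update(1500.0, 1480.0, a_wins=False))  # A drops a few points, B gains them
```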
| # | Model | Provider | Score |
|---|---|---|---|
| #1 | Claude Opus 4.6 | Anthropic | 1503 |
| #2 | Gemini 3 Pro | Google | 1486 |
| #3 | GPT-5.4 | OpenAI | 1485 |
| #4 | GPT-5.2 | OpenAI | 1481 |
| #5 | Gemini 3 Flash | Google | 1474 |
| #6 | GPT-5 | OpenAI | 1465 |
| #7 | Grok 4 | xAI | 1462 |
| #8 | Claude Sonnet 4.6 | Anthropic | 1460 |
| #9 | Claude Sonnet 4.5 | Anthropic | 1452 |
| #10 | Gemini 2.5 Pro | Google | 1444 |
| #11 | Claude Opus 4.5 | Anthropic | 1430 |
| #12 | Claude Opus 4 | Anthropic | 1420 |
| #13 | o3 | OpenAI | 1415 |
| #14 | Gemini 2.5 Flash | Google | 1395 |
| #15 | Claude Sonnet 4 | Anthropic | 1387 |
Comprehensive benchmark across 6 categories (math, coding, reasoning, data analysis, instruction following, language) using contamination-resistant, regularly updated questions.
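As a rough illustration of how a per-category benchmark like this rolls up into one headline number, the sketch below averages six category scores. The category names follow the list above, but the scores and the equal-weight average are assumptions for illustration, not actual LiveBench data or methodology.

```python
# Illustrative roll-up of six category scores into one headline number.
# Scores and the equal-weight average are assumptions, not LiveBench data.
CATEGORIES = ["math", "coding", "reasoning", "data_analysis", "instruction_following", "language"]

def overall_score(per_category: dict[str, float]) -> float:
    """Average the six category scores (0-100) into a single overall score."""
    return sum(per_category[c] for c in CATEGORIES) / len(CATEGORIES)

print(overall_score({
    "math": 71.2, "coding": 68.4, "reasoning": 74.0,
    "data_analysis": 65.8, "instruction_following": 80.1, "language": 62.5,
}))
```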
Providers in the arena top 10:

- Anthropic: 3
- Google: 3
- OpenAI: 3
- xAI: 1

Arena-style rankings emphasize live head-to-head or fast-moving evaluation surfaces, while traditional benchmarks are more like fixed test suites. Together they give a better picture of actual competition.
Yes. This page reads directly from the aggregated real Arena Elo and LiveBench scores and only shows models that exist in the current local benchmark dataset.
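For anyone curious what "only shows models that exist in the current local benchmark dataset" means in practice, here is a hypothetical sketch of that filter. The variable names and inline data are made up for illustration; the page's real data loading is not shown here.

```python
# Hypothetical sketch of the filtering described above: keep only models that
# appear both in the arena scores and in the local benchmark dataset.
# Names and values below are illustrative, not the page's real data source.
arena_scores = {"Claude Opus 4.6": 1503, "Gemini 3 Pro": 1486, "GPT-5.4": 1485}
local_dataset_models = {"Claude Opus 4.6", "GPT-5.4"}  # models present in the local dataset

visible = {model: score for model, score in arena_scores.items() if model in local_dataset_models}
print(visible)  # -> {'Claude Opus 4.6': 1503, 'GPT-5.4': 1485}
```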
Because queries like “llm arena” and “arena elo leaderboard” carry a different intent than generic benchmark searches. Users want the competitive arena surfaces quickly, not the full benchmark matrix.