Human preference rating from 6M+ crowdsourced blind head-to-head comparisons. Users chat with two anonymous models and pick the better response.
Why it matters: the most trusted 'vibes-based' benchmark, because it reflects real human preferences rather than purely academic metrics. Widely considered the most meaningful overall ranking.
Top Model: Claude Opus 4.6 (1,503)
Average Score: 1,367 (across 124 models)
Models Tested: 124
Metric: Elo rating
Human Baseline: -
Score Range: 900–1600
Arena Elo Scores - Top 25 Models
All models with a reported Arena Elo score, ranked by highest Elo rating.
Arena Elo is not a fixed test set: it is an Elo-style rating computed from users' pairwise preference votes. Each blind head-to-head comparison updates both models' ratings, producing scores that are directly comparable across models and help developers choose the right model for their needs.
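As a rough illustration of how pairwise votes become a ranking, here is a minimal sketch of the classic Elo update. This is not LMArena's actual pipeline (the real leaderboard uses a more involved statistical model over all votes); the starting rating and K-factor below are arbitrary choices for the example.

```python
# Illustrative Elo update applied to pairwise "A beats B" votes.
# Starting rating (1000) and K-factor (32) are example values, not LMArena's.

def expected(r_a: float, r_b: float) -> float:
    """Expected win probability of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
    """Return both models' updated ratings after one head-to-head vote."""
    score_a = 1.0 if a_wins else 0.0
    e_a = expected(r_a, r_b)
    # Winner gains, loser loses; the total rating pool is conserved.
    return r_a + k * (score_a - e_a), r_b - k * (score_a - e_a)

# Two models start equal; the first wins three straight votes,
# with diminishing gains as its expected win probability rises.
ra, rb = 1000.0, 1000.0
for _ in range(3):
    ra, rb = update(ra, rb, a_wins=True)
print(f"{ra:.1f} vs {rb:.1f}")
```

Because each update moves ratings by the gap between the observed result and the expected one, upsets shift scores more than expected wins do, which is why a stable ranking emerges from millions of noisy individual votes.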
Claude Opus 4.6 currently holds the top score on the Arena Elo benchmark. See our full rankings table above for the complete leaderboard with 124 models.
We update benchmark data from multiple sources including HuggingFace open-source model leaderboards and LMArena. Scores are refreshed regularly as new evaluations are published and new models are released.
No. While Arena Elo is an important indicator, real-world performance depends on many factors including pricing, latency, context window, and specific task requirements. We recommend using our composite score which weighs multiple benchmarks and practical factors.