Human preference ratings derived from over 6 million crowdsourced blind head-to-head comparisons. Users chat with two anonymous models and vote for the better response.
Why it matters: The most trusted 'vibes-based' benchmark — it reflects real human preferences rather than purely academic metrics, and is widely considered the most meaningful overall ranking.
Top model: Claude Opus 4.6 — 1,503
Average score: 1,367 (across 124 models)
Models tested: 124
Metric: Elo rating
Human baseline: -
Score range: 900–1600
Arena Elo Scores - Top 25 Models
All models with a reported Arena Elo score, ranked from highest to lowest.
Arena Elo is a crowdsourced evaluation in which human raters compare responses from two anonymous models and vote for the better one; the votes are aggregated into Elo ratings. Because every model is rated against the same pool of head-to-head comparisons, the scores are directly comparable, helping developers choose the right model for their needs.
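To make the rating mechanism concrete, here is a minimal sketch of the classic pairwise Elo update that this kind of leaderboard is based on. The K-factor of 32 is an illustrative assumption, and LMArena's published methodology fits a Bradley–Terry model over all votes rather than applying sequential updates, but the underlying intuition is the same: a win against a stronger opponent moves your rating more than a win against a weaker one.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    """Update both ratings after one comparison.

    outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k is the step size (illustrative; not LMArena's actual procedure).
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (outcome - e_a)
    new_b = r_b + k * ((1.0 - outcome) - (1.0 - e_a))
    return new_a, new_b

# Two equally rated models: a win moves each rating by k/2 = 16 points.
a, b = elo_update(1500.0, 1500.0, outcome=1.0)
print(a, b)  # 1516.0 1484.0
```

Note that the updates are zero-sum: the points the winner gains are exactly the points the loser gives up, which is why a fixed score range (such as the 900–1600 spread shown above) emerges from a shared starting rating.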
Claude Opus 4.6 currently holds the top score on the Arena Elo benchmark. See our full rankings table above for the complete leaderboard with 124 models.
We update benchmark data from multiple sources including HuggingFace open-source model leaderboards and LMArena. Scores are refreshed regularly as new evaluations are published and new models are released.
No. While Arena Elo is an important indicator, real-world performance depends on many factors including pricing, latency, context window, and specific task requirements. We recommend using our composite score, which weights multiple benchmarks alongside these practical factors.