Human preference rating from 6M+ crowdsourced blind head-to-head comparisons. Users chat with two anonymous models and pick the better response.
Why it matters: The most trusted 'vibes-based' benchmark. It reflects real human preferences, not just academic metrics, and is widely considered the most meaningful overall ranking.
Top model
1,508
Claude Fable 5
Average score
1,369
122 models in total
Models tested
122
Metric: Elo rating
Human baseline
-
Score range: 900–1600
Arena Elo Scores - Top 25 Models
Ranked by Arena Elo score
All models with a reported Arena Elo score, ranked by highest Elo rating.
Arena Elo is a human preference rating derived from crowdsourced, blind head-to-head comparisons: users chat with two anonymous models and pick the better response. Because every model is rated on the same scale, scores are comparable across models, helping developers choose the right model for their needs.
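To make the Elo metric more concrete, here is a minimal sketch of the classic Elo model in Python. It assumes the textbook formulation (a 400-point scale and a K-factor of 32); both constants and the function names `expected_score` and `update` are illustrative assumptions, and the leaderboard's actual fitting procedure is not described on this page and may differ.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B under the classic Elo model.

    The 400-point scale is the textbook default, assumed here for illustration.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one head-to-head comparison.

    outcome_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    The K-factor of 32 is an assumed value, not one taken from the leaderboard.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the sum of the two ratings is preserved.
    return rating_a + delta, rating_b - delta


# Figures from this page: top score 1,508 versus average score 1,369.
p_top = expected_score(1508, 1369)
print(f"{p_top:.2f}")  # prints 0.69
```

Under these assumptions, the 139-point gap between the top score and the average score corresponds to the top model being preferred in roughly 69% of comparisons against an average-rated model.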
Claude Fable 5 currently holds the top score on the Arena Elo benchmark. See our full rankings table above for the complete leaderboard with 122 models.
We update benchmark data from multiple sources including HuggingFace open-source model leaderboards and LMArena. Scores are refreshed regularly as new evaluations are published and new models are released.
Arena Elo alone should not decide which model you pick. It is an important indicator, but real-world performance depends on many factors, including pricing, latency, context window, and specific task requirements. We recommend using our composite score, which weighs multiple benchmarks and practical factors.
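The exact composite formula is not given on this page. As a rough illustration of the general idea only, the sketch below combines a normalized Arena Elo score with other factors through a weighted average. The weights, the factor names, the price and latency values, and the helpers `normalize_elo` and `composite` are all hypothetical; only the 900–1600 score range and the 1,508 top score come from this page.

```python
def normalize_elo(elo: float, lo: float = 900.0, hi: float = 1600.0) -> float:
    """Min-max scale an Elo score to [0, 1] using the score range shown on this page."""
    return min(1.0, max(0.0, (elo - lo) / (hi - lo)))


def composite(factors: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of factor scores that are already scaled to [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[name] * factors[name] for name in weights) / total_weight


# Hypothetical weights: not the site's real formula.
weights = {"arena_elo": 0.5, "price": 0.3, "latency": 0.2}

# The Elo value is the top score from this page; the price and latency
# scores are made-up placeholders (higher means better).
factors = {"arena_elo": normalize_elo(1508), "price": 0.4, "latency": 0.7}

print(round(composite(factors, weights), 3))
```

A weighted average is only one possible design. It keeps each factor's contribution easy to inspect, but it assumes every input has already been scaled so that higher is better.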