Human preference ratings derived from 6M+ crowdsourced blind head-to-head comparisons. Users chat with two anonymous models and pick the better response.
Why it matters: the most trusted 'vibes-based' benchmark, reflecting real human preferences rather than purely academic metrics. Widely considered the most meaningful overall ranking.
Top model
1,503
Claude Opus 4.6
Average score
1,352
Across 102 models
Models tested
102
Metric: Elo rating
Human baseline
—
Score range: 900–1600
All models with a reported Arena Elo score, ranked by highest Elo rating.
Arena Elo is a relative rating computed from pairwise human preference votes: when two models are compared, the winner's rating rises and the loser's falls, with the size of the swing depending on the rating gap between them. Because every model is rated on the same scale, scores are directly comparable, helping developers gauge how models stack up against one another.
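To make the rating mechanics above concrete, here is a minimal sketch of a single Elo update after one head-to-head vote. The function name and the K-factor of 32 are illustrative assumptions; a live leaderboard such as LMArena may instead fit ratings statistically over all votes (e.g. with a Bradley-Terry model) rather than updating one match at a time.

```python
def elo_update(rating_a, rating_b, outcome_a, k=32):
    """Update two Elo ratings after one comparison.

    outcome_a: 1.0 if model A's response wins, 0.0 if it loses,
    0.5 for a tie. k (the K-factor) controls how far ratings move;
    32 is a conventional choice, not any leaderboard's actual setting.
    """
    # Expected score of A given the current rating gap.
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    delta = k * (outcome_a - expected_a)
    return rating_a + delta, rating_b - delta

# Two evenly matched models: the winner gains 16 points, the loser drops 16.
print(elo_update(1500, 1500, 1.0))  # → (1516.0, 1484.0)
```

Note how the update is zero-sum: points gained by the winner equal points lost by the loser, so upsets against higher-rated models move the ratings more than expected wins.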
Claude Opus 4.6 currently holds the top score on the Arena Elo benchmark. See our full rankings table above for the complete leaderboard with 102 models.
We update benchmark data from multiple sources including HuggingFace Open LLM Leaderboard and LMArena. Scores are refreshed regularly as new evaluations are published and new models are released.
No. While Arena Elo is an important indicator, real-world performance depends on many factors, including pricing, latency, context window, and task-specific requirements. We recommend using our composite score, which weighs multiple benchmarks alongside practical factors.
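To illustrate what a composite score could look like, here is a hedged sketch of one possible weighted blend. The weight values, the benchmark names, and the 900–1600 normalization range (borrowed from the score range quoted above) are illustrative assumptions, not the site's actual formula.

```python
def normalize(score, lo=900, hi=1600):
    """Map a raw Elo score onto [0, 1]; the 900-1600 range is an assumption."""
    return (score - lo) / (hi - lo)

def composite_score(parts, weights):
    """Weighted average of normalized sub-scores (weights need not sum to 1)."""
    total = sum(weights.values())
    return sum(weights[name] * parts[name] for name in weights) / total

# Hypothetical blend: Elo dominates, with price and latency as tiebreakers.
parts = {"arena_elo": normalize(1503), "price": 0.70, "latency": 0.80}
weights = {"arena_elo": 0.6, "price": 0.2, "latency": 0.2}
print(round(composite_score(parts, weights), 3))  # → 0.817
```

Normalizing each sub-score to [0, 1] before weighting keeps an Elo in the thousands from drowning out factors measured on smaller scales.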