A harder version of MMLU with reasoning-focused questions and 10 answer choices instead of 4. It contains 12,000+ questions across 14 domains.
Why it matters: It separates top models better, because their scores are 16–33% lower than on standard MMLU, which leaves more room between them. It also tests reasoning in addition to knowledge.
Top model: Gemini 3 Pro (88%)
Average score: 70.9% (across 82 models)
Models tested: 82
Metric: accuracy
Human baseline: not reported
Score range: 0%–100%
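Accuracy here is the share of questions a model answers correctly. Each question has ten options (A–J), so uniform random guessing scores about 10%, compared with 25% on four-option MMLU. The sketch below is a minimal illustration and assumes each model response has already been reduced to a single answer letter. Real evaluation harnesses also need an answer-extraction step, which is not shown. The function names `accuracy` and `chance_accuracy` are hypothetical.

```python
from typing import Sequence

# Ten answer options per MMLU-Pro question (A through J).
CHOICES = "ABCDEFGHIJ"


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Percentage of questions where the predicted letter equals the gold letter.

    Hypothetical helper: assumes predictions are already parsed into single letters.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def chance_accuracy(num_choices: int) -> float:
    """Expected accuracy (%) from guessing uniformly among num_choices options."""
    return 100.0 / num_choices


if __name__ == "__main__":
    preds = ["A", "C", "J", "B"]
    gold = ["A", "C", "D", "B"]
    print(accuracy(preds, gold))          # 75.0
    print(chance_accuracy(len(CHOICES)))  # 10.0 (MMLU-Pro, 10 options)
    print(chance_accuracy(4))             # 25.0 (standard MMLU, 4 options)
```

The lower chance floor is one reason MMLU-Pro scores sit below MMLU scores for the same model.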
Chart: MMLU-Pro scores of the top 25 models, ranked by accuracy (%). The full table lists every model with a reported MMLU-Pro score, ranked by accuracy.
MMLU-Pro is a standardized benchmark that measures how accurately AI models answer multiple-choice questions requiring both knowledge and reasoning. Because every model is scored the same way, the results are comparable across models and help developers choose the right model for their needs.
Gemini 3 Pro currently holds the top score on the MMLU-Pro benchmark. See our full rankings table above for the complete leaderboard with 82 models.
We collect benchmark data from multiple sources, including Hugging Face open-source model leaderboards and LMArena. Scores are refreshed regularly as new evaluations are published and new models are released.
MMLU-Pro alone should not decide which model you use. It is an important indicator, but real-world performance also depends on pricing, latency, context window, and the requirements of your specific task. We recommend our composite score, which weighs multiple benchmarks alongside these practical factors.
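This page does not say how its composite score is calculated. As a purely hypothetical illustration, one common way to combine several 0–100 scores is a weighted average that renormalizes over the metrics a model actually has. The metric names (`mmlu_pro`, `other_benchmark`) and the weights below are invented for this example and are not the site's formula.

```python
from typing import Mapping


def composite_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of 0-100 scores (hypothetical scheme, not the site's method).

    Only metrics present in both mappings count, and the weights are
    renormalized so a missing metric does not drag the score toward zero.
    """
    shared = [name for name in weights if name in scores]
    if not shared:
        raise ValueError("no metric appears in both scores and weights")
    total_weight = sum(weights[name] for name in shared)
    if total_weight <= 0:
        raise ValueError("weights of the shared metrics must sum to a positive value")
    return sum(scores[name] * weights[name] for name in shared) / total_weight


if __name__ == "__main__":
    # Invented metric names and equal weights, for illustration only.
    weights = {"mmlu_pro": 0.5, "other_benchmark": 0.5}
    print(composite_score({"mmlu_pro": 88.0, "other_benchmark": 70.0}, weights))  # 79.0
    print(composite_score({"mmlu_pro": 88.0}, weights))  # 88.0 (renormalized)
```

Renormalizing is a design choice. A stricter scheme could instead treat a missing metric as 0 or leave the model unranked.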