AI models ranked by reasoning ability using GPQA, ARC-Challenge, BIG-Bench Hard, and Humanity's Last Exam scores.
Top-ranked model: GPT-4o (score: 76.5)
47.4 across all ranked models
Models with benchmark data: 47
[Chart: Top Reasoning Models by Weighted Score (top 15 models by weighted score)]
[Chart: Benchmark Breakdown (per-benchmark scores for top 10 models)]
Each model's score is a weighted average of its available benchmark results. When a model is missing some benchmarks, the weights are re-normalized across the benchmarks that are available. All scores are on a 0-100 scale. Data is sourced from official model cards, published papers, and third-party evaluation platforms.
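The re-normalization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights in `ASSUMED_WEIGHTS` and the function name `weighted_score` are hypothetical, since the actual per-benchmark weights are not published here.

```python
from typing import Mapping, Optional

# Hypothetical weights, for illustration only; the real weights are not stated.
ASSUMED_WEIGHTS = {
    "GPQA": 0.4,
    "ARC-Challenge": 0.2,
    "BIG-Bench Hard": 0.2,
    "Humanity's Last Exam": 0.2,
}


def weighted_score(
    scores: Mapping[str, Optional[float]],
    weights: Mapping[str, float] = ASSUMED_WEIGHTS,
) -> Optional[float]:
    """Weighted average over the benchmarks a model actually has (0-100 scale).

    Benchmarks that are missing (absent or None) are skipped, and the
    remaining weights are re-normalized so that they sum to 1.
    """
    # Keep only benchmarks that have both a reported score and a weight.
    available = {
        name: value
        for name, value in scores.items()
        if value is not None and name in weights
    }
    total_weight = sum(weights[name] for name in available)
    if not available or total_weight == 0:
        return None  # no usable benchmark data, so the model cannot be ranked

    # Dividing by the weight of the available benchmarks is the re-normalization.
    weighted_sum = sum(weights[name] * value for name, value in available.items())
    return weighted_sum / total_weight
```

Under these assumed weights, a model reporting only GPQA = 50 and ARC-Challenge = 100 would score (0.4 × 50 + 0.2 × 100) / 0.6 ≈ 66.7, rather than being penalized for the two missing benchmarks.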
Based on our benchmark analysis, GPT-4o by OpenAI is currently the #1 ranked model for reasoning, with a weighted score of 76.5/100.
Models are ranked using a weighted average of their GPQA, ARC-Challenge, BIG-Bench Hard, and Humanity's Last Exam benchmark scores. All scores are normalized to a 0-100 scale.
We currently rank 47 models that have relevant benchmark data for reasoning tasks.