SimpleQA Factual Accuracy Leaderboard

Tests factual accuracy on short, simple questions answered from a model's parametric knowledge, with an emphasis on calibration: knowing when the model doesn't know the answer.

Why it matters: The benchmark is surprisingly hard; even GPT-4o scores below 40%. It tests honesty and factual reliability, not just breadth of knowledge.

Top model: GPT-4o (38.2%)
Average score: 38.2% (across 1 model)
Models tested: 1
Metric: accuracy
Human baseline: not yet established
Score range: 0%–100%

[Chart: SimpleQA scores of the top 1 model, ranked by SimpleQA score (%). Source: LMMarketCap.com]

Model Rankings

All models with a reported SimpleQA score, ranked by highest accuracy.

Rank  Model    Provider  Score
1     GPT-4o   OpenAI    38.2%

About SimpleQA

Full name: SimpleQA Factual Accuracy
Category: Knowledge
Metric: accuracy (%)
Score range: 0%–100%
Human baseline: Not yet established
Status: Active
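Because SimpleQA emphasizes calibration, a single accuracy number hides how a model behaves when it is unsure. The sketch below is a minimal illustration, assuming each answer has already been graded into one of three outcomes (correct, incorrect, not attempted), as in OpenAI's original SimpleQA setup. It computes the overall correct rate, the correct-given-attempted rate, and their harmonic mean (F-score). The function name `simpleqa_metrics` and the grade labels are hypothetical, and we assume, without confirmation from this page, that the leaderboard's "accuracy" corresponds to the overall correct rate.

```python
from collections import Counter

# Hypothetical grade labels; assumed to mirror SimpleQA's three grading outcomes.
VALID_GRADES = {"correct", "incorrect", "not_attempted"}


def simpleqa_metrics(grades):
    """Summarize a list of graded answers (assumed labels, see VALID_GRADES)."""
    if not grades:
        raise ValueError("no graded answers")
    counts = Counter(grades)
    unknown = set(counts) - VALID_GRADES
    if unknown:
        raise ValueError(f"unknown grade labels: {sorted(unknown)}")

    total = len(grades)
    correct = counts["correct"]
    attempted = correct + counts["incorrect"]

    # Share of ALL questions answered correctly; abstaining counts against the model.
    accuracy = correct / total
    # Share of ATTEMPTED questions answered correctly; abstaining when unsure helps here.
    correct_given_attempted = correct / attempted if attempted else 0.0
    # Harmonic mean of the two rates, balancing coverage against reliability.
    denom = accuracy + correct_given_attempted
    f_score = 2 * accuracy * correct_given_attempted / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "correct_given_attempted": correct_given_attempted,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Model A abstains on half the questions; Model B answers everything.
    model_a = ["correct"] * 3 + ["incorrect"] * 2 + ["not_attempted"] * 5
    model_b = ["correct"] * 4 + ["incorrect"] * 6
    print("A:", simpleqa_metrics(model_a))
    print("B:", simpleqa_metrics(model_b))
```

In this toy example, Model A has lower accuracy (0.3 vs. 0.4) but is right on 60% of the questions it chooses to answer, while Model B is right on only 40%. Both end up with an F-score of about 0.4, which is why an accuracy figure alone says little about calibration.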

Frequently Asked Questions

SimpleQA is a standardized benchmark that measures how accurately AI models answer short, fact-seeking questions. Because every model is scored on the same questions, the results are comparable across models, helping developers choose the right model for their needs.

GPT-4o currently holds the top score on the SimpleQA benchmark. See the full rankings table above for the complete leaderboard, which currently lists 1 model.

We update benchmark data from multiple sources including HuggingFace open-source model leaderboards and LMArena. Scores are refreshed regularly as new evaluations are published and new models are released.

A high SimpleQA score alone does not make a model the best choice. While SimpleQA is an important indicator, real-world performance depends on many factors, including pricing, latency, context window, and specific task requirements. We recommend using our composite score, which weighs multiple benchmarks and practical factors.
