AI Model Leaderboards Overview

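The entry point for all live leaderboards: the main leaderboard covers 345 models, alongside 8 specialty leaderboards and 12 tool leaderboards. Data refreshes hourly.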

Main Leaderboard

Overall ranking of all 345 models

SWE-bench, HumanEval and BigCodeBench weighted ranking

MATH-500, GSM8K and AIME 2024 composite

GPQA Diamond and multi-step logic benchmarks

Long-form quality and instruction adherence

IFEval-driven strictness scores

Tabular reasoning and code-interpreter tasks

Character consistency and creative dialogue

Cross-language benchmarks beyond English

Separate ranking of open-weight models