Score Distribution

Q: What is the AI model score distribution?

The score distribution shows how all 290+ tracked AI models are spread across the 0-100 SignalScore scale. Most models cluster in the 40-70 range, with a small elite group scoring above 80 and budget/older models falling below 30.

Q: How is the SignalScore calculated?

SignalScore is a composite metric combining six weighted factors: benchmark performance (90%) from MMLU, GPQA, HumanEval, SWE-bench, and 15+ standardized evaluations, with capabilities and context window as tiebreakers (10%). Each factor is normalized to a 0-100 scale before weighting.

Q: What score percentile is considered good for an AI model?

Models scoring above the 75th percentile are considered strong performers. In the statistics on this page, the 75th percentile is 73.0, the top 10% of models score above 82.1, and the median across all tracked models is 59.8.

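A statistical analysis of the composite score distribution for 300 AI models. Explore the mean, median, percentiles, and tier breakdown to understand the AI model landscape.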

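Key statistics

Summary statistics for all 300 scored models.

Mean score: 58.5 (standard deviation +/- 17.5)
Median score: 59.8
Score range: 40-92
95th percentile: 87.8
Above the median: 151 of 300 models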

Score distribution (10-point bins)

[Chart, LMMarketCap.com: number of models in each 10-point bin from 0-10 to 90-100, color-coded by tier (Elite, Strong, Average, Below Average, Weak). Only one bar label survives: 128 models in the 40-50 bin.]
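As a rough illustration of how 10-point bins like these could be counted, here is a minimal sketch. The sample scores are invented for the example, and placing a score of exactly 100 in the 90-100 bin is an assumed convention, not something the page states.

```python
from collections import Counter

def bin_label(score: float, width: int = 10) -> str:
    """Return the label of the bin containing `score`, e.g. '40-50'."""
    # Clamp so a perfect 100 lands in the last bin (assumed convention).
    low = min(int(score // width) * width, 100 - width)
    return f"{low}-{low + width}"

def histogram(scores, width: int = 10) -> dict:
    """Count scores per bin, keeping empty bins, in ascending order."""
    counts = Counter(bin_label(s, width) for s in scores)
    labels = [f"{lo}-{lo + width}" for lo in range(0, 100, width)]
    return {label: counts.get(label, 0) for label in labels}

# Hypothetical sample, not the site's data.
sample_scores = [40.0, 40.0, 48.7, 59.8, 69.0, 73.0, 88.8, 91.9, 100.0]
hist = histogram(sample_scores)
```

Each bin includes its lower edge and excludes its upper edge, so a score of 50.0 would be counted in 50-60 rather than 40-50.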

Score tiers

Models grouped by performance tier, with summary statistics.

Tier	Range	Count	Share	Average score	Top model
Elite	90–100	5	1.7%	91.0	GPT-5.4 Pro (91.9)
Strong	70–89	92	30.7%	78.1	Grok 4.20 (88.8)
Average	50–69	65	21.7%	61.5	Hy3 preview (69.0)
Below Average	30–49	128	42.7%	40.4	Command R+ (08-2024) (48.7)
Weak	0–29	0	0.0%	-	-
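The tier boundaries and the share column can be reproduced with a short sketch. The lower bounds come from the tier table; treating each tier as "at or above its lower bound, below the next tier's lower bound" is an assumption about how fractional scores such as 89.5 are handled, and `tier_for` and `share` are hypothetical helper names.

```python
# Lower bounds from the tier table. How scores between 89 and 90
# (or 69 and 70, etc.) are assigned is an assumption.
TIERS = [
    (90, "Elite"),
    (70, "Strong"),
    (50, "Average"),
    (30, "Below Average"),
    (0, "Weak"),
]

def tier_for(score: float) -> str:
    """Map a 0-100 score to its tier name."""
    for lower, name in TIERS:
        if score >= lower:
            return name
    raise ValueError("score must be >= 0")

def share(count: int, total: int) -> float:
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)

# Counts from the tier table, shares computed against 300 models.
counts = {"Elite": 5, "Strong": 92, "Average": 65, "Below Average": 128, "Weak": 0}
shares = {name: share(n, 300) for name, n in counts.items()}
```

Computed against 300 models, these counts give the shares listed in the table (1.7%, 30.7%, 21.7%, 42.7%, 0.0%). The five counts add up to 290.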

Percentile analysis

Score thresholds at key percentiles, each plotted on the page against the 40-92 score range.

Percentile	Score
P5	40.0
P10	40.0
P25	40.0
P50	59.8
P75	73.0
P90	82.1
P95	87.8
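P5, P10, and P25 are all 40.0 because a large group of models sits at the minimum score. The sketch below shows the effect with a nearest-rank percentile. The nearest-rank method is an assumption (the page does not say how its percentiles are computed), and the scores are synthetic: 118 values at 40.0, matching the page's count of models at or below 40.0, plus 182 invented values.

```python
import math

def percentile_nearest_rank(scores, p: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(p/100 * n)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Synthetic data, not the site's: 118 models tied at the 40.0 floor,
# then 182 made-up scores spread above it.
floor_scores = [40.0] * 118
other_scores = [41.0 + i * 0.25 for i in range(182)]
scores = floor_scores + other_scores

low_percentiles = {p: percentile_nearest_rank(scores, p) for p in (5, 10, 25)}
```

With 300 scores, P5, P10, and P25 correspond to ranks 15, 30, and 75. All three fall inside the block of 118 tied values, so each returns 40.0.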

Providers ranked by average score

Providers with 3+ models, ranked by average composite score.

Rank	Provider	Models	Average score	Best	Worst	Spread
1	Anthropic	14	76.4	90.4	50.4	40.0
2	xAI	11	72.0	88.8	40.0	48.8
3	OpenAI	57	71.4	91.9	40.0	51.9
4	Google	23	68.4	88.4	40.0	48.4
5	MiniMax	8	68.3	78.2	40.0	38.2
6	Zhipu AI	12	67.9	78.0	40.0	38.0
7	DeepSeek	12	66.9	79.4	40.0	39.4
8	Xiaomi	5	60.3	75.8	40.0	35.8
9	Moonshot AI	5	58.4	75.9	51.4	24.5
10	Meta	9	55.5	67.1	40.0	27.1
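The spread column is the best score minus the worst score. As a quick consistency check, the sketch below recomputes it from the rows in the table above and confirms the ranking by average score. The tuple layout and the `spread` helper are choices made for this example.

```python
# (provider, models, average, best, worst, listed_spread), copied from the table.
ROWS = [
    ("Anthropic", 14, 76.4, 90.4, 50.4, 40.0),
    ("xAI", 11, 72.0, 88.8, 40.0, 48.8),
    ("OpenAI", 57, 71.4, 91.9, 40.0, 51.9),
    ("Google", 23, 68.4, 88.4, 40.0, 48.4),
    ("MiniMax", 8, 68.3, 78.2, 40.0, 38.2),
    ("Zhipu AI", 12, 67.9, 78.0, 40.0, 38.0),
    ("DeepSeek", 12, 66.9, 79.4, 40.0, 39.4),
    ("Xiaomi", 5, 60.3, 75.8, 40.0, 35.8),
    ("Moonshot AI", 5, 58.4, 75.9, 51.4, 24.5),
    ("Meta", 9, 55.5, 67.1, 40.0, 27.1),
]

def spread(best: float, worst: float) -> float:
    """Best minus worst, rounded to one decimal to avoid float noise."""
    return round(best - worst, 1)

# Providers whose listed spread disagrees with best - worst.
mismatches = [name for name, _, _, best, worst, listed in ROWS
              if spread(best, worst) != listed]

# Rank by average score, highest first.
ranked = [row[0] for row in sorted(ROWS, key=lambda r: r[2], reverse=True)]
```

Every listed spread matches best minus worst, and sorting by average score reproduces the order shown in the table.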

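Score concentration

How models are distributed across the top 20%, middle 60%, and bottom 20% of scores.

Top 20% (score >= 75.8): 60 models (20.0%)
Middle 60% (score 40.0 - 75.8): 122 models (40.7%)
Bottom 20% (score <= 40.0): 118 models (39.3%)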
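The bottom group holds 39.3% of models rather than 20% because its cutoff of 40.0 is also the minimum score, so every model tied at that value falls into it. The sketch below shows the counting logic. The scores are synthetic and arranged to match the counts on this page, and `concentration` is a hypothetical helper.

```python
def concentration(scores, top_cut: float, bottom_cut: float) -> dict:
    """Count scores at or above top_cut, at or below bottom_cut, and in between."""
    top = sum(1 for s in scores if s >= top_cut)
    bottom = sum(1 for s in scores if s <= bottom_cut)
    return {"top": top, "middle": len(scores) - top - bottom, "bottom": bottom}

# Synthetic scores, not the site's data: 118 tied at the floor,
# 122 strictly between the cutoffs, 60 at or above the top cutoff.
scores = (
    [40.0] * 118
    + [40.1 + i * 0.29 for i in range(122)]
    + [75.8 + i * 0.25 for i in range(60)]
)

groups = concentration(scores, top_cut=75.8, bottom_cut=40.0)
shares = {name: round(100 * n / len(scores), 1) for name, n in groups.items()}
```

With ties at the cutoff, a "bottom 20%" threshold can capture far more than 20% of the models, which is what the figures above show.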

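Methodology

How scores are calculated and what the distribution reveals.

How scores are calculated

Each model receives a score from 0 to 100, based mainly on benchmark results (90%) drawn from Arena Elo, MMLU, GPQA, HumanEval, SWE-bench, and 15+ other standardized evaluations. Capabilities and context window serve as secondary tiebreakers (10%). The score is intended to capture a model's scientifically evaluated quality in a single number.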
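The 90/10 weighting can be sketched as follows. Only the two weights come from this page. The min-max normalization, the plain averaging of benchmarks, the averaging of the secondary factors, and all input values are assumptions made for illustration, so this is not the site's actual formula.

```python
def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max scale `value` from [lo, hi] to 0-100, clamped (assumed method)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return max(0.0, min(100.0, 100 * (value - lo) / (hi - lo)))

def composite_score(benchmarks, secondary, benchmark_weight: float = 0.9) -> float:
    """Weighted blend of a benchmark average and a secondary-factor average."""
    bench = sum(benchmarks) / len(benchmarks)   # assumed: unweighted mean
    extra = sum(secondary) / len(secondary)     # assumed: unweighted mean
    return round(benchmark_weight * bench + (1 - benchmark_weight) * extra, 1)

# Hypothetical model: an Elo-style rating rescaled from an assumed 1000-1500
# range, two benchmark scores already on 0-100, and two secondary scores
# (capabilities, context window) on 0-100.
example = composite_score(
    benchmarks=[normalize(1350, 1000, 1500), 82.0, 64.0],
    secondary=[80.0, 40.0],
)
```

Here the benchmark average is 72.0 and the secondary average is 60.0, so the blend is 0.9 x 72.0 + 0.1 x 60.0 = 70.8.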

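What the distribution tells us

The score distribution reveals the competitive landscape of AI models. Tight clustering around the median indicates many models with similar capability, while a dispersed distribution indicates clear gaps between tiers. The shape of the distribution, its skewness, and the gap between mean and median all show whether the market is top-heavy, bottom-heavy, or evenly spread.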
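One rough way to turn the gap between mean and median into a number is Pearson's median skewness, 3 x (mean - median) / standard deviation. The page does not say it uses this measure. The sketch below simply applies it to the key statistics reported here (mean 58.5, median 59.8, standard deviation 17.5).

```python
def pearson_median_skewness(mean: float, median: float, std: float) -> float:
    """Pearson's second skewness coefficient: 3 * (mean - median) / std."""
    if std <= 0:
        raise ValueError("std must be positive")
    return 3 * (mean - median) / std

# Inputs are the key statistics reported on this page.
skew = pearson_median_skewness(mean=58.5, median=59.8, std=17.5)
```

The result is about -0.22. A negative value means the mean sits slightly below the median. The magnitude is small, and this single number does not capture features such as the cluster of models at the 40.0 floor.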

