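Rank Confidence Explorer

Analyze how confident we are in each model's rank. The rank range shows the span of positions a model could plausibly hold, the confidence level indicates how precise the rank is, and the stability status reflects consistency over time.

Confidence Level Distribution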

LMMarketCap.com

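Confidence Overview

Overview of rank confidence across all 300 models.

High confidence

300

100.0% of models

Medium confidence

0.0% of models

Low confidence

0.0% of models

Average rank spread

4.0

positions of uncertainty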

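Confidence Distribution

Breakdown of models by confidence level, with the average score, spread, and rank for each level.

Confidence Level	Count	%	Avg Score	Avg Spread	Avg Rank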
High	300	100.0%	57.9	4.0	151
Medium	0	0.0%	—	—	—
Low	0	0.0%	—	—	—

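Most Precisely Ranked

Models with the narrowest rank ranges. These are the rankings we are most confident in.

#	Model	Provider	Score	Rank	Spread	Confidence	Status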
1	Claude Fable 5	Anthropic	96.6	1	±2	High	Stable
2	Claude Opus 4.7 (Fast)	Anthropic	94.7	2	±3	High	Stable
3	Claude Opus 4.7	Anthropic	94.7	3	±4	High	Stable
4	Claude Opus 4.8 (Fast)	Anthropic	94.2	4	±4	High	Stable
5	Claude Opus 4.8	Anthropic	94.2	5	±4	High	Stable
6	GPT-5.5	OpenAI	92.2	6	±4	High	Stable
7	Gemini 3.1 Pro Preview Custom Tools	Google	91.7	7	±4	High	Stable
8	Gemini 3.1 Pro Preview	Google	91.7	8	±4	High	Stable
9	GPT-5.4 Pro	OpenAI	91.5	9	±4	High	Stable
10	GPT-5.4	OpenAI	91.5	10	±4	High	Stable
11	GPT-5.5 Pro	OpenAI	90.3	11	±4	High	Stable
12	GPT-5.2-Codex	OpenAI	90.1	12	±4	High	Stable
13	GPT-5.2 Pro	OpenAI	90.1	13	±4	High	Stable
14	GPT-5.2	OpenAI	90.1	14	±4	High	Stable
15	Claude Opus 4.6 (Fast)	Anthropic	90.0	15	±4	High	Stable
16	Claude Opus 4.6	Anthropic	90.0	16	±4	High	Stable
17	Grok 4.20	xAI	88.3	17	±4	High	Stable
18	GPT-5.3-Codex	OpenAI	88.2	18	±4	High	Stable
19	GPT-5 Pro	OpenAI	88.2	19	±4	High	Stable
20	GPT-5 Codex	OpenAI	88.2	20	±4	High	Stable

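Most Uncertain Rankings

Models with the widest rank ranges. Small changes could shift these models' ranks considerably.

#	Model	Provider	Score	Rank	Spread	Confidence	Status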
1	Claude Opus 4.7	Anthropic	94.7	3	±4	High	Stable
2	Claude Opus 4.8 (Fast)	Anthropic	94.2	4	±4	High	Stable
3	Claude Opus 4.8	Anthropic	94.2	5	±4	High	Stable
4	GPT-5.5	OpenAI	92.2	6	±4	High	Stable
5	Gemini 3.1 Pro Preview Custom Tools	Google	91.7	7	±4	High	Stable
6	Gemini 3.1 Pro Preview	Google	91.7	8	±4	High	Stable
7	GPT-5.4 Pro	OpenAI	91.5	9	±4	High	Stable
8	GPT-5.4	OpenAI	91.5	10	±4	High	Stable
9	GPT-5.5 Pro	OpenAI	90.3	11	±4	High	Stable
10	GPT-5.2-Codex	OpenAI	90.1	12	±4	High	Stable
11	GPT-5.2 Pro	OpenAI	90.1	13	±4	High	Stable
12	GPT-5.2	OpenAI	90.1	14	±4	High	Stable
13	Claude Opus 4.6 (Fast)	Anthropic	90.0	15	±4	High	Stable
14	Claude Opus 4.6	Anthropic	90.0	16	±4	High	Stable
15	Grok 4.20	xAI	88.3	17	±4	High	Stable
16	GPT-5.3-Codex	OpenAI	88.2	18	±4	High	Stable
17	GPT-5 Pro	OpenAI	88.2	19	±4	High	Stable
18	GPT-5 Codex	OpenAI	88.2	20	±4	High	Stable
19	GPT-5	OpenAI	88.2	21	±4	High	Stable
20	Gemini 3 Flash Preview	Google	88.0	22	±4	High	Stable

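Status × Confidence Matrix

Cross-tabulation of confidence level against stability status. The best combination is High confidence + Stable; the worst is Low confidence + Fragile.

Confidence	Stable	Fragile	Preliminary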
High	297	1	2
Medium	0	0	0
Low	0	0	0
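
A cross-tabulation like the one above can be sketched in a few lines. This is a minimal illustration, not the site's actual code: the record layout (dicts with `confidence` and `status` keys) and the toy model names are assumptions made for the example.

```python
from collections import Counter


def cross_tabulate(models):
    """Count models for each (confidence, status) pair."""
    return Counter((m["confidence"], m["status"]) for m in models)


# Toy records with a hypothetical layout; not real leaderboard data.
models = [
    {"name": "model-a", "confidence": "High", "status": "Stable"},
    {"name": "model-b", "confidence": "High", "status": "Stable"},
    {"name": "model-c", "confidence": "High", "status": "Fragile"},
    {"name": "model-d", "confidence": "High", "status": "Preliminary"},
]

table = cross_tabulate(models)

# Print one tab-separated row per confidence level; missing pairs count as 0.
statuses = ("Stable", "Fragile", "Preliminary")
print("Confidence", *statuses, sep="\t")
for confidence in ("High", "Medium", "Low"):
    print(confidence, *(table[(confidence, s)] for s in statuses), sep="\t")
```

A `Counter` returns 0 for pairs that never occur, which is why the empty Medium and Low rows need no special handling.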

Rank Spread Visualization

Visual representation of rank uncertainty for the top 30 models. In the chart, bars show the plausible rank range at 90% confidence and markers show the actual rank. The chart axis runs from rank 1 to rank 32.

#	Model	Rank range
1	Claude Fable 5	1–3
2	Claude Opus 4.7 (Fast)	1–4
3	Claude Opus 4.7	1–5
4	Claude Opus 4.8 (Fast)	2–6
5	Claude Opus 4.8	3–7
6	GPT-5.5	4–8
7	Gemini 3.1 Pro Preview Custom Tools	5–9
8	Gemini 3.1 Pro Preview	6–10
9	GPT-5.4 Pro	7–11
10	GPT-5.4	8–12
11	GPT-5.5 Pro	9–13
12	GPT-5.2-Codex	10–14
13	GPT-5.2 Pro	11–15
14	GPT-5.2	12–16
15	Claude Opus 4.6 (Fast)	13–17
16	Claude Opus 4.6	14–18
17	Grok 4.20	15–19
18	GPT-5.3-Codex	16–20
19	GPT-5 Pro	17–21
20	GPT-5 Codex	18–22
21	GPT-5	19–23
22	Gemini 3 Flash Preview	20–24
23	Grok 4.20 Multi-Agent	21–25
24	GPT-5.1-Codex-Max	22–26
25	GPT-5.1	23–27
26	GPT-5.1-Codex	24–28
27	GPT-5.1-Codex-Mini	25–29
28	GPT-5.3 Chat	26–30
29	o3 Deep Research	27–31
30	o3 Pro	28–32

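Factors Affecting Confidence

How rank confidence is determined and what each metric means.

Rank Volatility

Computed by bootstrap resampling of the scoring pipeline. By running thousands of simulations with small variations, we determine the range of ranks each model could plausibly hold. The range is a 90% confidence interval: nine times out of ten, the model's true rank falls within it.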
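
The idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual scoring pipeline: it assumes each model has one score per benchmark, that the overall score is the plain mean, and that the "small variations" come from resampling benchmarks with replacement. The function name, parameters, and toy data are all hypothetical.

```python
import math
import random


def bootstrap_rank_intervals(scores, n_resamples=2000, level=0.90, seed=0):
    """Estimate a percentile interval for each model's rank.

    scores: dict mapping model name -> list of per-benchmark scores
            (all lists the same length; this layout is an assumption).
    Returns a dict mapping model name -> (best_rank, worst_rank).
    """
    rng = random.Random(seed)
    models = list(scores)
    n_items = len(scores[models[0]])
    rank_samples = {m: [] for m in models}

    for _ in range(n_resamples):
        # Resample benchmark indices with replacement.
        idx = [rng.randrange(n_items) for _ in range(n_items)]
        means = {m: sum(scores[m][i] for i in idx) / n_items for m in models}
        # Rank 1 goes to the highest resampled mean.
        ordered = sorted(models, key=lambda m: means[m], reverse=True)
        for rank, m in enumerate(ordered, start=1):
            rank_samples[m].append(rank)

    # Cut off (1 - level) / 2 of the simulated ranks in each tail.
    tail = (1.0 - level) / 2.0
    intervals = {}
    for m, ranks in rank_samples.items():
        ranks.sort()
        lo = ranks[math.floor(tail * (len(ranks) - 1))]
        hi = ranks[math.ceil((1.0 - tail) * (len(ranks) - 1))]
        intervals[m] = (lo, hi)
    return intervals


# Toy data: three made-up models scored on three made-up benchmarks.
toy_scores = {
    "model-a": [90.0, 92.0, 95.0],
    "model-b": [80.0, 81.0, 83.0],
    "model-c": [60.0, 70.0, 65.0],
}
print(bootstrap_rank_intervals(toy_scores))
```

In the toy data each model beats the next on every benchmark, so every resample yields the same ordering and each interval collapses to a single rank. Models with closely clustered scores would trade places between resamples and get wider intervals.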

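Confidence Level

Derived from the width of the rank range. Models with a narrow range (little uncertainty) receive High confidence, meaning their rank position is reliable. Wider ranges indicate Medium or Low confidence: the model's position could shift significantly under different weightings or data updates.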
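
A mapping from range width to confidence level might look like the sketch below. The page does not state the cutoffs, so the thresholds `high_max` and `medium_max` are hypothetical values chosen only for illustration.

```python
def confidence_level(rank_low, rank_high, high_max=5, medium_max=15):
    """Map the width of a rank range to a confidence label.

    high_max and medium_max are hypothetical cutoffs; the real
    thresholds are not given in the source.
    """
    width = rank_high - rank_low
    if width < 0:
        raise ValueError("rank_high must be >= rank_low")
    if width <= high_max:
        return "High"
    if width <= medium_max:
        return "Medium"
    return "Low"


print(confidence_level(1, 3))    # narrow range
print(confidence_level(10, 22))  # wider range
print(confidence_level(5, 40))   # very wide range
```

Under these assumed cutoffs, a range of width 4 (the average reported on this page) falls in the High band.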

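Status

A stability classification based on how consistent performance metrics are over time. "Stable" models show consistent rankings, "Held" models maintain their position despite some fluctuation, "Fragile" models are prone to rank changes, and "Preliminary" models lack enough data history to assess stability.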


Frequently Asked Questions

Ranking confidence is calculated using bootstrap resampling, a statistical technique that re-runs the ranking process thousands of times with slight variations to see how stable each model's position is. Models with narrow rank spreads have high confidence, while those with wide spreads have uncertain rankings.

Rank spread is the range between a model's best and worst possible rank across bootstrap simulations. A rank spread of 2 means the model might move 1 position up or down, while a spread of 20 means its true ranking is quite uncertain.
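
As a small worked example, the spread can be computed from the endpoints of a rank range. This sketch assumes the spread is simply the worst plausible rank minus the best; the helper name is hypothetical, and the three ranges are taken from the rank spread visualization on this page.

```python
def rank_spread(best_rank, worst_rank):
    """Width of a rank range: worst plausible rank minus best plausible rank."""
    if worst_rank < best_rank:
        raise ValueError("worst_rank must be >= best_rank")
    return worst_rank - best_rank


# Rank ranges as shown in the rank spread visualization.
ranges = {
    "Claude Fable 5": (1, 3),
    "Claude Opus 4.7 (Fast)": (1, 4),
    "Claude Opus 4.8 (Fast)": (2, 6),
}

spreads = {name: rank_spread(lo, hi) for name, (lo, hi) in ranges.items()}
print(spreads)
# {'Claude Fable 5': 2, 'Claude Opus 4.7 (Fast)': 3, 'Claude Opus 4.8 (Fast)': 4}
```

These widths match the spread values of 2, 3, and 4 listed for the same models in the ranking tables, which suggests the tables report the full width of the range rather than a distance on each side of the rank.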

Low confidence usually means the model's score is clustered closely with those of many competitors, making the exact ordering sensitive to small measurement differences. Models in the middle of the leaderboard tend to have wider rank spreads than those at the very top or bottom.
