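Historical Performance - Coding

90-day score trends for the top coding AI models. Tracks composite score changes, the dates models entered the leaderboard, and key events that affected the rankings.

Score Trends

Rank	Model	Provider	Score	30d	60d	90d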
#1	Claude Fable 5	Anthropic	96.6	0.0	0.0	0.0
#2	Claude Opus 4.7 (Fast)	Anthropic	94.7	0.0	0.0	0.0
#3	Claude Opus 4.7	Anthropic	94.7	+13.2	+13.2	+13.2
#4	Claude Opus 4.8 (Fast)	Anthropic	94.2	+11.2	+11.2	+11.2
#5	Claude Opus 4.8	Anthropic	94.2	+11.2	+11.2	+11.2
#6	GPT-5.5	OpenAI	92.2	+3.2	+3.2	+3.2
#7	Gemini 3.1 Pro Preview Custom Tools	Google	91.7	+25.6	+25.6	+25.6
#8	Gemini 3.1 Pro Preview	Google	91.7	+26.5	+26.5	+26.5
#9	GPT-5.4 Pro	OpenAI	91.5	+17.4	+17.4	+17.4
#10	GPT-5.4	OpenAI	91.5	+12.1	+12.1	+12.1

Model Timeline

Entry Date	Model	Provider	Entry Rank	Current Rank
2025-11-18	Gemini 3.1 Pro Preview Custom Tools	Google	#9	#7
2025-11-15	GPT-5.5	OpenAI	#14	#6
2025-11-02	Claude Opus 4.7	Anthropic	#6	#3
2025-10-05	Claude Fable 5	Anthropic	#4	#1
2025-09-17	Claude Opus 4.8 (Fast)	Anthropic	#7	#4
2025-09-15	GPT-5.4	OpenAI	#18	#10
2025-08-25	Gemini 3.1 Pro Preview	Google	#11	#8
2025-08-18	GPT-5.4 Pro	OpenAI	#9	#9
2025-07-21	Claude Opus 4.8	Anthropic	#8	#5
2025-06-02	Claude Opus 4.7 (Fast)	Anthropic	#5	#2

Key Events

2026-02-28	version	Claude Opus 4.6 released with expanded context
2026-02-15	pricing	OpenAI reduced GPT-5.2 pricing by 20%
2026-02-01	version	Gemini 3 Pro launched with multimodal improvements
2026-01-20	version	DeepSeek V3.1 update with enhanced reasoning
2026-01-10	pricing	Anthropic introduced new Claude Sonnet tier pricing
2025-12-15	version	Qwen 3.5 397B released by Alibaba Cloud
2025-12-01	pricing	Google adjusted Gemini API pricing structure
2025-11-15	version	Grok 4.1 launched with code generation focus


Frequently Asked Questions

We track composite scores for the top coding AI models over a 90-day rolling window. Scores combine coding benchmarks like SWE-bench and HumanEval, pricing, context window, and capability data that refreshes hourly.

The 30d, 60d, and 90d columns show how much each model's composite score has changed over the last 30, 60, or 90 days. A positive change indicates improving performance or rankings, while a negative change suggests the model is falling behind newer competitors.
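As a rough illustration of how such a windowed change could be computed, here is a minimal Python sketch. The `(date, score)` history format, the `score_change` helper, and the sample values are all hypothetical. In particular, falling back to the earliest recorded score when a model has less history than the window is an assumption, not something this page documents.

```python
from datetime import date, timedelta


def score_change(history, as_of, window_days):
    """Score at `as_of` minus the score `window_days` earlier.

    `history` is a list of (date, score) pairs sorted by date
    (a hypothetical format chosen for this sketch).
    """
    past = [(d, s) for d, s in history if d <= as_of]
    if not past:
        return None  # no data yet for this model
    current = past[-1][1]
    cutoff = as_of - timedelta(days=window_days)
    older = [s for d, s in past if d <= cutoff]
    # Assumption: with less history than the window, compare against
    # the earliest recorded score instead of returning nothing.
    baseline = older[-1] if older else past[0][1]
    return round(current - baseline, 1)


# Made-up sample data, not taken from the tracker.
history = [
    (date(2026, 1, 1), 80.0),
    (date(2026, 2, 1), 85.5),
    (date(2026, 3, 1), 90.0),
]
as_of = date(2026, 3, 10)
changes = {f"{n}d": score_change(history, as_of, n) for n in (30, 60, 90)}
print(changes)  # {'30d': 4.5, '60d': 10.0, '90d': 10.0}
```

Under this fallback assumption, a model with little history gets the same value in more than one column, as the 60d and 90d entries do here.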

Check the Score Trends table above to see which models show the largest positive 30-day change. New model releases and major updates often cause significant score improvements.
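For readers who prefer to do this lookup programmatically, the sketch below parses the tab-separated Score Trends rows as they appear on this page and sorts them by 30-day change. The `ROWS` constant and the `parse_rows` helper are illustrative names, not part of any tracker API.

```python
# Rows copied from the Score Trends table (tab-separated).
ROWS = """\
#1\tClaude Fable 5\tAnthropic\t96.6\t0.0\t0.0\t0.0
#2\tClaude Opus 4.7 (Fast)\tAnthropic\t94.7\t0.0\t0.0\t0.0
#3\tClaude Opus 4.7\tAnthropic\t94.7\t+13.2\t+13.2\t+13.2
#4\tClaude Opus 4.8 (Fast)\tAnthropic\t94.2\t+11.2\t+11.2\t+11.2
#5\tClaude Opus 4.8\tAnthropic\t94.2\t+11.2\t+11.2\t+11.2
#6\tGPT-5.5\tOpenAI\t92.2\t+3.2\t+3.2\t+3.2
#7\tGemini 3.1 Pro Preview Custom Tools\tGoogle\t91.7\t+25.6\t+25.6\t+25.6
#8\tGemini 3.1 Pro Preview\tGoogle\t91.7\t+26.5\t+26.5\t+26.5
#9\tGPT-5.4 Pro\tOpenAI\t91.5\t+17.4\t+17.4\t+17.4
#10\tGPT-5.4\tOpenAI\t91.5\t+12.1\t+12.1\t+12.1
"""


def parse_rows(text):
    """Turn tab-separated table rows into a list of dicts."""
    records = []
    for line in text.splitlines():
        rank, model, provider, score, d30, d60, d90 = line.split("\t")
        records.append({
            "rank": int(rank.lstrip("#")),
            "model": model,
            "provider": provider,
            "score": float(score),
            "30d": float(d30),  # float() accepts a leading "+"
            "60d": float(d60),
            "90d": float(d90),
        })
    return records


records = parse_rows(ROWS)
# Sort by 30-day change, largest first, and keep the top three.
top_movers = [
    (r["model"], r["30d"])
    for r in sorted(records, key=lambda r: r["30d"], reverse=True)[:3]
]
print(top_movers)
```

With the rows above, the largest 30-day gain belongs to Gemini 3.1 Pro Preview (+26.5), followed by Gemini 3.1 Pro Preview Custom Tools (+25.6) and GPT-5.4 Pro (+17.4).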

Score data is refreshed hourly. The historical trend lines and change percentages are recalculated with each update to reflect the latest benchmark results, pricing changes, and capability additions.