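90-day score trends for the top coding AI models. Track changes in composite scores, the dates models entered the rankings, and the key events that affected the rankings.
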
| Rank | Model | Provider | Score | 30d change | 60d change | 90d change |
|---|---|---|---|---|---|---|
| #1 | GPT-5.4 Pro | OpenAI | 94.0 | +11.1 | +23.4 | +35.0 |
| #2 | GPT-5.4 | OpenAI | 94.0 | +1.8 | +2.6 | +2.1 |
| #3 | GPT-5.4 Mini | OpenAI | 93.3 | +10.2 | +19.6 | +28.7 |
| #4 | GPT-5.2 Pro | OpenAI | 92.7 | +11.3 | +22.7 | +34.9 |
| #5 | GPT-5.2 | OpenAI | 92.7 | -0.8 | -0.4 | +0.2 |
| #6 | Claude Opus 4.6 | Anthropic | 92.1 | +12.6 | +24.8 | +37.4 |
| #7 | GPT-5 Pro | OpenAI | 91.9 | +1.9 | +3.0 | +2.8 |
| #8 | o3 Deep Research | OpenAI | 91.5 | +2.8 | +5.6 | +9.4 |
| #9 | Claude Opus 4.5 | Anthropic | 90.4 | +12.1 | +25.1 | +38.0 |
| #10 | GPT-5 | OpenAI | 90.0 | +11.4 | +22.8 | +33.8 |

| Date entered | Model | Provider | Rank at entry | Current rank |
|---|---|---|---|---|
| 2025-11-07 | GPT-5.2 Pro | OpenAI | #4 | #4 |
| 2025-10-31 | GPT-5 Pro | OpenAI | #10 | #7 |
| 2025-10-02 | Claude Opus 4.6 | Anthropic | #9 | #6 |
| 2025-09-23 | GPT-5.4 Pro | OpenAI | #1 | #1 |
| 2025-08-11 | Claude Opus 4.5 | Anthropic | #12 | #9 |
| 2025-06-29 | GPT-5.4 | OpenAI | #10 | #2 |
| 2025-06-23 | o3 Deep Research | OpenAI | #12 | #8 |
| 2025-06-22 | GPT-5.2 | OpenAI | #13 | #5 |
| 2025-06-10 | GPT-5 | OpenAI | #19 | #10 |
| 2025-06-05 | GPT-5.4 Mini | OpenAI | #9 | #3 |

We track composite scores for the top coding AI models over a 90-day rolling window. Each score combines coding benchmark results (such as SWE-bench and HumanEval), pricing, context window size, and capability data, all of which refresh hourly.
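
The exact weighting behind the composite score is not stated here. As a minimal sketch, assuming each input has already been normalized to a 0 to 100 scale and the inputs are combined with a weighted average, a composite score could be computed as below. The `WEIGHTS` values, the `ModelInputs` fields, and `composite_score` are all hypothetical, and the example inputs are illustrative numbers, not real results for any model.

```python
from dataclasses import dataclass

# Hypothetical weights: the real weighting of the composite score is not published here.
WEIGHTS = {"benchmarks": 0.6, "pricing": 0.15, "context": 0.1, "capabilities": 0.15}


@dataclass
class ModelInputs:
    benchmark_scores: dict[str, float]  # e.g. {"SWE-bench": ..., "HumanEval": ...}, each 0-100
    price_score: float       # assumed pre-normalized to 0-100, higher = cheaper
    context_score: float     # assumed pre-normalized to 0-100, higher = larger context window
    capability_score: float  # assumed pre-normalized to 0-100


def composite_score(m: ModelInputs, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of normalized sub-scores, rounded to one decimal place."""
    # Average the individual coding benchmarks into a single benchmark sub-score.
    bench = sum(m.benchmark_scores.values()) / len(m.benchmark_scores)
    parts = {
        "benchmarks": bench,
        "pricing": m.price_score,
        "context": m.context_score,
        "capabilities": m.capability_score,
    }
    total_weight = sum(weights.values())
    return round(sum(weights[k] * parts[k] for k in parts) / total_weight, 1)


# Illustrative inputs only; these are not real benchmark results for any model.
example = ModelInputs({"SWE-bench": 80.0, "HumanEval": 100.0}, 60.0, 80.0, 90.0)
print(composite_score(example))  # → 84.5
```

Dividing by the sum of the weights keeps the result on the same 0 to 100 scale even if the weights do not add up to exactly 1.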

The 30d, 60d, and 90d columns show how much each model's composite score has changed over the last 30, 60, or 90 days. A positive change indicates improving performance or ranking, while a negative change suggests the model is falling behind newer competitors.
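
One plausible way to derive these columns from a score history is to subtract a baseline score from the latest score. The sketch below assumes the change is reported in score points and that the baseline is the most recent snapshot at or before the start of each window; `score_change` and the sample history are hypothetical and illustrative only.

```python
from datetime import datetime, timedelta
from typing import Optional


def score_change(history: list[tuple[datetime, float]], now: datetime, days: int) -> Optional[float]:
    """Latest score minus the baseline score from `days` ago, in score points.

    The baseline is the most recent sample at or before `now - days`.
    `history` must be sorted by timestamp, oldest first.
    """
    if not history:
        return None
    cutoff = now - timedelta(days=days)
    baseline = [score for ts, score in history if ts <= cutoff]
    if not baseline:
        return None  # the model was not tracked that far back
    return round(history[-1][1] - baseline[-1], 1)


# Illustrative history for one hypothetical model (not real data).
start = datetime(2025, 1, 1)
history = [(start + timedelta(days=d), s) for d, s in [(0, 60.0), (30, 70.0), (60, 80.0), (90, 94.0)]]
now = start + timedelta(days=90)
print([score_change(history, now, d) for d in (30, 60, 90)])  # → [14.0, 24.0, 34.0]
```

Returning `None` when no baseline exists keeps newly added models from showing a misleading change for windows longer than their tracking history.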

To find the fastest-rising models, look for the largest positive values in the 30d column of the score trends table above. New model releases and major updates often produce significant score improvements.
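
The same lookup can be scripted. This sketch copies the 30-day values from the score trends table, assuming the first change value in each row is the 30-day change, and returns the top risers; `top_movers` is a hypothetical helper name.

```python
# 30-day changes copied from the score trends table (model -> 30d change).
THIRTY_DAY_CHANGE = {
    "GPT-5.4 Pro": 11.1,
    "GPT-5.4": 1.8,
    "GPT-5.4 Mini": 10.2,
    "GPT-5.2 Pro": 11.3,
    "GPT-5.2": -0.8,
    "Claude Opus 4.6": 12.6,
    "GPT-5 Pro": 1.9,
    "o3 Deep Research": 2.8,
    "Claude Opus 4.5": 12.1,
    "GPT-5": 11.4,
}


def top_movers(changes: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return up to n models with the largest positive 30-day change, largest first."""
    rising = [(model, delta) for model, delta in changes.items() if delta > 0]
    return sorted(rising, key=lambda item: item[1], reverse=True)[:n]


print(top_movers(THIRTY_DAY_CHANGE))
# → [('Claude Opus 4.6', 12.6), ('Claude Opus 4.5', 12.1), ('GPT-5', 11.4)]
```

Filtering out non-positive values first means a model that is losing ground, such as GPT-5.2 at -0.8, never appears in the list even when `n` exceeds the number of risers.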

Score data is refreshed hourly. With each update, the historical trend lines and change percentages are recalculated to reflect the latest benchmark results, pricing changes, and capability additions.
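
A 90-day rolling window with periodic refreshes can be kept with a double-ended queue: each new snapshot is appended and anything older than 90 days is evicted from the front. This is a minimal sketch under that assumption; `RollingScoreWindow` and its methods are hypothetical, the change is assumed to be in score points, and the daily snapshots in the usage lines are illustrative (the page itself refreshes hourly).

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=90)


class RollingScoreWindow:
    """Keeps score snapshots for one model and drops anything older than 90 days."""

    def __init__(self) -> None:
        self.samples: deque[tuple[datetime, float]] = deque()

    def add(self, ts: datetime, score: float) -> None:
        """Append a snapshot (timestamps must arrive in order) and evict stale ones."""
        self.samples.append((ts, score))
        # Evict snapshots that have fallen out of the 90-day window.
        while self.samples and self.samples[0][0] < ts - WINDOW:
            self.samples.popleft()

    def change_over_window(self) -> float:
        """Latest score minus the oldest score still inside the window."""
        if not self.samples:
            return 0.0
        return round(self.samples[-1][1] - self.samples[0][1], 1)


window = RollingScoreWindow()
start = datetime(2025, 1, 1)
for day in range(100):  # illustrative daily snapshots, not real scores
    window.add(start + timedelta(days=day), 50.0 + 0.5 * day)
print(len(window.samples), window.change_over_window())  # → 91 45.0
```

Because snapshots arrive in time order, eviction only ever touches the front of the deque, so each refresh costs amortized constant time.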