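Detects possible performance degradation in AI models. This tracker monitors rank changes (shifts in leaderboard position) and flags models that dropped significantly over 24 hours or 7 days. A higher "degradation score" means more warning signs.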
Models at risk: 27
Declining (7 days): 2
Unstable: 1
Sustained decline: 25
27 models show signs of performance degradation, ranked by risk score. A higher risk score indicates a more concerning performance trend.
| Degradation score | Model | Provider | Quality | 24h rank change | 7d rank change | Severity |
|---|---|---|---|---|---|---|
| 146 | GLM 5V Turbo | Zhipu AI | 40.0 | -146 | +115 | high |
| 38 | Mistral Nemo | Mistral AI | 39.9 | -11 | -11 | high |
| 8 | Trinity Large Thinking | arcee-ai | 62.7 | +1 | -4 | medium |
| 3 | Lyria 3 Pro Preview | Google | 40.0 | -1 | -1 | low |
| 3 | Lyria 3 Clip Preview | Google | 40.0 | -1 | -1 | low |
| 3 | KAT-Coder-Pro V2 | Kuaishou | 40.0 | -1 | -1 | low |
| 3 | Reka Edge | rekaai | 40.0 | -1 | -1 | low |
| 3 | Mistral Small 4 | Mistral AI | 40.0 | -1 | -1 | low |
| 3 | Nemotron 3 Super (free) | NVIDIA | 40.0 | -1 | -1 | low |
| 3 | Nemotron 3 Super | NVIDIA | 40.0 | -1 | -1 | low |
| 3 | Seed-2.0-Lite | ByteDance | 40.0 | -1 | -1 | low |
| 3 | Seed-2.0-Mini | ByteDance | 40.0 | -1 | -1 | low |
| 3 | LFM2-24B-A2B | Liquid AI | 40.0 | -1 | -1 | low |
| 3 | Aion-2.0 | aion-labs | 40.0 | -1 | -1 | low |
| 3 | Qwen3.5 Plus 2026-02-15 | Alibaba | 40.0 | -1 | -1 | low |
| 3 | Qwen3 Coder Next | Alibaba | 40.0 | -1 | -1 | low |
| 3 | Solar Pro 3 | Upstage | 40.0 | -1 | -1 | low |
| 3 | Palmyra X5 | Writer | 40.0 | -1 | -1 | low |
| 3 | LFM2.5-1.2B-Thinking (free) | Liquid AI | 40.0 | -1 | -1 | low |
| 3 | LFM2.5-1.2B-Instruct (free) | Liquid AI | 40.0 | -1 | -1 | low |
| 3 | GPT Audio | OpenAI | 40.0 | -1 | -1 | low |
| 3 | GPT Audio Mini | OpenAI | 40.0 | -1 | -1 | low |
| 3 | Seed 1.6 Flash | ByteDance | 40.0 | -1 | -1 | low |
| 3 | Seed 1.6 | ByteDance | 40.0 | -1 | -1 | low |
| 3 | Nemotron 3 Nano 30B A3B (free) | NVIDIA | 40.0 | -1 | -1 | low |
| 3 | Nemotron 3 Nano 30B A3B | NVIDIA | 40.0 | -1 | -1 | low |
| 3 | Coder Large | arcee-ai | 39.3 | -1 | -1 | low |
272 models show no decline and hold a stable rank. These models perform consistently.
| # | Model | Provider | Score | 24h | 7d | Status |
|---|---|---|---|---|---|---|
| 1 | Claude Fable 5 | Anthropic | 96.6 | 0 | 0 | stable |
| 2 | Claude Opus 4.7 (Fast) | Anthropic | 94.7 | 0 | 0 | stable |
| 3 | Claude Opus 4.7 | Anthropic | 94.7 | 0 | 0 | stable |
| 4 | Claude Opus 4.8 (Fast) | Anthropic | 94.2 | 0 | 0 | stable |
| 5 | Claude Opus 4.8 | Anthropic | 94.2 | 0 | 0 | stable |
| 6 | GPT-5.5 | OpenAI | 92.2 | 0 | 0 | stable |
| 7 | Gemini 3.1 Pro Preview Custom Tools | Google | 91.7 | 0 | 0 | stable |
| 8 | Gemini 3.1 Pro Preview | Google | 91.7 | 0 | 0 | stable |
| 9 | GPT-5.4 Pro | OpenAI | 91.5 | 0 | 0 | stable |
| 10 | GPT-5.4 | OpenAI | 91.5 | 0 | 0 | stable |
| 11 | GPT-5.5 Pro | OpenAI | 90.3 | 0 | 0 | stable |
| 12 | GPT-5.2-Codex | OpenAI | 90.1 | 0 | 0 | stable |
| 13 | GPT-5.2 Pro | OpenAI | 90.1 | 0 | 0 | stable |
| 14 | GPT-5.2 | OpenAI | 90.1 | 0 | 0 | stable |
| 15 | Claude Opus 4.6 (Fast) | Anthropic | 90.0 | 0 | 0 | stable |
| 16 | Claude Opus 4.6 | Anthropic | 90.0 | 0 | 0 | stable |
| 17 | Grok 4.20 | xAI | 88.3 | 0 | 0 | stable |
| 18 | GPT-5.3-Codex | OpenAI | 88.2 | 0 | 0 | stable |
| 19 | GPT-5 Pro | OpenAI | 88.2 | 0 | 0 | stable |
| 20 | GPT-5 Codex | OpenAI | 88.2 | 0 | 0 | stable |
Showing the top 20 of 272 stable models.
Our degradation detection system uses several signals to identify models that may be declining.
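Models whose 7-day rank change is worse than -2 positions. A sustained drop of more than two places within a week suggests the model may be getting overtaken by competitors or experiencing performance problems.
Models flagged as "fragile" by the scoring system. These models have inconsistent performance metrics or borderline scores, so small changes in evaluation data can cause significant swings.
Models that declined in both the 24-hour and 7-day windows. When a model is losing rank in both the short-term and medium-term windows, it points to a sustained downward trend rather than a temporary fluctuation.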
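The three signals described above can be sketched as simple checks on a model's rank changes. This is a minimal illustration, not the tracker's actual code: the `RankSnapshot` type, the `signals` function, and the signal names are hypothetical, and the fragile flag is assumed to arrive as a ready-made boolean from the scoring system.

```python
from dataclasses import dataclass


@dataclass
class RankSnapshot:
    """Hypothetical record of one model's recent rank movement."""
    name: str
    change_24h: int        # negative means positions lost over 24 hours
    change_7d: int         # negative means positions lost over 7 days
    fragile: bool = False  # assumed to be supplied by the scoring system


def signals(m: RankSnapshot) -> set[str]:
    """Return the warning signals that apply to one model."""
    found: set[str] = set()
    # 7-day decline: the model lost more than two positions in a week.
    if m.change_7d < -2:
        found.add("declining_7d")
    # Fragile: flagged upstream as inconsistent or borderline.
    if m.fragile:
        found.add("fragile")
    # Sustained decline: rank was lost in both time windows.
    if m.change_24h < 0 and m.change_7d < 0:
        found.add("sustained_decline")
    return found
```

Under this reading, a model at -1 / -1 matches only the sustained-decline signal, while a model at +1 / -4 matches only the 7-day decline signal.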
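The degradation risk score combines several signals: a 7-day rank drop is weighted 2x, a 24-hour rank drop is weighted 1x, and a fragile state adds 5 points. A higher score indicates a greater risk of performance degradation.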
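The weighting can be written out as a short function. This is a sketch under stated assumptions rather than the tracker's implementation: the name `decay_score` is hypothetical, and treating a rank gain as contributing zero (rather than reducing the score) is an assumption inferred from the at-risk table.

```python
def decay_score(change_24h: int, change_7d: int, fragile: bool = False) -> int:
    """Combine rank drops and the fragile flag into one risk score.

    Rank changes are negative when positions were lost.
    Assumption: a gain in either window counts as zero, not as a credit.
    """
    drop_7d = max(0, -change_7d)    # positions lost over 7 days
    drop_24h = max(0, -change_24h)  # positions lost over 24 hours
    bonus = 5 if fragile else 0     # fixed penalty for a fragile state
    return 2 * drop_7d + 1 * drop_24h + bonus


print(decay_score(-146, 115))  # 146
print(decay_score(1, -4))      # 8
print(decay_score(-1, -1))     # 3
```

These values agree with the published rows for -146 / +115, +1 / -4, and -1 / -1. A score of 38 at -11 / -11 fits the formula only if the fragile bonus applies (22 + 11 + 5); the table does not show that flag, so this is an inference.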
The tracker uses a multi-signal approach: it monitors 7-day rank decline (weighted 2x), 24-hour rank drops (weighted 1x), and fragile state classification (+5 points). Models are scored on a degradation risk scale where higher values indicate more warning signs of performance decline.
A fragile state indicates that a model has inconsistent performance metrics or borderline scores that could shift significantly with small changes in evaluation data. Fragile models are at higher risk of further ranking drops and warrant closer monitoring.
Yes, models can recover. Degradation may be temporary due to API issues, benchmark fluctuations, or scoring recalibrations. Models that show sustained decline over multiple weeks are more concerning than those with short-term dips. The tracker monitors both 24-hour and 7-day windows to help distinguish temporary noise from real trends.