Model Stability Report

Which AI models are the most consistent over time? This report analyzes rank changes, state classifications, and sparkline volatility across 300 tracked models to produce a stability score from 0 to 100.

Rock Solid

238

Consistent

Variable

Volatile

Stability Classification Distribution

LMMarketCap.com

Provider Stability Rankings (Avg Score)

LMMarketCap.com

Most Stable Models

Top 20 models with the highest stability scores. These models maintain consistent rankings with minimal volatility.

#	Model	Provider	Score	Stability	24h	7d	State	Rank Spread
1	Claude Opus 4.6 (Fast)Anthropic	Anthropic	90.4	100	0	0	stable	±2
2	Grok 4.20xAI	xAI	88.8	100	0	0	stable	±2
3	Grok 4.20 Multi-AgentxAI	xAI	87.9	100	0	0	stable	±2
4	Claude Sonnet 4.6Anthropic	Anthropic	85.2	100	0	0	stable	±2
5	Gemma 4 31B (free)Google	Google	80.5	100	0	0	stable	±2
6	Claude Opus 4.7Anthropic	Anthropic	79.3	100	0	0	stable	±2
7	GPT-5.4 NanoOpenAI	OpenAI	79.3	100	0	0	stable	±2
8	GPT-5.4 MiniOpenAI	OpenAI	79.3	100	0	0	stable	±2
9	Grok 4.1 FastxAI	xAI	78.0	100	0	-1	stable	±2
10	Grok 4.3xAI	xAI	76.4	100	0	-1	stable	±2
11	GLM 5.1Zhipu AI	Zhipu AI	76.1	100	0	0	stable	±2
12	Kimi K2.6Moonshot AI	Moonshot AI	75.9	100	0	+2	stable	±2
13	DeepSeek V4 ProDeepSeek	DeepSeek	75.7	100	0	-2	stable	±2
14	Qwen3.6 Max PreviewAlibaba	Alibaba	74.5	100	0	0	stable	±2
15	Gemma 4 26B A4B (free)Google	Google	73.0	100	0	0	stable	±2
16	Gemma 4 26B A4B Google	Google	73.0	100	0	0	stable	±2
17	Grok 4 FastxAI	xAI	72.5	100	0	0	stable	±2
18	DeepSeek V4 FlashDeepSeek	DeepSeek	72.1	100	0	+1	stable	±2
19	Trinity Large Previewarcee-ai	arcee-ai	63.6	100	-1	-1	stable	±2
20	gpt-oss-120bOpenAI	OpenAI	40.5	100	-1	0	stable	±2

Most Volatile Models

Bottom 20 models with the lowest stability scores. These models show significant ranking fluctuations or inconsistent states.

#	Model	Provider	Score	Stability	24h	7d	State	Rank Spread
1	Hy3 previewTencent	Tencent	69.0	54	+94	+234	preliminary	±2
2	Ling-2.6-1Tinclusionai	inclusionai	40.0	74	-1	+140	preliminary	±2
3	Mistral Medium 3.5Mistral AI	Mistral AI	40.0	74	-1	+155	preliminary	±2
4	GPT Chat LatestOpenAI	OpenAI	40.0	74	-1	+157	preliminary	±2
5	CoBuddy (free)Baidu	Baidu	40.0	74	-1	+158	preliminary	±2
6	Ring-2.6-1T (free)inclusionai	inclusionai	40.0	74	-1	+159	preliminary	±2
7	Phi 4 Mini InstructMicrosoft	Microsoft	52.7	74	-1	+175	preliminary	±2
8	Trinity Large Thinkingarcee-ai	arcee-ai	65.2	76	-1	-3	stable	±2
9	Llama 3.3 70B InstructMeta	Meta	66.8	79	-1	-2	stable	±2
10	Nova Micro 1.0Amazon	Amazon	40.0	81	0	+3	stable	±2
11	Aion-1.0-Miniaion-labs	aion-labs	40.0	81	0	+3	stable	±2
12	Aion-1.0aion-labs	aion-labs	40.0	81	0	+3	stable	±2
13	Llama Guard 3 8BMeta	Meta	40.0	81	0	+3	stable	±2
14	Qwen3.5 Plus 2026-02-15Alibaba	Alibaba	40.0	81	0	-3	stable	±2
15	Seed-2.0-MiniByteDance	ByteDance	40.0	81	0	-3	stable	±2
16	Seed-2.0-LiteByteDance	ByteDance	40.0	81	0	-3	stable	±2
17	MiMo-V2-OmniXiaomi	Xiaomi	40.0	81	0	-3	stable	±2
18	GLM 5V TurboZhipu AI	Zhipu AI	40.0	81	0	-3	stable	±2
19	Mistral Small 4Mistral AI	Mistral AI	40.0	82	0	-3	stable	±2
20	Command R (08-2024)Cohere	Cohere	48.7	82	-1	-1	stable	±2

Stability by Provider

Aggregated stability metrics per provider. Providers are ranked by their average stability score across all models.

Provider	Models	Avg Stability	Most Stable Model	Most Volatile Model
essentialai	1	97.4	Rnj 1 Instruct(97)	Rnj 1 Instruct(97)
deepcogito	1	97.0	Cogito v2.1 671B(97)	Cogito v2.1 671B(97)
AI21 Labs	1	96.1	Jamba Large 1.7(96)	Jamba Large 1.7(96)
Kuaishou	1	95.3	KAT-Coder-Pro V2(95)	KAT-Coder-Pro V2(95)
~anthropic	3	95.3	Claude Opus Latest(100)	Anthropic Claude Haiku Latest(93)
Writer	1	95.1	Palmyra X5(95)	Palmyra X5(95)
xAI	11	94.7	Grok 4.20(100)	Grok 3 Mini Beta(82)
NVIDIA	9	93.3	Nemotron Nano 9B V2 (free)(100)	Nemotron 3 Super (free)(83)
Upstage	1	93.2	Solar Pro 3(93)	Solar Pro 3(93)
poolside	2	93.0	Laguna XS.2 (free)(93)	Laguna XS.2 (free)(93)
~openai	2	93.0	OpenAI GPT Mini Latest(93)	OpenAI GPT Mini Latest(93)
~google	2	93.0	Google Gemini Pro Latest(93)	Google Gemini Pro Latest(93)
~moonshotai	1	93.0	MoonshotAI Kimi Latest(93)	MoonshotAI Kimi Latest(93)
Inception	1	92.4	Mercury 2(92)	Mercury 2(92)
Google	23	92.3	Gemma 4 31B (free)(100)	Gemini 2.0 Flash Lite(85)
Moonshot AI	5	92.1	Kimi K2.6(100)	Kimi K2 0711(82)
Anthropic	14	92.0	Claude Opus 4.6 (Fast)(100)	Claude 3.5 Haiku(82)
DeepSeek	12	91.9	DeepSeek V4 Pro(100)	R1 Distill Llama 70B(86)
MiniMax	8	91.3	MiniMax-01(100)	MiniMax M2.5 (free)(90)
Alibaba	49	91.3	Qwen3.6 Max Preview(100)	Qwen3.5 Plus 2026-02-15(81)
IBM	2	90.4	Granite 4.1 8B(96)	Granite 4.0 Micro(85)
Amazon	4	90.3	Nova 2 Lite(96)	Nova Micro 1.0(81)
Baidu	7	90.2	Qianfan-OCR-Fast (free)(100)	CoBuddy (free)(74)
arcee-ai	6	90.0	Trinity Large Preview(100)	Trinity Large Thinking(76)
OpenAI	57	90.0	GPT-5.4 Nano(100)	GPT Chat Latest(74)
Xiaomi	5	89.7	MiMo-V2-Pro(98)	MiMo-V2-Omni(81)
Zhipu AI	12	89.3	GLM 5.1(100)	GLM 5V Turbo(81)
Liquid AI	3	88.8	LFM2.5-1.2B-Thinking (free)(93)	LFM2.5-1.2B-Instruct (free)(86)
rekaai	2	88.8	Reka Edge(92)	Reka Flash 3(86)
Mistral AI	18	88.3	Mistral Medium 3.1(100)	Mistral Medium 3.5(74)
Perplexity	5	87.9	Sonar Reasoning Pro(92)	Sonar(86)
Allen AI	1	87.7	Olmo 3 32B Think(88)	Olmo 3 32B Think(88)
aion-labs	3	86.0	Aion-2.0(96)	Aion-1.0(81)
StepFun	1	85.4	Step 3.5 Flash(85)	Step 3.5 Flash(85)
Cohere	3	84.7	Command R+ (08-2024)(90)	Command A(82)
Meta	9	84.5	Llama Guard 4 12B(96)	Llama 3.3 70B Instruct(79)
ByteDance	5	84.4	UI-TARS 7B (87)	Seed-2.0-Lite(81)
inclusionai	3	82.7	Ling-2.6-flash(100)	Ring-2.6-1T (free)(74)
Cursor	2	82.0	Composer 2(82)	Composer 2(82)
Microsoft	2	78.0	Phi 4(82)	Phi 4 Mini Instruct(74)
Tencent	2	75.8	Hunyuan A13B Instruct(98)	Hy3 preview(54)

Stability Distribution

How stability scores are distributed across all 300 tracked models.

0–10

10–20

20–30

30–40

40–50

50–60

60–70

70–80

80–90

90–100

194

What Makes a Model Stable?

Our stability scoring system uses three key signals to measure how consistently a model performs over time.

Rank Consistency

The most direct measure of stability. Models lose up to 25 points for large 24-hour rank changes (5 points per rank position moved) and up to 21 points for 7-day changes (3 points per position). Models that hold their rank tightly score higher.

State Classification

Each model has a state reflecting its overall reliability. Models in a "stable" state receive a 10-point bonus, while "fragile" models are penalized 15 points. This captures systemic reliability beyond simple rank movement.

Sparkline Volatility

The 14-day sparkline data reveals hidden volatility. We compute the standard deviation of the sparkline and subtract up to 20 points. Even models that end where they started can be penalized if they oscillated wildly along the way.

All Trackers

Coding, image, and video model trackers

Degradation Tracker

Detect models with declining performance

Coding Tracker

Daily coding model performance and rankings

Frequently Asked Questions

The stability score starts at 100 and is reduced based on three factors: 24-hour rank changes (up to -25 points, at 5 per position moved), 7-day rank changes (up to -21 points, at 3 per position), and sparkline volatility measured by standard deviation (up to -20 points). Models in a "stable" state get a +10 bonus, while "fragile" models lose 15 points.

Models are classified into four tiers based on their stability score: "Rock Solid" (85-100) means extremely consistent performance with minimal fluctuation. "Consistent" (70-84) means generally reliable with minor variations. "Variable" (50-69) shows noticeable ranking fluctuations. "Volatile" (below 50) indicates significant instability and unpredictable performance.

Stability indicates how predictably a model will perform over time. A highly rated but volatile model may deliver inconsistent results, which is problematic for production applications requiring reliable output quality. Stable models provide more predictable performance, making them safer choices for mission-critical workloads even if they do not always hold the top rank.

Model Stability Report

Rock Solid

238

Consistent

Variable

Volatile

Stability Classification Distribution

LMMarketCap.com

Provider Stability Rankings (Avg Score)

LMMarketCap.com

Most Stable Models

Top 20 models with the highest stability scores. These models maintain consistent rankings with minimal volatility.

#	Model	Provider	Score	Stability	24h	7d	State	Rank Spread
1	Claude Opus 4.6 (Fast)Anthropic	Anthropic	90.4	100	0	0	stable	±2
2	Grok 4.20xAI	xAI	88.8	100	0	0	stable	±2
3	Grok 4.20 Multi-AgentxAI	xAI	87.9	100	0	0	stable	±2
4	Claude Sonnet 4.6Anthropic	Anthropic	85.2	100	0	0	stable	±2
5	Gemma 4 31B (free)Google	Google	80.5	100	0	0	stable	±2
6	Claude Opus 4.7Anthropic	Anthropic	79.3	100	0	0	stable	±2
7	GPT-5.4 NanoOpenAI	OpenAI	79.3	100	0	0	stable	±2
8	GPT-5.4 MiniOpenAI	OpenAI	79.3	100	0	0	stable	±2
9	Grok 4.1 FastxAI	xAI	78.0	100	0	-1	stable	±2
10	Grok 4.3xAI	xAI	76.4	100	0	-1	stable	±2
11	GLM 5.1Zhipu AI	Zhipu AI	76.1	100	0	0	stable	±2
12	Kimi K2.6Moonshot AI	Moonshot AI	75.9	100	0	+2	stable	±2
13	DeepSeek V4 ProDeepSeek	DeepSeek	75.7	100	0	-2	stable	±2
14	Qwen3.6 Max PreviewAlibaba	Alibaba	74.5	100	0	0	stable	±2
15	Gemma 4 26B A4B (free)Google	Google	73.0	100	0	0	stable	±2
16	Gemma 4 26B A4B Google	Google	73.0	100	0	0	stable	±2
17	Grok 4 FastxAI	xAI	72.5	100	0	0	stable	±2
18	DeepSeek V4 FlashDeepSeek	DeepSeek	72.1	100	0	+1	stable	±2
19	Trinity Large Previewarcee-ai	arcee-ai	63.6	100	-1	-1	stable	±2
20	gpt-oss-120bOpenAI	OpenAI	40.5	100	-1	0	stable	±2

Most Volatile Models

Bottom 20 models with the lowest stability scores. These models show significant ranking fluctuations or inconsistent states.

#	Model	Provider	Score	Stability	24h	7d	State	Rank Spread
1	Hy3 previewTencent	Tencent	69.0	54	+94	+234	preliminary	±2
2	Ling-2.6-1Tinclusionai	inclusionai	40.0	74	-1	+140	preliminary	±2
3	Mistral Medium 3.5Mistral AI	Mistral AI	40.0	74	-1	+155	preliminary	±2
4	GPT Chat LatestOpenAI	OpenAI	40.0	74	-1	+157	preliminary	±2
5	CoBuddy (free)Baidu	Baidu	40.0	74	-1	+158	preliminary	±2
6	Ring-2.6-1T (free)inclusionai	inclusionai	40.0	74	-1	+159	preliminary	±2
7	Phi 4 Mini InstructMicrosoft	Microsoft	52.7	74	-1	+175	preliminary	±2
8	Trinity Large Thinkingarcee-ai	arcee-ai	65.2	76	-1	-3	stable	±2
9	Llama 3.3 70B InstructMeta	Meta	66.8	79	-1	-2	stable	±2
10	Nova Micro 1.0Amazon	Amazon	40.0	81	0	+3	stable	±2
11	Aion-1.0-Miniaion-labs	aion-labs	40.0	81	0	+3	stable	±2
12	Aion-1.0aion-labs	aion-labs	40.0	81	0	+3	stable	±2
13	Llama Guard 3 8BMeta	Meta	40.0	81	0	+3	stable	±2
14	Qwen3.5 Plus 2026-02-15Alibaba	Alibaba	40.0	81	0	-3	stable	±2
15	Seed-2.0-MiniByteDance	ByteDance	40.0	81	0	-3	stable	±2
16	Seed-2.0-LiteByteDance	ByteDance	40.0	81	0	-3	stable	±2
17	MiMo-V2-OmniXiaomi	Xiaomi	40.0	81	0	-3	stable	±2
18	GLM 5V TurboZhipu AI	Zhipu AI	40.0	81	0	-3	stable	±2
19	Mistral Small 4Mistral AI	Mistral AI	40.0	82	0	-3	stable	±2
20	Command R (08-2024)Cohere	Cohere	48.7	82	-1	-1	stable	±2

Stability by Provider

Aggregated stability metrics per provider. Providers are ranked by their average stability score across all models.

Provider	Models	Avg Stability	Most Stable Model	Most Volatile Model
essentialai	1	97.4	Rnj 1 Instruct(97)	Rnj 1 Instruct(97)
deepcogito	1	97.0	Cogito v2.1 671B(97)	Cogito v2.1 671B(97)
AI21 Labs	1	96.1	Jamba Large 1.7(96)	Jamba Large 1.7(96)
Kuaishou	1	95.3	KAT-Coder-Pro V2(95)	KAT-Coder-Pro V2(95)
~anthropic	3	95.3	Claude Opus Latest(100)	Anthropic Claude Haiku Latest(93)
Writer	1	95.1	Palmyra X5(95)	Palmyra X5(95)
xAI	11	94.7	Grok 4.20(100)	Grok 3 Mini Beta(82)
NVIDIA	9	93.3	Nemotron Nano 9B V2 (free)(100)	Nemotron 3 Super (free)(83)
Upstage	1	93.2	Solar Pro 3(93)	Solar Pro 3(93)
poolside	2	93.0	Laguna XS.2 (free)(93)	Laguna XS.2 (free)(93)
~openai	2	93.0	OpenAI GPT Mini Latest(93)	OpenAI GPT Mini Latest(93)
~google	2	93.0	Google Gemini Pro Latest(93)	Google Gemini Pro Latest(93)
~moonshotai	1	93.0	MoonshotAI Kimi Latest(93)	MoonshotAI Kimi Latest(93)
Inception	1	92.4	Mercury 2(92)	Mercury 2(92)
Google	23	92.3	Gemma 4 31B (free)(100)	Gemini 2.0 Flash Lite(85)
Moonshot AI	5	92.1	Kimi K2.6(100)	Kimi K2 0711(82)
Anthropic	14	92.0	Claude Opus 4.6 (Fast)(100)	Claude 3.5 Haiku(82)
DeepSeek	12	91.9	DeepSeek V4 Pro(100)	R1 Distill Llama 70B(86)
MiniMax	8	91.3	MiniMax-01(100)	MiniMax M2.5 (free)(90)
Alibaba	49	91.3	Qwen3.6 Max Preview(100)	Qwen3.5 Plus 2026-02-15(81)
IBM	2	90.4	Granite 4.1 8B(96)	Granite 4.0 Micro(85)
Amazon	4	90.3	Nova 2 Lite(96)	Nova Micro 1.0(81)
Baidu	7	90.2	Qianfan-OCR-Fast (free)(100)	CoBuddy (free)(74)
arcee-ai	6	90.0	Trinity Large Preview(100)	Trinity Large Thinking(76)
OpenAI	57	90.0	GPT-5.4 Nano(100)	GPT Chat Latest(74)
Xiaomi	5	89.7	MiMo-V2-Pro(98)	MiMo-V2-Omni(81)
Zhipu AI	12	89.3	GLM 5.1(100)	GLM 5V Turbo(81)
Liquid AI	3	88.8	LFM2.5-1.2B-Thinking (free)(93)	LFM2.5-1.2B-Instruct (free)(86)
rekaai	2	88.8	Reka Edge(92)	Reka Flash 3(86)
Mistral AI	18	88.3	Mistral Medium 3.1(100)	Mistral Medium 3.5(74)
Perplexity	5	87.9	Sonar Reasoning Pro(92)	Sonar(86)
Allen AI	1	87.7	Olmo 3 32B Think(88)	Olmo 3 32B Think(88)
aion-labs	3	86.0	Aion-2.0(96)	Aion-1.0(81)
StepFun	1	85.4	Step 3.5 Flash(85)	Step 3.5 Flash(85)
Cohere	3	84.7	Command R+ (08-2024)(90)	Command A(82)
Meta	9	84.5	Llama Guard 4 12B(96)	Llama 3.3 70B Instruct(79)
ByteDance	5	84.4	UI-TARS 7B (87)	Seed-2.0-Lite(81)
inclusionai	3	82.7	Ling-2.6-flash(100)	Ring-2.6-1T (free)(74)
Cursor	2	82.0	Composer 2(82)	Composer 2(82)
Microsoft	2	78.0	Phi 4(82)	Phi 4 Mini Instruct(74)
Tencent	2	75.8	Hunyuan A13B Instruct(98)	Hy3 preview(54)

Stability Distribution

How stability scores are distributed across all 300 tracked models.

0–10

10–20

20–30

30–40

40–50

50–60

60–70

70–80

80–90

90–100

194

What Makes a Model Stable?

Our stability scoring system uses three key signals to measure how consistently a model performs over time.

Rank Consistency

State Classification

Sparkline Volatility

All Trackers

Coding, image, and video model trackers

Degradation Tracker

Detect models with declining performance

Coding Tracker

Daily coding model performance and rankings

Frequently Asked Questions

Model Stability Report

Stability Classification Distribution

Provider Stability Rankings (Avg Score)

Most Stable Models

Most Volatile Models

Stability by Provider

Stability Distribution

What Makes a Model Stable?

Rank Consistency

State Classification

Sparkline Volatility

Related

Model Stability Report

Stability Classification Distribution

Provider Stability Rankings (Avg Score)

Most Stable Models

Most Volatile Models

Stability by Provider

Stability Distribution

What Makes a Model Stable?

Rank Consistency

State Classification

Sparkline Volatility

Related