Human preference rating from 6M+ crowdsourced blind head-to-head comparisons. Users chat with two anonymous models and pick the better response.
Why it matters: the most trusted 'vibes-based' benchmark, because it reflects real human preferences rather than purely academic metrics. Widely considered the most meaningful overall ranking.
Top Model: Claude Opus 4.6 (1,503)
Average Score: 1,367 (across 124 models)
Models Tested: 124
Metric: Elo rating
Human Baseline: -
Score Range: 900–1600
Arena Elo Scores - Top 25 Models
All models with a reported Arena Elo score, ranked by highest Elo rating.
Arena Elo is not a fixed test set: it is an Elo-style rating computed from users' pairwise preference votes. Each blind head-to-head comparison updates both models' ratings, producing scores that are directly comparable across models and help developers choose the right model for their needs.
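As a rough illustration of how pairwise votes become a ranking, here is a minimal sketch of the classic Elo update. This is not LMArena's actual pipeline (the real leaderboard uses a more involved statistical model over all votes); the starting rating and K-factor below are arbitrary choices for the example.

```python
# Illustrative Elo update applied to pairwise "A beats B" votes.
# Starting rating (1000) and K-factor (32) are example values, not LMArena's.

def expected(r_a: float, r_b: float) -> float:
    """Expected win probability of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
    """Return both models' updated ratings after one head-to-head vote."""
    score_a = 1.0 if a_wins else 0.0
    e_a = expected(r_a, r_b)
    # Winner gains, loser loses; the total rating pool is conserved.
    return r_a + k * (score_a - e_a), r_b - k * (score_a - e_a)

# Two models start equal; the first wins three straight votes,
# with diminishing gains as its expected win probability rises.
ra, rb = 1000.0, 1000.0
for _ in range(3):
    ra, rb = update(ra, rb, a_wins=True)
print(f"{ra:.1f} vs {rb:.1f}")
```

Because each update moves ratings by the gap between the observed result and the expected one, upsets shift scores more than expected wins do, which is why a stable ranking emerges from millions of noisy individual votes.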
Claude Opus 4.6 currently holds the top score on the Arena Elo benchmark. See our full rankings table above for the complete leaderboard with 124 models.
We update benchmark data from multiple sources including HuggingFace open-source model leaderboards and LMArena. Scores are refreshed regularly as new evaluations are published and new models are released.
No. While Arena Elo is an important indicator, real-world performance depends on many factors including pricing, latency, context window, and specific task requirements. We recommend using our composite score which weighs multiple benchmarks and practical factors.