Compare open-source and proprietary models across performance, pricing, capabilities, and stability. We track 300 models to help you decide which approach best fits your needs.

| Metric | Open Source | Proprietary |
|---|---|---|
| Model Count | 145 | 155 |
| Avg Score | 53.5 | 62.0 |
| Median Score | 44.1 | 65.3 |
| Best Score | 86.2 (DeepSeek V4 Pro) | 96.6 (Claude Fable 5) |
| Avg Cost ($/1M) | $0.679 | $12.12 |
| Free Models | 26 | 3 |
| Avg Context Window | 324K | 512K |
| Stable Models % | 99.3% | 98.7% |
| Fragile Models % | 0.7% | 0.0% |

Top 20 open-source models by score:

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | DeepSeek V4 Pro | DeepSeek | 86 |
| 2 | DeepSeek V3.2 | DeepSeek | 81 |
| 3 | Gemma 4 31B (free) | Google | 80 |
| 4 | Gemma 4 31B | Google | 80 |
| 5 | Qwen3.5 397B A17B | Alibaba | 79 |
| 6 | R1 0528 | DeepSeek | 79 |
| 7 | GLM 5.2 | Zhipu AI | 78 |
| 8 | MiniMax M2.5 | MiniMax | 78 |
| 9 | GLM 5 | Zhipu AI | 78 |
| 10 | Qwen3.5-122B-A10B | Alibaba | 77 |
| 11 | DeepSeek V4 Flash | DeepSeek | 77 |
| 12 | Gemma 2 27B | Google | 77 |
| 13 | Qwen3.5-27B | Alibaba | 77 |
| 14 | GLM 5.1 | Zhipu AI | 76 |
| 15 | MiMo-V2.5-Pro | Xiaomi | 76 |
| 16 | Qwen3.5-35B-A3B | Alibaba | 76 |
| 17 | Kimi K2.6 | Moonshot AI | 75 |
| 18 | GLM 4.5 | Zhipu AI | 75 |
| 19 | MiniMax M3 | MiniMax | 74 |
| 20 | R1 | DeepSeek | 74 |

Top 20 proprietary models by score:

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Claude Fable 5 | Anthropic | 97 |
| 2 | Claude Opus 4.7 (Fast) | Anthropic | 95 |
| 3 | Claude Opus 4.7 | Anthropic | 95 |
| 4 | Claude Opus 4.8 (Fast) | Anthropic | 94 |
| 5 | Claude Opus 4.8 | Anthropic | 94 |
| 6 | GPT-5.5 | OpenAI | 92 |
| 7 | Gemini 3.1 Pro Preview Custom Tools | Google | 92 |
| 8 | Gemini 3.1 Pro Preview | Google | 92 |
| 9 | GPT-5.4 Pro | OpenAI | 92 |
| 10 | GPT-5.4 | OpenAI | 92 |
| 11 | GPT-5.5 Pro | OpenAI | 90 |
| 12 | GPT-5.2-Codex | OpenAI | 90 |
| 13 | GPT-5.2 Pro | OpenAI | 90 |
| 14 | GPT-5.2 | OpenAI | 90 |
| 15 | Claude Opus 4.6 (Fast) | Anthropic | 90 |
| 16 | Claude Opus 4.6 | Anthropic | 90 |
| 17 | Grok 4.20 | xAI | 88 |
| 18 | GPT-5.3-Codex | OpenAI | 88 |
| 19 | GPT-5 Pro | OpenAI | 88 |
| 20 | GPT-5 Codex | OpenAI | 88 |

| Capability | Open Source | Proprietary |
|---|---|---|
| Vision | 47 (32.4%) | 107 (69.0%) |
| Function Calling | 120 (82.8%) | 134 (86.5%) |
| Streaming | 145 (100.0%) | 155 (100.0%) |
| JSON Mode | 115 (79.3%) | 131 (84.5%) |
| Reasoning | 91 (62.8%) | 100 (64.5%) |
| Web Search | 0 (0.0%) | 83 (53.5%) |
| Image Output | 0 (0.0%) | 0 (0.0%) |

Open source leads in free model availability and average pricing ($0.679 versus $12.12 per 1M). With 26 free models against 3 proprietary ones, open source offers the most accessible entry point for experimentation and prototyping.
Proprietary leads in average score, median score, model count, context window size, top model performance, and capability coverage. The top proprietary model (Claude Fable 5) scores 96.6, setting the current performance ceiling.
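The capability coverage lead can be quantified from the capability table. The sketch below recomputes each percentage from the raw counts and the per-category totals (145 and 155), then reports the gap in percentage points. The function and variable names are illustrative only, not part of any published tooling.

```python
# Counts copied from the capability table: (open-source, proprietary).
TOTALS = {"open": 145, "proprietary": 155}
CAPABILITY_COUNTS = {
    "Vision": (47, 107),
    "Function Calling": (120, 134),
    "Streaming": (145, 155),
    "JSON Mode": (115, 131),
    "Reasoning": (91, 100),
    "Web Search": (0, 83),
    "Image Output": (0, 0),
}


def share(count, total):
    """Percentage of models supporting a capability, rounded to one decimal."""
    return round(100 * count / total, 1)


def coverage_gaps():
    """Proprietary share minus open-source share, in percentage points."""
    gaps = {}
    for name, (open_n, prop_n) in CAPABILITY_COUNTS.items():
        open_pct = share(open_n, TOTALS["open"])
        prop_pct = share(prop_n, TOTALS["proprietary"])
        gaps[name] = round(prop_pct - open_pct, 1)
    return gaps


if __name__ == "__main__":
    # Print capabilities from the largest gap to the smallest.
    for name, gap in sorted(coverage_gaps().items(), key=lambda kv: -kv[1]):
        print(f"{name}: {gap:+.1f} pp")
```

On these figures the largest gaps are web search and vision, while streaming and image output show no gap at all.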
Across 300 tracked models (145 open-source, 155 proprietary), the landscape continues to evolve rapidly. Open-source models excel for self-hosting, fine-tuning, and cost control, while proprietary models often lead in raw performance and managed API convenience.
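To make the cost-control point concrete, here is a rough sketch that compares the two average prices from the summary table. It assumes the "$/1M" figures are blended prices per one million tokens and ignores self-hosting infrastructure costs, so treat the result as an order-of-magnitude estimate. The 50M-token monthly volume is a made-up example.

```python
# Average prices from the summary table, assumed to be USD per 1M tokens.
AVG_COST_PER_MILLION = {"open": 0.679, "proprietary": 12.12}


def cost_ratio():
    """How many times more expensive the average proprietary model is."""
    return AVG_COST_PER_MILLION["proprietary"] / AVG_COST_PER_MILLION["open"]


def monthly_cost(tokens_per_month, category):
    """Estimated monthly spend in USD at the category's average price."""
    return tokens_per_month / 1_000_000 * AVG_COST_PER_MILLION[category]


if __name__ == "__main__":
    volume = 50_000_000  # hypothetical workload: 50M tokens per month
    print(f"Price ratio: {cost_ratio():.1f}x")
    for category in AVG_COST_PER_MILLION:
        print(f"{category}: ${monthly_cost(volume, category):,.2f}/month")
```

Averages hide a wide spread within each category, so a real estimate should use the price of the specific model under consideration.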
The gap is narrowing. Open-source families such as DeepSeek, Qwen, and LLaMA now compete with proprietary models on many benchmarks, but proprietary models still lead on the most demanding tasks: the best open-source score here is 86.2 (DeepSeek V4 Pro) against 96.6 for Claude Fable 5.
Open-source models offer full transparency, self-hosting capability, fine-tuning freedom, no vendor lock-in, and often lower costs. They are ideal for privacy-sensitive applications and organizations that need full control over their AI stack.
The top-scoring open-source model is currently DeepSeek V4 Pro (86.2), as shown in the leaderboard above. Rankings update hourly based on composite scores that combine benchmarks, pricing, capabilities, and community adoption.
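The exact composite formula is not given here. As one plausible reading, a composite score could be a weighted average of the four named components, each normalized to a 0-100 scale. The weights and component names below are assumptions chosen for illustration, not the actual ranking method.

```python
# Hypothetical weights: the real weighting is not published in this page.
HYPOTHETICAL_WEIGHTS = {
    "benchmarks": 0.60,
    "pricing": 0.15,
    "capabilities": 0.15,
    "adoption": 0.10,
}


def composite_score(components, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted average of component scores, each assumed to be on 0-100."""
    missing = set(weights) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[key] * components[key] for key in weights)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up component scores for a single hypothetical model.
    example = {"benchmarks": 90, "pricing": 50, "capabilities": 70, "adoption": 60}
    print(round(composite_score(example), 1))
```

Dividing by the total weight keeps the result on the 0-100 scale even if the weights are later changed and no longer sum to 1.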
We classify models based on whether their weights are publicly available for download and modification. Models with open weights but restrictive licenses are still counted as open source for this comparison.
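The classification rule above can be sketched in a few lines. The `ModelInfo` type and its field names are hypothetical, introduced only to show that the license flag does not affect the outcome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical record; field names are illustrative assumptions."""
    name: str
    weights_downloadable: bool
    license_is_restrictive: bool = False


def classify(model):
    """Open source if weights are publicly downloadable, whatever the license."""
    # license_is_restrictive is deliberately ignored, per the rule above.
    return "open-source" if model.weights_downloadable else "proprietary"


if __name__ == "__main__":
    samples = [
        ModelInfo("example-open-weights", True),
        ModelInfo("example-restricted-license", True, license_is_restrictive=True),
        ModelInfo("example-api-only", False),
    ]
    for model in samples:
        print(model.name, "->", classify(model))
```

A stricter definition would also check the license terms, which would move some open-weight models out of the open-source column.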