Compare open-source and proprietary models across performance, pricing, capabilities, and stability. Tracking 300 models to help you decide which approach best fits your needs.
| Metric | Open Source | Proprietary |
|---|---|---|
| Model Count | 135 | 165 |
| Avg Score | 54.8 | 61.5 |
| Median Score | 53.3 | 64.8 |
| Best Score | 80.5 (Gemma 4 31B, free) | 91.9 (GPT-5.4 Pro) |
| Avg Cost ($/1M) | $0.654 | $11.02 |
| Free Models | 15 | 7 |
| Avg Context Window | 199K | 486K |
| Stable Models % | 98.5% | 97.0% |
| Fragile Models % | 0.0% | 0.0% |
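One way to read the pricing row: dividing the two averages shows that proprietary models cost roughly 17x more per million tokens on average. A quick check, using the figures from the table above:

```python
# Average cost per 1M tokens, taken from the summary table above.
open_source_avg_cost = 0.654   # $/1M tokens
proprietary_avg_cost = 11.02   # $/1M tokens

ratio = proprietary_avg_cost / open_source_avg_cost
print(f"Proprietary models average {ratio:.1f}x the open-source cost per 1M tokens")
```

Averages can be skewed by a few expensive frontier models, so the median-score row above is often the fairer comparison point.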
Top 20 open-source models by score:

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Gemma 4 31B (free) | Google | 81 |
| 2 | Gemma 4 31B | Google | 81 |
| 3 | Qwen3.5 397B A17B | Alibaba | 80 |
| 4 | R1 0528 | DeepSeek | 79 |
| 5 | MiniMax M2.5 (free) | MiniMax | 78 |
| 6 | MiniMax M2.5 | MiniMax | 78 |
| 7 | GLM 5 | Zhipu AI | 78 |
| 8 | Qwen3.5-122B-A10B | Alibaba | 78 |
| 9 | Gemma 2 27B | Google | 77 |
| 10 | Qwen3.5-27B | Alibaba | 77 |
| 11 | GLM 5.1 | Zhipu AI | 76 |
| 12 | Qwen3.5-35B-A3B | Alibaba | 76 |
| 13 | Kimi K2.6 | Moonshot AI | 76 |
| 14 | MiMo-V2.5-Pro | Xiaomi | 76 |
| 15 | DeepSeek V4 Pro | DeepSeek | 76 |
| 16 | GLM 4.5 | Zhipu AI | 75 |
| 17 | Gemma 4 26B A4B (free) | Google | 73 |
| 18 | Gemma 4 26B A4B | Google | 73 |
| 19 | R1 | DeepSeek | 73 |
| 20 | GLM 4.7 | Zhipu AI | 73 |
Top 20 proprietary models by score:

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | GPT-5.4 Pro | OpenAI | 92 |
| 2 | GPT-5.4 | OpenAI | 92 |
| 3 | GPT-5.2 Pro | OpenAI | 91 |
| 4 | Claude Opus 4.6 (Fast) | Anthropic | 90 |
| 5 | Claude Opus 4.6 | Anthropic | 90 |
| 6 | GPT-5.2-Codex | OpenAI | 90 |
| 7 | GPT-5.2 | OpenAI | 90 |
| 8 | Grok 4.20 | xAI | 89 |
| 9 | GPT-5.3-Codex | OpenAI | 89 |
| 10 | GPT-5 Pro | OpenAI | 89 |
| 11 | Gemini 3 Flash Preview | Google | 88 |
| 12 | Grok 4 | xAI | 88 |
| 13 | Grok 4.20 Multi-Agent | xAI | 88 |
| 14 | GPT-5.1-Codex-Max | OpenAI | 88 |
| 15 | GPT-5 Codex | OpenAI | 88 |
| 16 | GPT-5 | OpenAI | 88 |
| 17 | GPT-5.3 Chat | OpenAI | 87 |
| 18 | GPT-5.1 | OpenAI | 87 |
| 19 | GPT-5.1-Codex | OpenAI | 87 |
| 20 | GPT-5.1-Codex-Mini | OpenAI | 87 |
| Capability | Open Source | Proprietary |
|---|---|---|
| Vision | 42 (31.1%) | 110 (66.7%) |
| Function Calling | 107 (79.3%) | 144 (87.3%) |
| Streaming | 135 (100.0%) | 165 (100.0%) |
| JSON Mode | 105 (77.8%) | 136 (82.4%) |
| Reasoning | 80 (59.3%) | 102 (61.8%) |
| Web Search | 0 (0.0%) | 73 (44.2%) |
| Image Output | 0 (0.0%) | 0 (0.0%) |
Open source leads in free-model availability and lower average pricing. With 15 free models, open source offers the most accessible entry point for experimentation and prototyping.
Proprietary leads in average score, median score, model count, context-window size, top-model performance, and capability coverage. The top proprietary model, GPT-5.4 Pro, achieves a score of 92, setting the current performance ceiling.
Across 300 tracked models (135 open-source, 165 proprietary), the landscape continues to evolve rapidly. Open-source models excel at self-hosting, fine-tuning, and cost control, while proprietary models often lead in raw performance and managed-API convenience.
The gap is narrowing. Open-source models like DeepSeek, Qwen, and LLaMA now compete with proprietary models on many benchmarks, though proprietary models still hold the edge on the most demanding tasks.
Open-source models offer full transparency, self-hosting capability, fine-tuning freedom, no vendor lock-in, and often lower costs. They are ideal for privacy-sensitive applications and organizations that need full control over their AI stack.
The top-scoring open-source model is shown in our leaderboard above. Rankings update hourly based on composite scores that combine benchmarks, pricing, capabilities, and community adoption.
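A composite score like the one described above could be computed as a weighted blend of the four factors. The weights and normalization below are purely illustrative assumptions, not the actual ranking formula:

```python
def composite_score(benchmarks: float, pricing: float,
                    capabilities: float, adoption: float) -> float:
    """Hypothetical composite score: a weighted sum of four factors,
    each already normalized to a 0-100 scale.

    The weights are illustrative assumptions, not the site's formula.
    """
    weights = {
        "benchmarks": 0.5,    # raw benchmark performance
        "pricing": 0.2,       # cheaper models score higher here
        "capabilities": 0.2,  # vision, function calling, JSON mode, etc.
        "adoption": 0.1,      # community adoption
    }
    return (weights["benchmarks"] * benchmarks
            + weights["pricing"] * pricing
            + weights["capabilities"] * capabilities
            + weights["adoption"] * adoption)

# Example: a model strong on benchmarks but mid-pack on price.
print(round(composite_score(90, 60, 80, 70), 1))
```

Because pricing and capabilities feed into the score, a cheap open-weights model can outrank a slightly stronger but far more expensive proprietary one.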
We classify models based on whether their weights are publicly available for download and modification. Models with open weights but restrictive licenses are still counted as open source for this comparison.
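The classification rule above reduces to a single check on weight availability; license restrictiveness does not change the category. A minimal sketch (the `Model` type is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    weights_downloadable: bool  # can the weights be downloaded and modified?

def is_open_source(model: Model) -> bool:
    """Classification used in this comparison: open weights count as open
    source, even when the license is restrictive (e.g. non-commercial)."""
    return model.weights_downloadable

print(is_open_source(Model("Gemma 4 31B", weights_downloadable=True)))
print(is_open_source(Model("GPT-5.4 Pro", weights_downloadable=False)))
```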