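Compare open-source and proprietary models on performance, price, capabilities, and stability. We track 300 models to help you decide which approach best fits your needs.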

| Metric | Open source | Proprietary |
|---|---|---|
| Model count | 135 | 165 |
| Average score | 54.8 | 61.5 |
| Median score | 53.3 | 64.8 |
| Best score | 80.5 (Gemma 4 31B (free)) | 91.9 (GPT-5.4 Pro) |
| Average cost ($/1M) | $0.654 | $11.02 |
| Free models | 15 | 7 |
| Average context window | 199K | 486K |
| Share of stable models | 98.5% | 97.0% |
| Share of fragile models | 0.0% | 0.0% |
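
The headline gaps in the summary table can be derived with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout and function names are hypothetical, and the pooled average assumes each group average is an unweighted mean over that group's models, which the source does not state.

```python
# Figures copied from the summary table; the structure and names are hypothetical.
SUMMARY = {
    "open": {"models": 135, "avg_score": 54.8, "avg_cost": 0.654},
    "proprietary": {"models": 165, "avg_score": 61.5, "avg_cost": 11.02},
}


def cost_ratio(summary):
    """How many times more the average proprietary model costs ($/1M)."""
    return summary["proprietary"]["avg_cost"] / summary["open"]["avg_cost"]


def score_gap(summary):
    """Average-score lead of proprietary models, in points."""
    return summary["proprietary"]["avg_score"] - summary["open"]["avg_score"]


def pooled_average_score(summary):
    """Average score across all tracked models, weighted by group size.

    Assumes each group average is an unweighted per-model mean.
    """
    total_models = sum(group["models"] for group in summary.values())
    weighted_sum = sum(group["models"] * group["avg_score"] for group in summary.values())
    return weighted_sum / total_models
```

With these inputs the average proprietary model costs roughly 17 times as much as the average open-source model, for an average-score lead of about 6.7 points.
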

Top 20 open-source models by score:

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Gemma 4 31B (free) | Google | 81 |
| 2 | Gemma 4 31B | Google | 81 |
| 3 | Qwen3.5 397B A17B | Alibaba | 80 |
| 4 | R1 0528 | DeepSeek | 79 |
| 5 | MiniMax M2.5 (free) | MiniMax | 78 |
| 6 | MiniMax M2.5 | MiniMax | 78 |
| 7 | GLM 5 | Zhipu AI | 78 |
| 8 | Qwen3.5-122B-A10B | Alibaba | 78 |
| 9 | Gemma 2 27B | Google | 77 |
| 10 | Qwen3.5-27B | Alibaba | 77 |
| 11 | GLM 5.1 | Zhipu AI | 76 |
| 12 | Qwen3.5-35B-A3B | Alibaba | 76 |
| 13 | Kimi K2.6 | Moonshot AI | 76 |
| 14 | MiMo-V2.5-Pro | Xiaomi | 76 |
| 15 | DeepSeek V4 Pro | DeepSeek | 76 |
| 16 | GLM 4.5 | Zhipu AI | 75 |
| 17 | Gemma 4 26B A4B (free) | Google | 73 |
| 18 | Gemma 4 26B A4B | Google | 73 |
| 19 | R1 | DeepSeek | 73 |
| 20 | GLM 4.7 | Zhipu AI | 73 |

Top 20 proprietary models by score:

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | GPT-5.4 Pro | OpenAI | 92 |
| 2 | GPT-5.4 | OpenAI | 92 |
| 3 | GPT-5.2 Pro | OpenAI | 91 |
| 4 | Claude Opus 4.6 (Fast) | Anthropic | 90 |
| 5 | Claude Opus 4.6 | Anthropic | 90 |
| 6 | GPT-5.2-Codex | OpenAI | 90 |
| 7 | GPT-5.2 | OpenAI | 90 |
| 8 | Grok 4.20 | xAI | 89 |
| 9 | GPT-5.3-Codex | OpenAI | 89 |
| 10 | GPT-5 Pro | OpenAI | 89 |
| 11 | Gemini 3 Flash Preview | Google | 88 |
| 12 | Grok 4 | xAI | 88 |
| 13 | Grok 4.20 Multi-Agent | xAI | 88 |
| 14 | GPT-5.1-Codex-Max | OpenAI | 88 |
| 15 | GPT-5 Codex | OpenAI | 88 |
| 16 | GPT-5 | OpenAI | 88 |
| 17 | GPT-5.3 Chat | OpenAI | 87 |
| 18 | GPT-5.1 | OpenAI | 87 |
| 19 | GPT-5.1-Codex | OpenAI | 87 |
| 20 | GPT-5.1-Codex-Mini | OpenAI | 87 |

| Capability | Open source | Proprietary |
|---|---|---|
| Vision | 42 (31.1%) | 110 (66.7%) |
| Function calling | 107 (79.3%) | 144 (87.3%) |
| Streaming | 135 (100.0%) | 165 (100.0%) |
| JSON mode | 105 (77.8%) | 136 (82.4%) |
| Reasoning | 80 (59.3%) | 102 (61.8%) |
| Web search | 0 (0.0%) | 73 (44.2%) |
| Image output | 0 (0.0%) | 0 (0.0%) |
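
Each percentage in the capability table is the capability count divided by the group's model count (135 open source, 165 proprietary). As a rough sketch, with hypothetical names and data layout, the figures and the per-capability gap can be reproduced like this:

```python
# Counts copied from the capability table; the names and layout are hypothetical.
TOTALS = {"open": 135, "proprietary": 165}
COUNTS = {
    "vision": {"open": 42, "proprietary": 110},
    "function_calling": {"open": 107, "proprietary": 144},
    "streaming": {"open": 135, "proprietary": 165},
    "json_mode": {"open": 105, "proprietary": 136},
    "reasoning": {"open": 80, "proprietary": 102},
    "web_search": {"open": 0, "proprietary": 73},
    "image_output": {"open": 0, "proprietary": 0},
}


def coverage(count, total):
    """Percentage of a group's models that support a capability, to one decimal."""
    return round(100 * count / total, 1)


def coverage_gap(capability):
    """Proprietary coverage minus open-source coverage, in percentage points."""
    counts = COUNTS[capability]
    return (coverage(counts["proprietary"], TOTALS["proprietary"])
            - coverage(counts["open"], TOTALS["open"]))


def widest_gap():
    """Capability with the largest proprietary lead in coverage."""
    return max(COUNTS, key=coverage_gap)
```

On these numbers the widest gap is web search (44.2 percentage points), followed by vision (about 35.6 points); streaming and image output show no gap at all.
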
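Open source leads in free model availability and lower average pricing. With 15 free models, open source offers the most accessible entry point for experimentation and prototyping.
Proprietary leads in average score, median score, model count, context window size, top model performance, and capability coverage. The top proprietary model (GPT-5.4 Pro) reaches a score of 92 (91.9 before rounding), setting the current performance ceiling.
Across the 300 tracked models (135 open source, 165 proprietary), the landscape is changing quickly. Open-source models excel at self-hosting, fine-tuning, and cost control, while proprietary models generally lead in raw performance and the convenience of hosted APIs.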
The performance gap between open-source and proprietary models is narrowing rapidly. Open-source models like DeepSeek, Qwen, and LLaMA now compete with proprietary models on many benchmarks. However, proprietary models often still lead in raw performance on the most demanding tasks.
Open-source models offer full transparency, self-hosting capability, fine-tuning freedom, no vendor lock-in, and often lower costs. They are ideal for privacy-sensitive applications and organizations that need full control over their AI stack.
The top-scoring open-source model appears first in the open-source leaderboard above (currently Gemma 4 31B (free), with a score of 80.5). Rankings update hourly based on composite scores that combine benchmarks, pricing, capabilities, and community adoption.
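
The source names the four inputs to the composite score but not how they are combined. One plausible reading is a weighted mean, sketched below. The weights, the 0-100 component scale, and every name in the block are assumptions for illustration, not the site's actual formula.

```python
# Hypothetical weights: the source lists the inputs but not their weighting.
WEIGHTS = {
    "benchmarks": 0.5,
    "pricing": 0.2,
    "capabilities": 0.2,
    "adoption": 0.1,
}


def composite_score(components, weights=WEIGHTS):
    """Weighted mean of component scores, each assumed to be on a 0-100 scale."""
    missing = set(weights) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * components[name] for name in weights)
    return weighted_sum / total_weight


# Example with made-up component values, not real model data.
example = {"benchmarks": 80, "pricing": 90, "capabilities": 70, "adoption": 60}
example_score = composite_score(example)
```

Dividing by the total weight keeps the result on the same 0-100 scale even if the weights are later changed and no longer sum to 1.
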
We classify models based on whether their weights are publicly available for download and modification. Models with open weights but restrictive licenses are still counted as open source for this comparison.
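
The classification rule above depends only on whether the weights are available, not on the license. A minimal sketch of that rule follows; the `Model` record and its field names are hypothetical, as are the two example entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical record; field names are illustrative, not from the source."""
    name: str
    weights_downloadable: bool          # weights can be downloaded and modified
    license_restrictive: bool = False   # recorded, but ignored by the rule


def classify(model):
    """Apply the stated rule: open weights count as open source, whatever the license."""
    return "open source" if model.weights_downloadable else "proprietary"


# Made-up examples to show that the license flag does not change the result.
restricted = Model("example-open-weights", weights_downloadable=True, license_restrictive=True)
hosted_only = Model("example-api-only", weights_downloadable=False)
```

Here `classify(restricted)` yields "open source" despite the restrictive license, while `classify(hosted_only)` yields "proprietary".
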