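Compare open-source and proprietary models on performance, pricing, capabilities, and stability. We track 300 models to help you decide which approach best fits your needs.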

| Metric | Open source | Closed source |
|---|---|---|
| Model count | 145 | 155 |
| Average score | 65.1 | 70.8 |
| Median score | 68.3 | 75.5 |
| Best score | 85.0 (Qwen3.5-9B) | 94.0 (GPT-5.4 Pro) |
| Average cost ($/1M) | $0.551 | $9.58 |
| Free models | 23 | 0 |
| Average context window | 144K | 401K |
| Share of stable models | 42.8% | 49.7% |
| Share of fragile models | 56.6% | 50.3% |

Top 20 open-source models by score:

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Qwen3.5-9B | Alibaba | 85 |
| 2 | Kimi K2.5 | Moonshot AI | 85 |
| 3 | Qwen3 VL 8B Thinking | Alibaba | 85 |
| 4 | Qwen3 VL 30B A3B Thinking | Alibaba | 85 |
| 5 | Nemotron 3 Super (free) | NVIDIA | 84 |
| 6 | MiniMax M2.5 (free) | MiniMax | 83 |
| 7 | MiniMax M2.7 | MiniMax | 83 |
| 8 | MiMo-V2-Flash | Xiaomi | 83 |
| 9 | Trinity Mini | arcee-ai | 82 |
| 10 | Nemotron Nano 12B 2 VL (free) | NVIDIA | 82 |
| 11 | Tongyi DeepResearch 30B A3B | Alibaba | 82 |
| 12 | Qwen3.5 397B A17B | Alibaba | 82 |
| 13 | gpt-oss-safeguard-20b | OpenAI | 82 |
| 14 | Qwen3 VL 32B Instruct | Alibaba | 81 |
| 15 | Qwen3 VL 8B Instruct | Alibaba | 81 |
| 16 | Qwen3 VL 30B A3B Instruct | Alibaba | 81 |
| 17 | Qwen3 30B A3B Thinking 2507 | Alibaba | 81 |
| 18 | Qwen3.5-122B-A10B | Alibaba | 80 |
| 19 | Mistral Small 4 | Mistral AI | 79 |
| 20 | Qwen3.5-27B | Alibaba | 79 |

Top 20 closed-source models by score:

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | GPT-5.4 Pro | OpenAI | 94 |
| 2 | GPT-5.4 | OpenAI | 94 |
| 3 | GPT-5.4 Mini | OpenAI | 93 |
| 4 | GPT-5.2 Pro | OpenAI | 93 |
| 5 | GPT-5.2 | OpenAI | 93 |
| 6 | Claude Opus 4.6 | Anthropic | 92 |
| 7 | GPT-5 Pro | OpenAI | 92 |
| 8 | o3 Deep Research | OpenAI | 92 |
| 9 | Claude Opus 4.5 | Anthropic | 90 |
| 10 | GPT-5 | OpenAI | 90 |
| 11 | Gemini 3 Flash Preview | Google | 89 |
| 12 | Claude Sonnet 4.6 | Anthropic | 89 |
| 13 | Claude Sonnet 4.5 | Anthropic | 89 |
| 14 | o3 Pro | OpenAI | 88 |
| 15 | Grok 4.1 Fast | xAI | 87 |
| 16 | Grok 4.20 Beta | xAI | 86 |
| 17 | Grok 4 | xAI | 86 |
| 18 | Gemini 3.1 Pro Preview | Google | 86 |
| 19 | o3 | OpenAI | 86 |
| 20 | GPT-5.1 | OpenAI | 85 |


| Capability | Open source | Closed source |
|---|---|---|
| Vision | 39 (26.9%) | 92 (59.4%) |
| Function calling | 96 (66.2%) | 128 (82.6%) |
| Streaming output | 145 (100.0%) | 155 (100.0%) |
| JSON mode | 110 (75.9%) | 122 (78.7%) |
| Reasoning | 66 (45.5%) | 78 (50.3%) |
| Web search | 1 (0.7%) | 55 (35.5%) |
| Image output | 0 (0.0%) | 0 (0.0%) |

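
The percentages in the capability table above appear to be each count divided by the group's model count (145 open source, 155 closed source). The following is a minimal sketch of that arithmetic, not the site's actual code: the dictionary keys and the function names `share` and `capability_gaps` are illustrative assumptions, and only the numbers come from the tables.

```python
# Model totals from the summary table; capability counts from the capability table.
TOTALS = {"open": 145, "closed": 155}
COUNTS = {
    "vision": {"open": 39, "closed": 92},
    "function_calling": {"open": 96, "closed": 128},
    "streaming": {"open": 145, "closed": 155},
    "json_mode": {"open": 110, "closed": 122},
    "reasoning": {"open": 66, "closed": 78},
    "web_search": {"open": 1, "closed": 55},
    "image_output": {"open": 0, "closed": 0},
}


def share(count, total):
    """Percentage of models in a group that support a capability, to one decimal."""
    return round(100 * count / total, 1)


def capability_gaps():
    """Closed-source share minus open-source share, in percentage points."""
    gaps = {}
    for name, counts in COUNTS.items():
        closed = share(counts["closed"], TOTALS["closed"])
        opened = share(counts["open"], TOTALS["open"])
        gaps[name] = round(closed - opened, 1)
    return gaps


if __name__ == "__main__":
    # Print capabilities from the widest gap to the narrowest.
    for name, gap in sorted(capability_gaps().items(), key=lambda kv: -kv[1]):
        print(f"{name}: {gap:+.1f} pp")
```

Sorting by gap makes it easy to see where the two groups differ most in coverage and where they are effectively at parity.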
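Open source leads in free model availability and lower average pricing. With 23 free models, open source offers the most accessible entry point for experimentation and prototyping.
Closed source leads in average score, median score, model count, context window size, top model performance, and capability coverage. The top proprietary model (GPT-5.4 Pro) scores 94, setting the current performance ceiling.
Across the 300 tracked models (145 open source, 155 proprietary), the landscape is evolving rapidly. Open-source models excel at self-hosting, fine-tuning, and cost control, while proprietary models usually lead in raw performance and hosted-API convenience.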
The gap is narrowing rapidly. Open-source models like DeepSeek, Qwen, and LLaMA now compete with proprietary models on many benchmarks. However, proprietary models often still lead in raw performance on the most demanding tasks.
Open-source models offer full transparency, self-hosting capability, fine-tuning freedom, no vendor lock-in, and often lower costs. They are ideal for privacy-sensitive applications and organizations that need full control over their AI stack.
The top-scoring open-source model is shown in our leaderboard above. Rankings update hourly based on composite scores that combine benchmarks, pricing, capabilities, and community adoption.
We classify models based on whether their weights are publicly available for download and modification. Models with open weights but restrictive licenses are still counted as open source for this comparison.
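
The classification rule described above can be sketched as a single check on weight availability. This is an illustration under stated assumptions, not the site's implementation: the `Model` dataclass, its field names, and the `classify` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Model:
    # All field names here are hypothetical, chosen only for illustration.
    name: str
    weights_public: bool  # weights can be downloaded and modified
    restrictive_license: bool = False  # recorded, but not used by the rule


def classify(model):
    """Label a model by weight availability alone, as the text describes."""
    # A restrictive license does not change the label for this comparison.
    return "open source" if model.weights_public else "closed source"


if __name__ == "__main__":
    examples = [
        Model("example-open", weights_public=True),
        Model("example-open-restricted", weights_public=True, restrictive_license=True),
        Model("example-hosted-only", weights_public=False),
    ]
    for m in examples:
        print(m.name, "->", classify(m))
```

Keeping the license as a separate field means a stricter definition of open source could be applied later without changing the stored data.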