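An analysis of which AI capabilities are the most common and which are the rarest across 300 tracked models, and how each capability correlates with the composite score. It covers adoption rates, score premiums, and capability combinations.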
| Capability | Models | Adoption Rate | Score Premium |
|---|---|---|---|
| Streaming | 300 | 100.0% | +68.0 |
| JSON Mode | 232 | 77.3% | +13.7 |
| Function Calling | 224 | 74.7% | +16.1 |
| Reasoning | 144 | 48.0% | +18.4 |
| Vision | 131 | 43.7% | +14.1 |
| Web Search | 56 | 18.7% | +15.6 |
| Image Output | 0 | 0.0% | -68.0 |
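Adoption rate is the share of the 300 tracked models that list a capability. The following minimal Python sketch recomputes that column from the model counts in the table above and ranks capabilities from most to least common. The counts are copied from the table; the names `CAPABILITY_COUNTS` and `adoption_rates` are illustrative only.

```python
# Model counts per capability, copied from the adoption table.
CAPABILITY_COUNTS = {
    "Streaming": 300,
    "JSON Mode": 232,
    "Function Calling": 224,
    "Reasoning": 144,
    "Vision": 131,
    "Web Search": 56,
    "Image Output": 0,
}
TOTAL_MODELS = 300


def adoption_rates(counts: dict[str, int], total: int) -> list[tuple[str, float]]:
    """Return (capability, adoption %) pairs sorted from most to least common."""
    rates = [(name, 100.0 * n / total) for name, n in counts.items()]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, rate in adoption_rates(CAPABILITY_COUNTS, TOTAL_MODELS):
        print(f"{name:<17} {rate:5.1f}%")
```

Formatting each rate with one decimal place (`:.1f`) gives the percentages shown in the table, for example 232 / 300 → 77.3%.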
| # | Combination | Models | Average Score |
|---|---|---|---|
| 1 | JSON Mode + Streaming | 232 | 71.2 |
| 2 | Function Calling + Streaming | 224 | 72.1 |
| 3 | Function Calling + JSON Mode | 194 | 72.9 |
| 4 | Function Calling + JSON Mode + Streaming | 194 | 72.9 |
| 5 | Reasoning + Streaming | 144 | 77.6 |
| 6 | Streaming + Vision | 131 | 76.0 |
| 7 | Function Calling + Reasoning | 124 | 79.4 |
| 8 | Function Calling + Reasoning + Streaming | 124 | 79.4 |
| 9 | JSON Mode + Reasoning | 118 | 79.6 |
| 10 | JSON Mode + Reasoning + Streaming | 118 | 79.6 |
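The combination table ranks capability sets by how many models support every capability in the set, and reports the mean composite score of those models. The Python sketch below shows one way to build such a ranking. It assumes each model is a record with a capability set and a composite score. The `Model` type and the `SAMPLE_MODELS` records are hypothetical and are not the tracker's data.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import mean


@dataclass(frozen=True)
class Model:
    # Hypothetical record: model name, supported capabilities, composite score.
    name: str
    capabilities: frozenset[str]
    score: float


def combo_stats(models, sizes=(2, 3)):
    """For every capability combination of the given sizes, count the models
    that support all of it and average their composite scores."""
    all_caps = sorted(set().union(*(m.capabilities for m in models)))
    rows = []
    for k in sizes:
        for combo in combinations(all_caps, k):
            matching = [m for m in models if set(combo) <= m.capabilities]
            if matching:  # skip combinations no model supports
                rows.append((" + ".join(combo), len(matching),
                             round(mean(m.score for m in matching), 1)))
    # Most widely supported combinations first; the sort is stable, so ties
    # keep pairs ahead of triples.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical sample data, for illustration only.
SAMPLE_MODELS = [
    Model("model-a", frozenset({"Streaming", "JSON Mode", "Reasoning"}), 80.0),
    Model("model-b", frozenset({"Streaming", "JSON Mode"}), 70.0),
    Model("model-c", frozenset({"Streaming", "Vision"}), 75.0),
]

if __name__ == "__main__":
    for row in combo_stats(SAMPLE_MODELS):
        print(row)
```

Every tracked model supports Streaming, so adding Streaming to a combination leaves both the model count and the average unchanged. This is why rows 3 and 4, 7 and 8, and 9 and 10 in the table are identical.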
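No model currently supports all seven capabilities (vision, function calling, streaming, JSON mode, reasoning, web search, and image output). A "full-stack" model is one that supports every tracked capability.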
| Provider (models) | Vision | Function Calling | Streaming | JSON Mode | Reasoning | Web Search | Image Output |
|---|---|---|---|---|---|---|---|
| OpenAI (60) | 40/60 (67%) | 53/60 (88%) | 60/60 (100%) | 58/60 (97%) | 32/60 (53%) | 29/60 (48%) | 0/60 (0%) |
| Alibaba (50) | 18/50 (36%) | 45/50 (90%) | 50/50 (100%) | 47/50 (94%) | 24/50 (48%) | 0/50 (0%) | 0/50 (0%) |
| Mistral AI (25) | 11/25 (44%) | 23/25 (92%) | 25/25 (100%) | 23/25 (92%) | 1/25 (4%) | 0/25 (0%) | 0/25 (0%) |
| Google (23) | 18/23 (78%) | 12/23 (52%) | 23/23 (100%) | 20/23 (87%) | 10/23 (43%) | 0/23 (0%) | 0/23 (0%) |
| Meta (14) | 4/14 (29%) | 7/14 (50%) | 14/14 (100%) | 9/14 (64%) | 0/14 (0%) | 0/14 (0%) | 0/14 (0%) |
| Anthropic (13) | 13/13 (100%) | 13/13 (100%) | 13/13 (100%) | 6/13 (46%) | 10/13 (77%) | 11/13 (85%) | 0/13 (0%) |
| NVIDIA (11) | 2/11 (18%) | 9/11 (82%) | 11/11 (100%) | 9/11 (82%) | 10/11 (91%) | 0/11 (0%) | 0/11 (0%) |
| DeepSeek (11) | 0/11 (0%) | 8/11 (73%) | 11/11 (100%) | 10/11 (91%) | 10/11 (91%) | 0/11 (0%) | 0/11 (0%) |
| xAI (10) | 5/10 (50%) | 9/10 (90%) | 10/10 (100%) | 10/10 (100%) | 8/10 (80%) | 10/10 (100%) | 0/10 (0%) |
| MiniMax (8) | 1/8 (13%) | 6/8 (75%) | 8/8 (100%) | 5/8 (63%) | 6/8 (75%) | 0/8 (0%) | 0/8 (0%) |
| arcee-ai (7) | 1/7 (14%) | 4/7 (57%) | 7/7 (100%) | 3/7 (43%) | 2/7 (29%) | 1/7 (14%) | 0/7 (0%) |
| ByteDance (5) | 5/5 (100%) | 4/5 (80%) | 5/5 (100%) | 4/5 (80%) | 4/5 (80%) | 0/5 (0%) | 0/5 (0%) |
| Liquid AI (5) | 0/5 (0%) | 0/5 (0%) | 5/5 (100%) | 0/5 (0%) | 1/5 (20%) | 0/5 (0%) | 0/5 (0%) |
| Amazon (5) | 4/5 (80%) | 5/5 (100%) | 5/5 (100%) | 0/5 (0%) | 1/5 (20%) | 0/5 (0%) | 0/5 (0%) |
| Perplexity (5) | 4/5 (80%) | 0/5 (0%) | 5/5 (100%) | 1/5 (20%) | 3/5 (60%) | 5/5 (100%) | 0/5 (0%) |
| Baidu (5) | 2/5 (40%) | 2/5 (40%) | 5/5 (100%) | 1/5 (20%) | 3/5 (60%) | 0/5 (0%) | 0/5 (0%) |
| Moonshot AI (4) | 1/4 (25%) | 4/4 (100%) | 4/4 (100%) | 4/4 (100%) | 2/4 (50%) | 0/4 (0%) | 0/4 (0%) |
| Allen AI (4) | 0/4 (0%) | 1/4 (25%) | 4/4 (100%) | 3/4 (75%) | 2/4 (50%) | 0/4 (0%) | 0/4 (0%) |
| Cohere (4) | 0/4 (0%) | 2/4 (50%) | 4/4 (100%) | 4/4 (100%) | 0/4 (0%) | 0/4 (0%) | 0/4 (0%) |
| Xiaomi (3) | 1/3 (33%) | 3/3 (100%) | 3/3 (100%) | 3/3 (100%) | 3/3 (100%) | 0/3 (0%) | 0/3 (0%) |
| Inception (3) | 0/3 (0%) | 3/3 (100%) | 3/3 (100%) | 3/3 (100%) | 1/3 (33%) | 0/3 (0%) | 0/3 (0%) |
| aion-labs (3) | 0/3 (0%) | 0/3 (0%) | 3/3 (100%) | 0/3 (0%) | 3/3 (100%) | 0/3 (0%) | 0/3 (0%) |
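Each provider cell shows a count, the provider's total number of models, and a whole-number percentage. The percentages appear to be rounded half up. For example, MiniMax's 1/8 (12.5%) is shown as 13% and 5/8 (62.5%) as 63%. Python's built-in `round()` does not give these results because it rounds halves to even (12 and 62). The small sketch below reproduces the cell format using integer arithmetic. The function name `provider_cell` is hypothetical, and half-up rounding is an inference from the table, not a documented rule.

```python
def provider_cell(count: int, total: int) -> str:
    """Format an adoption cell as 'count/total (pct%)', rounding half up."""
    if total <= 0:
        raise ValueError("provider must have at least one model")
    # floor(100*count/total + 1/2) == (200*count + total) // (2*total),
    # computed exactly with integers so .5 ties never hit float error.
    pct = (200 * count + total) // (2 * total)
    return f"{count}/{total} ({pct}%)"


if __name__ == "__main__":
    # MiniMax row: Vision, Function Calling, JSON Mode.
    for count in (1, 6, 5):
        print(provider_cell(count, 8))
```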
We track seven key capabilities: Vision (image understanding), Function Calling (tool use), Streaming (real-time output), JSON Mode (structured output), Reasoning (chain-of-thought), Web Search (live information retrieval), and Image Output (image generation).
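Score premium is the average composite score of models that have a given capability minus the average score of models that lack it. A positive premium means models with that capability tend to score higher overall. This is a correlation and does not show that the capability itself causes the higher score. When every model has the capability (Streaming) or no model does (Image Output), one of the two groups is empty, so the listed premiums of +68.0 and -68.0 are not meaningful comparisons.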
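The premium is a difference between two group means. The minimal Python sketch below assumes each model is given as a (capability set, composite score) pair; that input shape is an assumption for illustration. The function returns `None` when one of the two groups is empty, because the difference is then undefined.

```python
from statistics import mean
from typing import Iterable, Optional


def score_premium(models: Iterable[tuple[set[str], float]],
                  capability: str) -> Optional[float]:
    """Mean composite score of models with `capability` minus that of models
    without it. Returns None if either group is empty (premium undefined)."""
    with_cap: list[float] = []
    without_cap: list[float] = []
    for caps, score in models:
        (with_cap if capability in caps else without_cap).append(score)
    if not with_cap or not without_cap:
        return None
    return mean(with_cap) - mean(without_cap)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    sample = [({"Streaming", "Reasoning"}, 80.0),
              ({"Streaming"}, 60.0),
              ({"Streaming", "Reasoning"}, 90.0)]
    print(score_premium(sample, "Reasoning"))   # 85.0 - 60.0
    print(score_premium(sample, "Streaming"))   # every model has it
```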
A full-stack model supports all seven tracked capabilities: vision, function calling, streaming, JSON mode, reasoning, web search, and image output. No tracked model currently qualifies, because none supports image output.
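Checking whether a model is full stack is a subset test against the seven tracked capabilities. The short sketch below takes the capability names from this definition. Representing a model as a plain set of capability names, and the helper names `is_full_stack` and `missing_capabilities`, are hypothetical.

```python
# The seven tracked capabilities.
TRACKED_CAPABILITIES = frozenset({
    "Vision", "Function Calling", "Streaming", "JSON Mode",
    "Reasoning", "Web Search", "Image Output",
})


def is_full_stack(capabilities: set[str]) -> bool:
    """True only if the model supports every tracked capability."""
    return TRACKED_CAPABILITIES <= capabilities


def missing_capabilities(capabilities: set[str]) -> set[str]:
    """Tracked capabilities the model lacks (empty for a full-stack model)."""
    return set(TRACKED_CAPABILITIES - capabilities)


if __name__ == "__main__":
    # A hypothetical model with everything except image output.
    six = set(TRACKED_CAPABILITIES) - {"Image Output"}
    print(is_full_stack(six), missing_capabilities(six))
```

Reporting the missing set, rather than only a boolean, shows how close a model comes to full stack. For any model in this dataset, at least Image Output will appear in that set.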
Adoption rates vary widely. Streaming is universal (all 300 models), followed by JSON Mode (77.3%) and Function Calling (74.7%). At the other end, Web Search is rare (18.7%) and Image Output is not supported by any tracked model (0%).