Fine-tuning lets you train an existing AI model on your own data, specializing it for your domain, format, or task. Compare training and inference costs across all major providers.
- Models compared: 14
- Cheapest paid training: $0.48/M tokens
- Free training: 2 models
| Provider | Model | Training $/M | Inference input $/M | Inference output $/M | Min. examples | Hosting |
|---|---|---|---|---|---|---|
| Google | Gemini 2.0 Flash | Free | $0.10 | $0.40 | 100 | Managed |
| Google | Gemini 1.5 Flash | Free | $0.075 | $0.30 | 100 | Managed |
| Together AI | Llama 3.1 8B | $0.48 | $0.18 | $0.18 | 1 | Managed |
| Fireworks | Llama 3.1 8B | $0.50 | $0.20 | $0.20 | 1 | Managed + Self-host |
| Mistral | Mistral Small | $2.00 | $0.10 | $0.30 | 1 | Managed |
| Cohere | Command R | $2.00 | $0.15 | $0.60 | 2 | Managed |
| OpenAI | GPT-4o Mini | $3.00 | $0.30 | $1.20 | 10 | Managed |
| Together AI | Llama 3.1 70B | $3.50 | $0.88 | $0.88 | 1 | Managed |
| Fireworks | Llama 3.1 70B | $4.00 | $0.90 | $0.90 | 1 | Managed + Self-host |
| Cohere | Command R+ | $5.00 | $2.50 | $10.00 | 2 | Managed |
| Mistral | Mistral Medium | $6.00 | $2.50 | $7.50 | 1 | Managed |
| OpenAI | GPT-3.5 Turbo | $8.00 | $3.00 | $6.00 | 10 | Managed |
| Together AI | Llama 3.1 405B | $8.00 | $5.00 | $5.00 | 1 | Managed |
| OpenAI | GPT-4o | $25.00 | $3.75 | $15.00 | 10 | Managed |
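Training cost from the table above is driven by total training tokens. A minimal sketch, assuming cost = examples × average tokens per example × epochs × (price per million tokens) — check each provider's docs, since some bill epochs differently:

```python
# Rough training-cost estimate (assumption: providers bill
# total training tokens x epochs at the listed per-million rate).

def training_cost(examples: int, avg_tokens_per_example: int,
                  epochs: int, price_per_m_tokens: float) -> float:
    total_tokens = examples * avg_tokens_per_example * epochs
    return total_tokens / 1_000_000 * price_per_m_tokens

# Illustrative job: 500 examples x 800 tokens, 3 epochs,
# at GPT-4o's $25.00/M training rate -> 1.2M tokens -> $30.00
print(f"${training_cost(500, 800, 3, 25.00):.2f}")
```

The example numbers (500 examples, 800 tokens, 3 epochs) are illustrative assumptions, not provider defaults.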
| Factor | Fine-tuning | Prompt engineering |
|---|---|---|
| Setup time | Hours to days (data prep + training) | Minutes to hours |
| Upfront cost | Training token cost (see table above) | None |
| Per-request cost | Lower (shorter prompts needed) | Higher (long system prompt + few-shot examples) |
| Iteration speed | Slow (retrain on every change) | Fast (edit prompt and test) |
| Best for | Consistent formatting, domain expertise, production workloads | Prototyping, general tasks, low volume |
| Data requirements | Curated training examples needed | No training data needed |
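The per-request saving comes mostly from dropping the long system prompt and few-shot examples. A sketch of that comparison, using illustrative token counts and GPT-4o Mini's fine-tuned inference rates from the table (a base model's rates would differ, so treat the result as an approximation):

```python
# Per-request cost with vs. without a long prompt.
# Token counts are illustrative assumptions, not measurements.

def request_cost(input_tokens: int, output_tokens: int,
                 in_price_m: float, out_price_m: float) -> float:
    return (input_tokens * in_price_m + output_tokens * out_price_m) / 1_000_000

# Prompted base setup: 1,500-token prompt (system + few-shot), 200-token output
base = request_cost(1_500, 200, 0.30, 1.20)
# Fine-tuned setup: 200-token prompt, same output
tuned = request_cost(200, 200, 0.30, 1.20)
print(f"saving per request: ${base - tuned:.6f}")
```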
Fine-tuning has an upfront training cost, but it can lower ongoing inference costs by shortening prompts. To estimate whether fine-tuning is worth it, compare the one-time training cost against the cumulative per-request savings. For most production applications with more than 10,000 requests per month, fine-tuning typically pays for itself within the first month.
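That break-even comparison can be sketched as follows; all inputs here are illustrative assumptions, not measured values:

```python
# Break-even sketch: months until cumulative per-request savings
# cover the one-time training cost.

def months_to_break_even(training_cost: float,
                         saving_per_request: float,
                         requests_per_month: int) -> float:
    monthly_saving = saving_per_request * requests_per_month
    return training_cost / monthly_saving

# e.g. $30 training cost, $0.00039 saved per request, 100,000 requests/month
print(f"{months_to_break_even(30.0, 0.00039, 100_000):.2f} months")
```

Plugging in your own traffic and prompt lengths is the quickest way to see whether the upfront cost is justified.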
Fine-tuning is the process of training a pre-trained language model on your own dataset to specialize it for a specific task, domain, or output style. Instead of training from scratch, you adapt an existing model using examples of the inputs and outputs you want.
OpenAI charges $25.00 per million training tokens for GPT-4o fine-tuning. After training, inference costs $3.75 per million input tokens and $15.00 per million output tokens. You need a minimum of 10 training examples, though OpenAI recommends 50-100 for best results.
Google currently offers free training for Gemini 2.0 Flash and Gemini 1.5 Flash fine-tuning; you only pay for inference after training. However, Google requires a minimum of 100 training examples, which is higher than any other provider listed here.
The minimum varies by provider: Together AI and Fireworks require just 1 example, Cohere requires 2, OpenAI requires 10, and Google requires 100. In practice, most fine-tuning jobs benefit from at least 50-100 high-quality examples, with diminishing returns above 1,000.
For zero training cost, Google Gemini models offer free fine-tuning. For open-source models, Together AI offers Llama 3.1 8B training at $0.48 per million tokens, making it the cheapest paid option. Fireworks also offers competitive pricing with the added benefit of self-hosting your fine-tuned model.
Use prompt engineering first -- it requires no upfront cost and is easy to iterate on. Fine-tuning makes sense when you need consistent formatting, domain-specific behavior, lower per-request latency, or when your prompts are getting too long and expensive. Fine-tuning can actually reduce inference costs by removing the need for lengthy system prompts.
Self-hosting options vary by provider. Fireworks explicitly supports both managed hosting and self-hosting of fine-tuned models. Open-source models (Llama, Mistral) fine-tuned through Together AI or Fireworks can typically be exported and self-hosted. OpenAI and Google fine-tuned models can only run on their respective platforms.