Controls randomness in model output. Lower = more focused, higher = more creative.
Temperature scales the probability distribution over the next token before sampling. At temperature 0, the model always picks the highest-probability token (greedy decoding). At temperature 1, it samples proportionally to the raw probabilities. Above 1, it flattens the distribution, making unlikely tokens more probable.
Technically, temperature divides the logits (raw scores) by the temperature value before applying softmax. A temperature of 0.5 doubles the logits, making the distribution sharper; a temperature of 2.0 halves them, making it flatter. The result: low temperature produces focused, near-deterministic (and often repetitive) output; high temperature produces diverse but potentially incoherent output.
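A minimal sketch of the scaling described above, using made-up logit values for a three-token vocabulary (the numbers are illustrative, not from any real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical raw scores for three tokens

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running this shows the effect directly: at 0.5 the top token's probability grows toward 1, while at 2.0 the three probabilities move closer together.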
Use temperature 0-0.3 for factual tasks like code generation, data extraction, and classification where you need consistent, correct answers. Use 0.7-1.0 for creative writing, brainstorming, and conversational responses where variety matters. Rarely go above 1.5 unless you want deliberately wild output for artistic purposes.
Common mistakes: setting temperature to 0 and then wondering why the output is boring; setting it to 1.5+ and getting gibberish; and changing temperature AND top-p at the same time, which makes the combined effect hard to predict. Most providers recommend adjusting one or the other, not both.
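The temperature/top-p interaction can be sketched with a toy distribution. This is an illustration of the mechanism, not any provider's actual sampler, and the logit values are invented: raising temperature flattens the distribution, which in turn changes how many tokens survive a top-p (nucleus) cutoff.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    return [x / temperature for x in logits]

def apply_top_p(probs, p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches p, zero out the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= p:
            break
    masked = [probs[i] if i in kept else 0.0 for i in range(len(probs))]
    total = sum(masked)
    return [x / total for x in masked]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four tokens

# At low temperature, few tokens clear the nucleus cutoff; at high
# temperature, more do — so tuning both knobs at once compounds.
for t in (0.7, 1.5):
    probs = apply_top_p(softmax(apply_temperature(logits, t)), p=0.9)
    print(t, [round(x, 3) for x in probs])
```

With these particular numbers, the 0.9 nucleus keeps two tokens at temperature 0.7 but three at 1.5, which is why changing both parameters together makes results hard to reason about.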
Default values: OpenAI 1.0, Anthropic 1.0, Google Gemini 1.0, Mistral 0.7. Most APIs accept values in the 0-2 range.