Controls randomness in model output. Lower = more focused, higher = more creative.
Temperature scales the probability distribution over the next token before sampling. At temperature 0, the model always picks the highest-probability token (greedy decoding). At temperature 1, it samples proportionally to the raw probabilities. Above 1, it flattens the distribution, making unlikely tokens more probable.
Technically, temperature divides the logits (raw scores) by the temperature value before applying softmax. A temperature of 0.5 doubles the logits, making the distribution sharper; a temperature of 2.0 halves them, making it flatter. The result: low temperature produces focused, near-deterministic (and often repetitive) output; high temperature produces diverse but potentially incoherent output.
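A minimal sketch of the scaling described above, using made-up logit values for a three-token vocabulary (the numbers are illustrative, not from any real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical raw scores for three tokens

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running this shows the effect directly: at 0.5 the top token's probability grows toward 1, while at 2.0 the three probabilities move closer together.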
Use temperature 0-0.3 for factual tasks like code generation, data extraction, and classification where you need consistent, correct answers. Use 0.7-1.0 for creative writing, brainstorming, and conversational responses where variety matters. Rarely go above 1.5 unless you want deliberately wild output for artistic purposes.
Common mistakes: setting temperature to 0 and then wondering why the output is boring; setting it to 1.5+ and getting gibberish; and changing temperature AND top-p at the same time, which makes the combined effect hard to predict. Most providers recommend adjusting one or the other, not both.
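The temperature/top-p interaction can be sketched with a toy distribution. This is an illustration of the mechanism, not any provider's actual sampler, and the logit values are invented: raising temperature flattens the distribution, which in turn changes how many tokens survive a top-p (nucleus) cutoff.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    return [x / temperature for x in logits]

def apply_top_p(probs, p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches p, zero out the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= p:
            break
    masked = [probs[i] if i in kept else 0.0 for i in range(len(probs))]
    total = sum(masked)
    return [x / total for x in masked]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four tokens

# At low temperature, few tokens clear the nucleus cutoff; at high
# temperature, more do — so tuning both knobs at once compounds.
for t in (0.7, 1.5):
    probs = apply_top_p(softmax(apply_temperature(logits, t)), p=0.9)
    print(t, [round(x, 3) for x in probs])
```

With these particular numbers, the 0.9 nucleus keeps two tokens at temperature 0.7 but three at 1.5, which is why changing both parameters together makes results hard to reason about.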
Default values: OpenAI 1.0, Anthropic 1.0, Google Gemini 1.0, Mistral 0.7. Most APIs accept values in the 0-2 range.