Limits token selection to the smallest set whose cumulative probability exceeds P.
Top-P (also called nucleus sampling) restricts sampling to the smallest set of most probable tokens whose combined probability mass reaches the threshold P. If top-p is 0.9, the model considers only the tokens that together account for 90% of the probability mass, ignoring the long tail of unlikely tokens.
After computing token probabilities, the model sorts them from highest to lowest, then walks down the list, accumulating probability until the running total reaches the top-p threshold. Only tokens within this nucleus are candidates for sampling, so the size of the candidate set adjusts dynamically to the shape of the distribution: a confident model may keep only a few tokens, an uncertain one many.
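The walk described above can be sketched in a few lines of NumPy. This is a minimal illustration of the technique, not any provider's actual implementation:

```python
import numpy as np

def top_p_sample(logits, p=0.9, rng=None):
    """Sample one token id using nucleus (top-p) filtering."""
    rng = rng or np.random.default_rng()
    # Softmax over the logits to get a probability distribution.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort token ids from most to least probable.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    # Keep the smallest prefix whose cumulative mass reaches p
    # (this always includes at least the single most likely token).
    cutoff = np.searchsorted(np.cumsum(sorted_probs), p) + 1
    nucleus = order[:cutoff]
    # Renormalize over the nucleus and sample from it.
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=nucleus_probs)
```

With logits corresponding to probabilities [0.5, 0.3, 0.15, 0.05] and p=0.9, the nucleus is the first three tokens (cumulative mass 0.95); the fourth token can never be sampled.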
Top-P 0.1 gives you very focused, almost deterministic output. Top-P 0.9 is a good default for most tasks. Top-P 1.0 disables nucleus sampling entirely (all tokens are candidates). Use lower values for code and structured output.
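To see why lower values give more focused output, the sketch below counts the nucleus size for a toy next-token distribution (the probabilities are hypothetical, chosen only for illustration):

```python
import numpy as np

# Toy next-token distribution: a few likely tokens plus a long tail.
probs = np.array([0.40, 0.25, 0.15, 0.08, 0.05,
                  0.03, 0.02, 0.01, 0.007, 0.003])

def nucleus_size(probs, p):
    """Number of tokens in the smallest set whose cumulative mass reaches p."""
    sorted_probs = np.sort(probs)[::-1]
    idx = np.searchsorted(np.cumsum(sorted_probs), p) + 1
    return int(min(len(probs), idx))  # guard against float rounding at p=1.0

for p in (0.1, 0.5, 0.9, 1.0):
    # p=0.1 -> 1 token, p=0.5 -> 2 tokens, p=0.9 -> 5 tokens, p=1.0 -> all 10
    print(p, nucleus_size(probs, p))
```

The nucleus collapses to a single token at P=0.1 (near-deterministic) and widens to the full vocabulary at P=1.0, which is why 1.0 disables the filter entirely.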
Common pitfalls: adjusting both temperature and top-p simultaneously (OpenAI recommends changing one or the other, not both); setting top-p so low (e.g. 0.01) that the output becomes repetitive and loops; and confusing top-p with top-k, which caps the candidate list at a fixed count rather than a probability mass.
Provider defaults: OpenAI 1.0, Anthropic 0.999, Google Gemini 0.95, Mistral 1.0. The accepted range is 0 to 1.