Limits token selection to the smallest set whose cumulative probability exceeds P.
Top-P (also called nucleus sampling) restricts sampling to the smallest set of most probable tokens whose combined probability mass reaches the threshold P. If top-p is 0.9, the model considers only the tokens that together account for 90% of the probability mass, ignoring the long tail of unlikely tokens.
After computing token probabilities, the model sorts them from highest to lowest, then walks down the list, accumulating probability until the running total reaches the top-p threshold. Only tokens within this nucleus are candidates for sampling, so the size of the candidate set adjusts dynamically to the shape of the distribution: a confident model may keep only a few tokens, an uncertain one many.
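The walk described above can be sketched in a few lines of NumPy. This is a minimal illustration of the technique, not any provider's actual implementation:

```python
import numpy as np

def top_p_sample(logits, p=0.9, rng=None):
    """Sample one token id using nucleus (top-p) filtering."""
    rng = rng or np.random.default_rng()
    # Softmax over the logits to get a probability distribution.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort token ids from most to least probable.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    # Keep the smallest prefix whose cumulative mass reaches p
    # (this always includes at least the single most likely token).
    cutoff = np.searchsorted(np.cumsum(sorted_probs), p) + 1
    nucleus = order[:cutoff]
    # Renormalize over the nucleus and sample from it.
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=nucleus_probs)
```

With logits corresponding to probabilities [0.5, 0.3, 0.15, 0.05] and p=0.9, the nucleus is the first three tokens (cumulative mass 0.95); the fourth token can never be sampled.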
Top-P 0.1 gives you very focused, almost deterministic output. Top-P 0.9 is a good default for most tasks. Top-P 1.0 disables nucleus sampling entirely (all tokens are candidates). Use lower values for code and structured output.
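To see why lower values give more focused output, the sketch below counts the nucleus size for a toy next-token distribution (the probabilities are hypothetical, chosen only for illustration):

```python
import numpy as np

# Toy next-token distribution: a few likely tokens plus a long tail.
probs = np.array([0.40, 0.25, 0.15, 0.08, 0.05,
                  0.03, 0.02, 0.01, 0.007, 0.003])

def nucleus_size(probs, p):
    """Number of tokens in the smallest set whose cumulative mass reaches p."""
    sorted_probs = np.sort(probs)[::-1]
    idx = np.searchsorted(np.cumsum(sorted_probs), p) + 1
    return int(min(len(probs), idx))  # guard against float rounding at p=1.0

for p in (0.1, 0.5, 0.9, 1.0):
    # p=0.1 -> 1 token, p=0.5 -> 2 tokens, p=0.9 -> 5 tokens, p=1.0 -> all 10
    print(p, nucleus_size(probs, p))
```

The nucleus collapses to a single token at P=0.1 (near-deterministic) and widens to the full vocabulary at P=1.0, which is why 1.0 disables the filter entirely.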
Common pitfalls: adjusting both temperature and top-p simultaneously (OpenAI recommends changing one or the other, not both); setting top-p so low (e.g. 0.01) that the output becomes repetitive and loops; and confusing top-p with top-k, which caps the candidate list at a fixed count rather than a probability mass.
Provider defaults: OpenAI 1.0, Anthropic 0.999, Google Gemini 0.95, Mistral 1.0. The accepted range is 0 to 1.