Penalizes tokens proportionally to how often they appeared, reducing repetition.
Frequency penalty reduces the likelihood of repeating the same token based on how many times it already appeared. A token used 5 times gets penalized 5x more than one used once. This directly combats repetitive loops.
Before sampling, the model subtracts from each token's logit a penalty proportional to that token's count in the output so far. Positive values discourage repetition. The effect is cumulative: the more a token has been used, the harder it becomes to use again.
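The logit adjustment above can be sketched in a few lines. This is a minimal illustration, not any provider's actual implementation; the function name, token strings, and logit values are made up for the example.

```python
def apply_frequency_penalty(logits, token_counts, penalty=0.5):
    """Subtract penalty * count from each token's logit.

    logits: dict mapping token -> raw logit
    token_counts: dict mapping token -> times it has appeared in the output so far
    """
    return {
        tok: logit - penalty * token_counts.get(tok, 0)
        for tok, logit in logits.items()
    }

logits = {"the": 2.0, "cat": 1.5, "sat": 1.0}
counts = {"the": 4, "cat": 1}  # "the" has already been used 4 times
adjusted = apply_frequency_penalty(logits, counts, penalty=0.5)
# "the": 2.0 - 0.5*4 = 0.0, "cat": 1.5 - 0.5*1 = 1.0, "sat": unchanged at 1.0
```

Note the cumulative effect: "the", used four times, loses four times as much logit mass as "cat", used once.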
Use it when the model gets stuck in repetitive loops, or for creative writing where you want vocabulary diversity. Not recommended for code generation, where repeating variable names and syntax is necessary.
Setting it too high for code generation. Confusing frequency penalty (proportional to count) with presence penalty (a flat penalty for any occurrence). Combining it with high temperature, where the joint effect on output diversity is hard to predict.
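The frequency-vs-presence distinction is easiest to see side by side. A minimal sketch, with invented logit values and helper names:

```python
def frequency_adjust(logit, count, penalty):
    # penalty scales with how many times the token has appeared
    return logit - penalty * count

def presence_adjust(logit, count, penalty):
    # flat penalty applied once the token has appeared at all
    return logit - penalty * (1 if count > 0 else 0)

# A token already seen 5 times, penalty 0.5:
freq = frequency_adjust(2.0, 5, 0.5)   # 2.0 - 0.5*5 = -0.5
pres = presence_adjust(2.0, 5, 0.5)    # 2.0 - 0.5   =  1.5
```

Frequency penalty keeps growing with each repetition, while presence penalty stops at one unit no matter how often the token recurs.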
OpenAI: 0 (disabled), range -2 to 2. Anthropic: not exposed. Google: frequencyPenalty 0-1. Typical useful range: 0.1-0.5.
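As a sketch of where the parameter sits in practice, here is an OpenAI-style request body. The model name and prompt are placeholders; the `frequency_penalty` field name and -2 to 2 range follow the OpenAI API as described above.

```python
# Hypothetical request payload for a chat completion; only the
# frequency_penalty line matters here.
payload = {
    "model": "gpt-4o-mini",  # placeholder model name
    "messages": [{"role": "user", "content": "Write a short poem about rain."}],
    "frequency_penalty": 0.3,  # within the typical useful range of 0.1-0.5
}

assert -2 <= payload["frequency_penalty"] <= 2  # OpenAI's allowed range
```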