Every parameter you can tweak when calling an LLM API, explained with practical defaults and real trade-offs. Stop guessing, start configuring with confidence.
Control the randomness and diversity of model output
Set boundaries on input and output sizes
Enable specific model features and output formats
Fine-tune model behavior for specialized use cases
Temperature and max_tokens are the two parameters that matter most in typical applications. Temperature controls output randomness (lower for code, higher for creative text), and max_tokens caps response length, preventing runaway responses and controlling cost. Everything else is situational.
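A minimal sketch of how these two parameters slot into a request. The payload shape follows the OpenAI Chat Completions convention, and the model name is an example, not a recommendation; `build_request` is a hypothetical helper, not part of any SDK.

```python
def build_request(prompt, *, temperature=0.7, max_tokens=512):
    """Assemble a Chat Completions-style payload with the two
    highest-impact parameters: temperature (output randomness)
    and max_tokens (hard cap on generated tokens, which also
    caps cost)."""
    return {
        "model": "gpt-4o-mini",  # example model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # lower = more deterministic
        "max_tokens": max_tokens,    # prevents runaway responses
    }

# Code generation: near-deterministic, tightly bounded output.
code_req = build_request("Write a function to parse ISO dates.",
                         temperature=0.1, max_tokens=300)

# Creative writing: more randomness, a larger token budget.
story_req = build_request("Write a short story about a lighthouse.",
                          temperature=1.0, max_tokens=1200)
```

The same two knobs exist under slightly different names in other provider APIs (e.g. `max_output_tokens` in some), so the pattern transfers even if the field names do not.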
Pick one, not both. OpenAI explicitly recommends adjusting temperature OR top-p, not both simultaneously. Temperature is more intuitive for most developers. Use top-p when you want dynamic candidate pool sizing based on model confidence.
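One way to enforce "one, not both" is a small guard when assembling sampling parameters. This is an illustrative helper (the function name and defaults are assumptions, not a library API):

```python
def sampling_config(temperature=None, top_p=None):
    """Build sampling parameters, enforcing the guidance to
    adjust temperature OR top_p, never both at once."""
    if temperature is not None and top_p is not None:
        raise ValueError("Adjust temperature OR top_p, not both.")
    params = {}
    if temperature is not None:
        # Temperature rescales the entire token distribution.
        params["temperature"] = temperature
    if top_p is not None:
        # Nucleus sampling: candidate pool sized dynamically by
        # cumulative probability mass (model confidence).
        params["top_p"] = top_p
    return params

print(sampling_config(temperature=0.7))  # {'temperature': 0.7}
print(sampling_config(top_p=0.9))        # {'top_p': 0.9}
```

Failing fast here is cheaper than debugging the subtly odd outputs that combining both can produce.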
Use temperature 0-0.2 for code generation. You want deterministic, correct output, not creative variations. Some developers use temperature 0 (greedy decoding) for maximum consistency, though a small amount of randomness (0.1) can sometimes help the model escape repetitive or degenerate completions.
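A toy sampler makes the mechanics concrete: temperature divides the logits before softmax, so as it approaches 0 the distribution sharpens toward the single most likely token (greedy decoding). This is a self-contained sketch of the standard technique, not any provider's implementation.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng=random.Random(0)):
    """Toy temperature sampling over raw logits.

    temperature == 0 is treated as greedy decoding (argmax);
    higher temperatures flatten the distribution, increasing
    the chance of lower-probability tokens.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the softmax distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


logits = [2.0, 1.0, 0.5]
# At temperature 0, the argmax token (index 0) is always chosen.
assert all(sample_with_temperature(logits, 0) == 0 for _ in range(5))
```

At temperature 1.0 the same logits would occasionally yield index 1 or 2, which is exactly the variation you do not want in generated code.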
No. Core parameters like temperature and max_tokens are nearly universal, but advanced features vary. Function calling requires a model that supports it (recent GPT, Claude 3+, or Gemini models). Reasoning effort is only available on thinking-capable models (o-series, Claude with extended thinking, DeepSeek R1). Prompt caching availability and pricing differ by provider.