Controls how much internal reasoning the model does before answering.
Reasoning effort controls the depth of the model's internal chain-of-thought before producing a final answer. Higher effort means more thinking tokens (more compute, higher latency, better quality on hard problems). Lower effort means faster, cheaper responses.
Models with extended thinking capabilities (Claude with thinking, OpenAI o-series, DeepSeek R1) generate internal reasoning tokens not shown in the final output. The reasoning effort parameter controls the budget for these tokens.
Set high for math, logic, coding problems, and complex analysis. Set low for simple Q&A, formatting tasks, and classification. Adjust dynamically based on task complexity to optimize cost.
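The dynamic adjustment described above can be sketched as a simple task-to-effort mapping. The task categories and the fallback level here are illustrative assumptions, not a fixed rule:

```python
# Illustrative mapping from task type to reasoning effort (assumed categories).
EFFORT_BY_TASK = {
    "math": "high",
    "coding": "high",
    "analysis": "high",
    "qa": "low",
    "formatting": "low",
    "classification": "low",
}

def choose_effort(task_type: str) -> str:
    """Pick a reasoning effort level, defaulting to 'medium' for unknown tasks."""
    return EFFORT_BY_TASK.get(task_type, "medium")

print(choose_effort("math"))        # high
print(choose_effort("formatting"))  # low
print(choose_effort("summary"))     # medium (fallback for unlisted task types)
```

In practice the task type might come from a cheap classifier or a routing heuristic rather than a hand-labeled string.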
Common pitfalls: using maximum reasoning effort for every request (expensive and slow for simple tasks); expecting more reasoning to compensate for knowledge gaps (thinking cannot supply facts the model never learned); and omitting thinking tokens from cost calculations, since providers typically bill them as output tokens.
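Because thinking tokens are typically billed at the output rate, a cost estimate should add them to the output count. A minimal sketch with placeholder per-token rates (verify billing rules and rates for your provider):

```python
def estimate_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Estimate request cost in dollars, counting reasoning tokens at the
    output rate (the common billing model; confirm with your provider)."""
    return input_tokens * input_rate + (output_tokens + reasoning_tokens) * output_rate

# Made-up example rates: $3 per million input tokens, $15 per million output tokens.
cost = estimate_cost(1_000, 500, 8_000, 3e-6, 15e-6)
print(round(cost, 4))  # 0.1305 -- reasoning tokens dominate the bill here
```

Note that with high effort, reasoning tokens can easily outnumber visible output tokens, as in this example.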
Anthropic Claude: a numeric budget_tokens value inside the thinking parameter (a token budget, not a named level). OpenAI o-series: a reasoning_effort of "low"/"medium"/"high". DeepSeek R1: thinking is always on, with no effort parameter to tune.
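The per-provider parameters differ in shape. A sketch of the request bodies only (no network call; model names are illustrative and field names should be checked against each provider's current API reference):

```python
# OpenAI o-series: a named effort level on the request.
openai_request = {
    "model": "o3-mini",  # illustrative model name
    "reasoning_effort": "high",  # "low" | "medium" | "high"
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
}

# Anthropic Claude: a numeric token budget inside the "thinking" block.
anthropic_request = {
    "model": "claude-sonnet-4-20250514",  # illustrative model name
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 8000},  # explicit token budget
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
}

# DeepSeek R1 (deepseek-reasoner): thinking is always on; no effort field to set.
```

The practical difference: OpenAI exposes three coarse levels, while Anthropic lets you set an exact token ceiling, which maps more directly onto a cost budget.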