Controls how much internal reasoning the model does before answering.
Reasoning effort controls the depth of the model's internal chain-of-thought before producing a final answer. Higher effort means more thinking tokens (more compute, higher latency, better quality on hard problems). Lower effort means faster, cheaper responses.
Models with extended thinking capabilities (Claude with thinking, OpenAI o-series, DeepSeek R1) generate internal reasoning tokens not shown in the final output. The reasoning effort parameter controls the budget for these tokens.
Set high for math, logic, coding problems, and complex analysis. Set low for simple Q&A, formatting tasks, and classification. Adjust dynamically based on task complexity to optimize cost.
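The dynamic adjustment described above can be sketched as a simple task-to-effort mapping. The task categories and the fallback level here are illustrative assumptions, not a fixed rule:

```python
# Illustrative mapping from task type to reasoning effort (assumed categories).
EFFORT_BY_TASK = {
    "math": "high",
    "coding": "high",
    "analysis": "high",
    "qa": "low",
    "formatting": "low",
    "classification": "low",
}

def choose_effort(task_type: str) -> str:
    """Pick a reasoning effort level, defaulting to 'medium' for unknown tasks."""
    return EFFORT_BY_TASK.get(task_type, "medium")

print(choose_effort("math"))        # high
print(choose_effort("formatting"))  # low
print(choose_effort("summary"))     # medium (fallback for unlisted task types)
```

In practice the task type might come from a cheap classifier or a routing heuristic rather than a hand-labeled string.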
Common pitfalls: using maximum reasoning effort for every request (expensive and slow for simple tasks); expecting more reasoning to compensate for knowledge gaps (thinking cannot supply facts the model never learned); and omitting thinking tokens from cost calculations, since providers typically bill them as output tokens.
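Because thinking tokens are typically billed at the output rate, a cost estimate should add them to the output count. A minimal sketch with placeholder per-token rates (verify billing rules and rates for your provider):

```python
def estimate_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Estimate request cost in dollars, counting reasoning tokens at the
    output rate (the common billing model; confirm with your provider)."""
    return input_tokens * input_rate + (output_tokens + reasoning_tokens) * output_rate

# Made-up example rates: $3 per million input tokens, $15 per million output tokens.
cost = estimate_cost(1_000, 500, 8_000, 3e-6, 15e-6)
print(round(cost, 4))  # 0.1305 -- reasoning tokens dominate the bill here
```

Note that with high effort, reasoning tokens can easily outnumber visible output tokens, as in this example.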
Anthropic Claude: a numeric budget_tokens value inside the thinking parameter (a token budget, not a named level). OpenAI o-series: a reasoning_effort of "low"/"medium"/"high". DeepSeek R1: thinking is always on, with no effort parameter to tune.
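The per-provider parameters differ in shape. A sketch of the request bodies only (no network call; model names are illustrative and field names should be checked against each provider's current API reference):

```python
# OpenAI o-series: a named effort level on the request.
openai_request = {
    "model": "o3-mini",  # illustrative model name
    "reasoning_effort": "high",  # "low" | "medium" | "high"
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
}

# Anthropic Claude: a numeric token budget inside the "thinking" block.
anthropic_request = {
    "model": "claude-sonnet-4-20250514",  # illustrative model name
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 8000},  # explicit token budget
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
}

# DeepSeek R1 (deepseek-reasoner): thinking is always on; no effort field to set.
```

The practical difference: OpenAI exposes three coarse levels, while Anthropic lets you set an exact token ceiling, which maps more directly onto a cost budget.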