Receive model output token-by-token as it generates.
Streaming sends each generated token to your application as soon as it is produced, rather than buffering the entire response. This sharply reduces perceived latency: users see output begin to appear as soon as the first tokens are generated, instead of waiting for the whole completion to finish.
The API uses Server-Sent Events (SSE) to push incremental chunks, each typically containing one or a few tokens. Your client processes these chunks as they arrive. The stream ends with a terminator, such as OpenAI's `data: [DONE]` sentinel or a provider-specific stop event (Anthropic, for example, sends a `message_stop` event).
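A minimal sketch of the client side, assuming OpenAI-style chunk JSON (a `choices[0].delta.content` field) and the `[DONE]` sentinel; field names vary by provider, and `lines` stands in for whatever line iterator your HTTP client exposes:

```python
import json

def parse_sse_stream(lines):
    """Yield the text delta from each SSE data chunk, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and event-name lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta is not None:
            yield delta

# Example: two content chunks followed by the terminator
raw = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",  # SSE events are separated by blank lines
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_stream(raw)))  # prints "Hello"
```

Note that each `data:` line carries a complete JSON object on its own, which is what makes chunk-by-chunk parsing safe.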
Any user-facing chat interface where perceived speed matters. Long-form generation where waiting would frustrate users. Real-time applications where you want to start processing partial output immediately.
Not handling stream disconnections, which leave you with a silently truncated response. Trying to parse incomplete JSON from a streaming response before a chunk boundary. Assuming streaming changes pricing: it changes only delivery timing, and tokens are billed the same either way.
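One way to guard against the disconnection pitfall is to track whether the stream finished cleanly, so the caller can distinguish a complete answer from a truncated one. This is an illustrative sketch, not any SDK's API; `chunks` stands in for a provider's stream iterator, with `None` standing in for the end-of-stream signal:

```python
def consume_stream(chunks):
    """Accumulate streamed text deltas and report whether the stream
    completed. A disconnect mid-stream leaves complete=False, so the
    caller can retry or flag the response as partial."""
    buffer = []
    complete = False
    try:
        for chunk in chunks:
            if chunk is None:  # stand-in for the end-of-stream signal
                complete = True
                break
            buffer.append(chunk)
    except ConnectionError:
        pass  # network drop: fall through with complete=False
    return "".join(buffer), complete

def flaky():
    """Simulated stream that dies after one delta."""
    yield "partial "
    raise ConnectionError("connection reset")

text, ok = consume_stream(flaky())
print(ok, repr(text))  # False 'partial '
```

The same pattern covers the JSON pitfall: accumulate text deltas as plain strings during the stream, and only attempt structured parsing once `complete` is true.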
Supported by all major providers: OpenAI, Anthropic, and Google each enable it with a `stream=true` flag on the request. Non-streaming is the default everywhere.