Temperature controls output randomness. That’s it. Not speed, not length, not accuracy, not confidence. Just how much variation the model introduces when selecting each token.
The scale
- Temperature 0 — near-deterministic. Same input → (nearly) same output every time. The model always picks the highest-probability token.
- Temperature 0.5 — balanced. Some variation while staying mostly consistent.
- Temperature 1.0 — maximum diversity. Most creative, most varied, least predictable.
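Under the hood, temperature divides the model’s logits before the softmax that turns them into token probabilities. A minimal sampling sketch in Python — the logits are toy numbers standing in for a real model’s output:

```python
import math
import random

def sample_token(logits, temperature, rng=None):
    """Pick a token index: greedy at temperature 0, sampled otherwise."""
    if temperature == 0:
        # Greedy decoding: always the highest-probability token.
        return max(range(len(logits)), key=logits.__getitem__)
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]  # temperature scaling
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    # random.choices samples an index proportionally to the weights.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.1]  # toy logits for three candidate tokens
print(sample_token(logits, 0))    # always 0: deterministic
print(sample_token(logits, 1.0))  # varies from run to run
```

Lower temperatures make the weights more peaked around the top token; at 0 the sampling step disappears entirely.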
The critical misconception
Temperature 0 does NOT guarantee accuracy. It guarantees determinism. The model consistently picks the highest-probability token — but the highest-probability token can be factually wrong. A model that “thinks” the capital of Australia is Sydney will say Sydney every single time at temperature 0. Deterministic ≠ correct.
Temperature affects the probability distribution of token selection. It doesn’t add a “confidence filter” that blocks uncertain answers. It doesn’t make the model “only say things it’s sure about.”
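A toy illustration of the misconception — the probabilities are invented, standing in for a model that has learned the wrong fact:

```python
# Hypothetical next-token probabilities after "The capital of Australia is" —
# invented numbers for a model that learned the wrong answer.
next_token_probs = {"Sydney": 0.55, "Canberra": 0.40, "Melbourne": 0.05}

# Temperature 0 means greedy decoding: always take the argmax.
answer = max(next_token_probs, key=next_token_probs.get)
print(answer)  # "Sydney" — perfectly deterministic, perfectly wrong
```

No temperature setting can rescue this: the error lives in the distribution itself, and temperature only changes how that distribution is sampled.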
What temperature doesn’t control
- Speed — temperature doesn’t affect generation speed. Same computation per token regardless of temperature setting.
- Length — output length is governed by `max_tokens` and the model’s natural completion behavior, not temperature.
- Capability — the model doesn’t unlock better reasoning at any temperature. It’s the same model with different sampling behavior.
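The capability point has a simple mathematical reason: dividing logits by a positive temperature is a monotonic transformation, so it can never reorder the candidates — the most likely token stays most likely at every setting. A quick check with toy logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy logits for three candidate tokens
for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    # The ranking never changes; only how peaked the distribution is.
    assert probs[0] > probs[1] > probs[2]
    print(t, [round(p, 3) for p in probs])
```

Only the spread changes: low temperature piles probability onto the top token, high temperature flattens it toward the rest.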
Match temperature to task
| Task type | Temperature | Why |
|---|---|---|
| Data extraction | 0 | Same invoice should always extract the same values |
| Classification | 0 | Same ticket should always get the same category |
| Creative writing | 1.0 | Varied, interesting output is the goal |
| Brainstorming | 0.7-1.0 | Diversity generates more ideas |
| General conversation | 0.5 | Balance between consistency and naturalness |
For extraction and classification, even 0.5 introduces unnecessary variation. When there’s one correct answer (the invoice total IS $127.50), randomness adds no value — it only adds inconsistency.
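A quick simulation of that point, using an invented distribution over candidate invoice totals (assume the model already ranks the correct value highest):

```python
import math
import random

def apply_temperature(probs, temperature):
    """Re-sharpen a probability distribution at the given temperature."""
    scaled = {tok: math.log(p) / temperature for tok, p in probs.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - m) for tok, s in scaled.items()}
    z = sum(weights.values())
    return {tok: w / z for tok, w in weights.items()}

# Invented model distribution over candidate extractions of the total.
probs = {"$127.50": 0.80, "$127.00": 0.15, "$12.75": 0.05}

rng = random.Random(7)
reshaped = apply_temperature(probs, 0.5)
tokens, weights = zip(*reshaped.items())
outputs = {rng.choices(tokens, weights=weights, k=1)[0] for _ in range(200)}
print(outputs)  # greedy would yield {"$127.50"} alone; 0.5 can still leak errors
```

Temperature 0.5 sharpens the distribution (the correct total gets even more mass than at 1.0), but any nonzero temperature leaves a residual chance of emitting a wrong total — variation with no upside when one answer is correct.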
For creative tasks, temperature 0 produces repetitive, flat output. The diversity at higher temperatures is a genuine benefit, not a bug.
One-liner: Temperature controls randomness (0 = deterministic, 1.0 = maximum variety) but NOT accuracy — a model can be consistently wrong at temperature 0. Match temperature to task: 0 for extraction, 1.0 for creativity.