Temperature controls output randomness. That’s it. Not speed, not length, not accuracy, not confidence. Just how much variation the model introduces when selecting each token.
The scale
- Temperature 0 — near-deterministic. Same input → (nearly) same output every time. The model always picks the highest-probability token.
- Temperature 0.5 — balanced. Some variation while staying mostly consistent.
- Temperature 1.0 — maximum diversity. Most creative, most varied, least predictable.
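Under the hood, temperature divides the model’s logits before the softmax that turns them into token probabilities. A minimal sampling sketch in Python — the logits are toy numbers standing in for a real model’s output:

```python
import math
import random

def sample_token(logits, temperature, rng=None):
    """Pick a token index: greedy at temperature 0, sampled otherwise."""
    if temperature == 0:
        # Greedy decoding: always the highest-probability token.
        return max(range(len(logits)), key=logits.__getitem__)
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]  # temperature scaling
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    # random.choices samples an index proportionally to the weights.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.1]  # toy logits for three candidate tokens
print(sample_token(logits, 0))    # always 0: deterministic
print(sample_token(logits, 1.0))  # varies from run to run
```

Lower temperatures make the weights more peaked around the top token; at 0 the sampling step disappears entirely.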
The critical misconception
Temperature 0 does NOT guarantee accuracy. It guarantees determinism. The model consistently picks the highest-probability token — but the highest-probability token can be factually wrong. A model that “thinks” the capital of Australia is Sydney will say Sydney every single time at temperature 0. Deterministic ≠ correct.
Temperature affects the probability distribution of token selection. It doesn’t add a “confidence filter” that blocks uncertain answers. It doesn’t make the model “only say things it’s sure about.”
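A toy illustration of the misconception — the probabilities are invented, standing in for a model that has learned the wrong fact:

```python
# Hypothetical next-token probabilities after "The capital of Australia is" —
# invented numbers for a model that learned the wrong answer.
next_token_probs = {"Sydney": 0.55, "Canberra": 0.40, "Melbourne": 0.05}

# Temperature 0 means greedy decoding: always take the argmax.
answer = max(next_token_probs, key=next_token_probs.get)
print(answer)  # "Sydney" — perfectly deterministic, perfectly wrong
```

No temperature setting can rescue this: the error lives in the distribution itself, and temperature only changes how that distribution is sampled.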
What temperature doesn’t control
- Speed — temperature doesn’t affect generation speed. Same computation per token regardless of temperature setting.
- Length — output length is governed by `max_tokens` and the model’s natural completion behavior, not temperature.
- Capability — the model doesn’t unlock better reasoning at any temperature. It’s the same model with different sampling behavior.
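The capability point has a simple mathematical reason: dividing logits by a positive temperature is a monotonic transformation, so it can never reorder the candidates — the most likely token stays most likely at every setting. A quick check with toy logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy logits for three candidate tokens
for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    # The ranking never changes; only how peaked the distribution is.
    assert probs[0] > probs[1] > probs[2]
    print(t, [round(p, 3) for p in probs])
```

Only the spread changes: low temperature piles probability onto the top token, high temperature flattens it toward the rest.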
Match temperature to task
| Task type | Temperature | Why |
|---|---|---|
| Data extraction | 0 | Same invoice should always extract the same values |
| Classification | 0 | Same ticket should always get the same category |
| Creative writing | 1.0 | Varied, interesting output is the goal |
| Brainstorming | 0.7-1.0 | Diversity generates more ideas |
| General conversation | 0.5 | Balance between consistency and naturalness |
For extraction and classification, even 0.5 introduces unnecessary variation. When there’s one correct answer (the invoice total IS $127.50), randomness adds no value — it only adds inconsistency.
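A quick simulation of that point, using an invented distribution over candidate invoice totals (assume the model already ranks the correct value highest):

```python
import math
import random

def apply_temperature(probs, temperature):
    """Re-sharpen a probability distribution at the given temperature."""
    scaled = {tok: math.log(p) / temperature for tok, p in probs.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - m) for tok, s in scaled.items()}
    z = sum(weights.values())
    return {tok: w / z for tok, w in weights.items()}

# Invented model distribution over candidate extractions of the total.
probs = {"$127.50": 0.80, "$127.00": 0.15, "$12.75": 0.05}

rng = random.Random(7)
reshaped = apply_temperature(probs, 0.5)
tokens, weights = zip(*reshaped.items())
outputs = {rng.choices(tokens, weights=weights, k=1)[0] for _ in range(200)}
print(outputs)  # greedy would yield {"$127.50"} alone; 0.5 can still leak errors
```

Temperature 0.5 sharpens the distribution (the correct total gets even more mass than at 1.0), but any nonzero temperature leaves a residual chance of emitting a wrong total — variation with no upside when one answer is correct.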
For creative tasks, temperature 0 produces repetitive, flat output. The diversity at higher temperatures is a genuine benefit, not a bug.
One-liner: Temperature controls randomness (0 = deterministic, 1.0 = maximum variety) but NOT accuracy — a model can be consistently wrong at temperature 0. Match temperature to task: 0 for extraction, 1.0 for creativity.