F7.3

Temperature: Determinism Is Not Accuracy

Temperature controls output randomness. That’s it. Not speed, not length, not accuracy, not confidence. Just how much variation the model introduces when selecting each token.

The scale

  • Temperature 0 — near-deterministic. Same input → (nearly) same output every time. The model always picks the highest-probability token.
  • Temperature 0.5 — balanced. Some variation while staying mostly consistent.
  • Temperature 1.0 — maximum diversity. Most creative, most varied, least predictable.
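Under the hood, the mechanism is simple: logits are divided by the temperature before the softmax, and temperature 0 degenerates to greedy argmax selection. A minimal pure-Python sketch of this sampling loop, using made-up logits rather than a real model:

```python
import math
import random

def sample(logits, temperature, rng):
    """Sample a token index from logits after temperature scaling.

    temperature == 0 means greedy selection: always the argmax.
    Higher temperatures flatten the distribution, so lower-probability
    tokens get picked more often.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])  # greedy
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three tokens
rng = random.Random(0)
greedy = [sample(logits, 0, rng) for _ in range(5)]    # always index 0
varied = [sample(logits, 1.0, rng) for _ in range(5)]  # mixes indices
```

Running this repeatedly, the temperature-0 path returns the same index every time, while temperature 1.0 spreads picks across all three tokens.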

The critical misconception

Temperature 0 does NOT guarantee accuracy. It guarantees determinism. The model consistently picks the highest-probability token — but the highest-probability token can be factually wrong. A model that “thinks” the capital of Australia is Sydney will say Sydney every single time at temperature 0. Deterministic ≠ correct.

Temperature affects the probability distribution of token selection. It doesn’t add a “confidence filter” that blocks uncertain answers. It doesn’t make the model “only say things it’s sure about.”
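This is easy to check directly: dividing logits by a temperature makes the distribution more or less peaked, but it never reorders the tokens. If "Sydney" has the highest logit, it stays on top at every temperature. A small illustration with made-up logits:

```python
import math

def softmax_t(logits, t):
    """Softmax of logits / t, computed in a numerically stable way."""
    scaled = [l / t for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical logits: index 0 = "Sydney" (wrong), index 1 = "Canberra".
logits = [3.0, 1.0, 0.5]
for t in (0.2, 0.5, 1.0):
    probs = softmax_t(logits, t)
    # The ranking never changes: temperature rescales, it doesn't filter.
    assert max(range(len(probs)), key=probs.__getitem__) == 0
```

At t = 0.2 the wrong answer gets nearly all the probability mass; at t = 1.0 the distribution is flatter; in no case does temperature demote it.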

What temperature doesn’t control

  • Speed — temperature doesn’t affect generation speed. Same computation per token regardless of temperature setting.
  • Length — output length is governed by max_tokens and the model’s natural completion behavior, not temperature.
  • Capability — the model doesn’t unlock better reasoning at any temperature. It’s the same model with different sampling behavior.

Match temperature to task

| Task type            | Temperature | Why                                                  |
| -------------------- | ----------- | ---------------------------------------------------- |
| Data extraction      | 0           | Same invoice should always extract the same values   |
| Classification       | 0           | Same ticket should always get the same category      |
| Creative writing     | 1.0         | Varied, interesting output is the goal               |
| Brainstorming        | 0.7-1.0     | Diversity generates more ideas                       |
| General conversation | 0.5         | Balance between consistency and naturalness          |
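One practical way to enforce this mapping is a per-task temperature table used when building request parameters. A sketch assuming an OpenAI-style chat API; the model name and task labels are illustrative, not prescribed by any particular library:

```python
# Per-task temperature defaults, mirroring the table above.
# Task names are illustrative; use whatever labels your pipeline has.
TASK_TEMPERATURE = {
    "extraction": 0.0,
    "classification": 0.0,
    "creative_writing": 1.0,
    "brainstorming": 0.9,
    "conversation": 0.5,
}

def request_params(task, messages):
    """Build keyword arguments for an OpenAI-style chat completion call."""
    return {
        "model": "gpt-4o-mini",  # assumed model name
        "messages": messages,
        "temperature": TASK_TEMPERATURE.get(task, 0.5),  # safe middle default
    }

params = request_params(
    "extraction",
    [{"role": "user", "content": "Extract the invoice total."}],
)
```

Centralizing the setting this way prevents the common drift where one call site quietly uses the API default temperature for a task that needs determinism.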

For extraction and classification, even 0.5 introduces unnecessary variation. When there’s one correct answer (the invoice total IS $127.50), randomness adds no value — it only adds inconsistency.

For creative tasks, temperature 0 produces repetitive, flat output. The diversity at higher temperatures is a genuine benefit, not a bug.


One-liner: Temperature controls randomness (0 = deterministic, 1.0 = maximum variety) but NOT accuracy — a model can be consistently wrong at temperature 0. Match temperature to task: 0 for extraction, 1.0 for creativity.