When every few-shot example shows all fields successfully extracted with non-null values, the model learns an implicit rule: every field must always have a value. When processing a document where warranty_expiry genuinely does not exist, the model fabricates a plausible date.
The fix: include at least one example where a field is absent and the correct output is null.
The Null-Example Technique
A single example showing "warranty_expiry": null with a note “not present in source document” reduced hallucination from 31% to 4%. One targeted example, 87% reduction.
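As a minimal sketch of the technique, the few-shot set below includes one example where the field is genuinely absent and the demonstrated output is null. The invoice documents, the `product` field, and the `build_prompt` helper are invented for illustration; only `warranty_expiry` comes from the text above.

```python
import json

# Few-shot examples for an extraction prompt. The third example is the key
# addition: warranty_expiry is genuinely absent from the document, and the
# demonstrated output is null (None in Python, null once serialized to JSON).
FEW_SHOT_EXAMPLES = [
    {
        "document": "Invoice #1042. Product: laser printer. Warranty expires 2026-03-01.",
        "output": {"product": "laser printer", "warranty_expiry": "2026-03-01"},
    },
    {
        "document": "Invoice #1043. Product: desk lamp. Warranty expires 2025-11-20.",
        "output": {"product": "desk lamp", "warranty_expiry": "2025-11-20"},
    },
    {
        # The null example: no warranty is mentioned anywhere in this document.
        "document": "Invoice #1044. Product: USB cable.",
        "output": {"product": "USB cable", "warranty_expiry": None},
    },
]

def build_prompt(document: str) -> str:
    """Assemble a few-shot extraction prompt ending with the target document."""
    parts = ["Extract product and warranty_expiry as JSON. Use null for absent fields.\n"]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Document: {ex['document']}")
        parts.append(f"Output: {json.dumps(ex['output'])}\n")
    parts.append(f"Document: {document}")
    parts.append("Output:")
    return "\n".join(parts)
```

Serializing the third example with `json.dumps` is what puts a literal `"warranty_expiry": null` in front of the model, which is the demonstration the text instructions alone cannot provide.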
Hallucination rate correlates inversely with field availability:
- Fields always present (name, title): 96-99% accuracy
- Fields sometimes present (order ID): 87%
- Fields rarely present (warranty, police report): 62%
The gap comes entirely from the model never having seen an example of the null case. The model has been trained by the examples to always produce a value — it needs to see that null is a valid answer.
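One way to surface this pattern in your own system is a crude grounding check: an extracted value that appears nowhere in the source document counts as fabricated. The sketch below is an assumption of mine, not from the source, and it misses paraphrased or reformatted values, but it is enough to expose a per-field hallucination rate.

```python
# Crude per-field hallucination measure: a non-null extracted value that
# does not occur verbatim in the source text is treated as fabricated.
def hallucination_rate(records):
    """records: list of (source_text, extracted_value) pairs for one field."""
    fabricated = sum(
        1 for text, value in records
        if value is not None and str(value) not in text
    )
    extracted = sum(1 for _, value in records if value is not None)
    return fabricated / extracted if extracted else 0.0

rows = [
    ("Warranty expires 2026-01-01.", "2026-01-01"),  # grounded in source
    ("No warranty mentioned here.", "2025-06-15"),   # fabricated
    ("Nothing relevant.", None),                     # correct null
]
```

Run separately over always-present, sometimes-present, and rarely-present fields, this kind of check is how the inverse correlation above would show up in practice.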
Text Instructions Cannot Override This
“Only extract explicitly stated information” and “NEVER fabricate values” — these strong text instructions fail while few-shot examples succeed. The model follows demonstrated patterns over described rules. Making instructions more emphatic (“CRITICAL: Under absolutely no circumstances…”) does not change their effectiveness.
The “make your best estimate” instruction actively encourages hallucination. Replace it with examples showing null returns for absent data.
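A minimal sketch of that swap, with both strings invented for illustration: strip the estimate instruction out of the prompt and append a demonstrated null example in its place.

```python
# The instruction to remove, and the demonstration that replaces it.
BAD_INSTRUCTION = "If a field is missing, make your best estimate."  # invites fabrication

GOOD_REPLACEMENT = (
    'Document: "Receipt #77. Item: stapler."\n'
    'Output: {"item": "stapler", "warranty_expiry": null}'  # demonstrated null
)

def sanitize_prompt(prompt: str) -> str:
    """Remove the estimate instruction and append a demonstrated null example."""
    cleaned = prompt.replace(BAD_INSTRUCTION, "")
    return cleaned.rstrip() + "\n\n" + GOOD_REPLACEMENT
```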
Diverse Document Coverage
For a system handling empirical papers, theoretical papers, and review articles, replace three same-type examples with:
- Empirical paper (all fields populated)
- Theoretical paper (several fields null — no experimental data)
- Partial data document (some fields null with notes)
This covers all three hallucination patterns within the same token budget. The model sees that different document types have different field availability, and null is appropriate when data is absent.
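A sketch of what that diverse example set might look like, with one example per document type. The field names (`sample_size`, `effect_size`, `dataset`) and values are illustrative assumptions, not from the source; the point is that the three examples jointly cover all-populated, all-null, and mixed outputs.

```python
# One example per document type, within the same token budget as three
# same-type examples. Field names and values are invented for illustration.
DIVERSE_EXAMPLES = [
    {
        "doc_type": "empirical",
        "output": {"sample_size": 120, "effect_size": 0.42, "dataset": "survey-2023"},
    },
    {
        "doc_type": "theoretical",
        # No experiments were run, so all experimental fields are null.
        "output": {"sample_size": None, "effect_size": None, "dataset": None},
    },
    {
        "doc_type": "review",
        # Partial data: some fields present, others genuinely absent.
        "output": {"sample_size": None, "effect_size": 0.31, "dataset": "meta-pool"},
    },
]

def null_profile(example):
    """Which fields are null in this example's output, as a tuple of booleans."""
    return tuple(v is None for v in example["output"].values())
```

A quick sanity check on your own example set is that the null profiles are all distinct, i.e. `len({null_profile(e) for e in DIVERSE_EXAMPLES}) == 3`, so each hallucination pattern has a demonstrated counterexample.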
The source_status Pattern
The most advanced application adds a source_status metadata field to each extracted value:
- "extracted" — value found in source
- "not_found_in_source" — field absent from document
- "ambiguous" — data exists but unclear
This enables downstream systems to make informed decisions without trusting every extracted value equally.
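One way a downstream system might consume the pattern, sketched with a hypothetical `ExtractedField` type and `trustworthy` filter of my own invention; only the three status strings come from the pattern above.

```python
from dataclasses import dataclass
from typing import Optional

# The three status values from the source_status pattern.
EXTRACTED = "extracted"            # value found in source
NOT_FOUND = "not_found_in_source"  # field absent from document
AMBIGUOUS = "ambiguous"            # data exists but unclear

@dataclass
class ExtractedField:
    value: Optional[str]
    source_status: str

def trustworthy(field: ExtractedField) -> bool:
    """Downstream filter: only act on values actually found in the source."""
    return field.source_status == EXTRACTED and field.value is not None

warranty = ExtractedField(value=None, source_status=NOT_FOUND)
order_id = ExtractedField(value="A-1029", source_status=EXTRACTED)
```

Here the warranty field is skipped rather than trusted, while the order ID passes the filter — the informed decision the metadata makes possible.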
What Does Not Work
LLM confidence scores cannot detect hallucination. The model reports high confidence on fabricated values because the fabrication is plausible. Confidence-based filtering is unreliable.
Post-extraction validation catches some errors but misses plausible fabrications. A fabricated date of “2025-06-15” for a warranty field passes format validation while being completely invented.
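The failure is easy to demonstrate. In the sketch below (the invoice document and validator are invented for illustration), a fabricated warranty date sails through format validation because format checks only ask whether a value is well-formed, never whether it was in the source.

```python
from datetime import date

def passes_format_validation(value: str) -> bool:
    """Checks only that the value parses as an ISO date, not that it
    actually appeared in the source document."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False

source_document = "Invoice #88. Product: toaster."  # no warranty mentioned at all

fabricated = "2025-06-15"  # plausible, well-formed, and completely invented
assert passes_format_validation(fabricated)   # validation passes...
assert fabricated not in source_document      # ...yet the value is fabricated
```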
Secondary verification models add infrastructure cost and introduce new failure modes. The root cause — missing null examples — is cheaper and more effective to fix.
One-liner: Include at least one few-shot example where a field is genuinely absent and the output is null — this single technique reduces hallucination from 31% to 4% because the model needs to see that “no data” is a valid answer.