K4.3.3 Task 4.3

Required Fields on Optional Data: The #1 Structural Cause of Hallucination

When tool_use enforces a JSON schema with a required string field, the model MUST return a string — it structurally cannot return null. If the data does not exist in the source document, the model fabricates a plausible value. This is not a model limitation. It is a schema design problem.

The Data

Field availability | Required field accuracy | Nullable field accuracy
Always present     | 97-98%                  | 97-98%
Sometimes present  | 74% (82% hallucinated)  | 90%
Rarely present     | 51% (93% hallucinated)  | ~90%

For always-present fields, required vs nullable makes no difference. For rarely-present fields, marking the field required yields a 93% hallucination rate, while the nullable version stays at roughly 90% accuracy. The schema constraint forces fabrication.
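The note does not say how accuracy and hallucination rate were measured, so the sketch below shows one plausible definition rather than the method behind the table. The function name score_field and the toy predictions are assumptions made for illustration; they are not the data above.

```python
from typing import Optional


def score_field(
    predictions: list[Optional[str]], truths: list[Optional[str]]
) -> dict[str, float]:
    """Score one field across a set of documents.

    accuracy: share of documents where the prediction equals the ground
        truth (null counts as correct when the value is truly absent).
    hallucination_rate: among documents where the value is absent
        (truth is None), the share where a non-null value came back anyway.
    These definitions are assumptions, not taken from the source.
    """
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    correct = sum(p == t for p, t in zip(predictions, truths))
    absent = [p for p, t in zip(predictions, truths) if t is None]
    fabricated = sum(p is not None for p in absent)
    return {
        "accuracy": correct / len(truths) if truths else 0.0,
        "hallucination_rate": fabricated / len(absent) if absent else 0.0,
    }


# Toy data: the warranty date exists in only one of four documents.
truths = ["2026-01-01", None, None, None]
# A required string field can never come back as None.
required_preds = ["2026-01-01", "2025-12-31", "2024-06-30", "2025-03-15"]
# A nullable field may still be wrong sometimes, but null is now possible.
nullable_preds = ["2026-01-01", None, None, "2025-03-15"]

required_scores = score_field(required_preds, truths)
nullable_scores = score_field(nullable_preds, truths)
print(required_scores)
print(nullable_scores)
```

Under these definitions, a required field that is absent in most documents is penalized twice: every fabricated value lowers accuracy and raises the hallucination rate.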

The Rule

Only place fields in the required array if they genuinely exist in EVERY possible input document. Fields that are sometimes or rarely present should use type: ["string", "null"] and be excluded from required.

{
  "type": "object",
  "properties": {
    "customer_name": {"type": "string"},
    "order_id": {"type": "string"},
    "warranty_expiry": {"type": ["string", "null"]}
  },
  "required": ["customer_name", "order_id"]
}

warranty_expiry can now be null when the document has no warranty information. No fabrication needed.
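To see what this schema accepts, here is a minimal hand-written checker. It is a sketch that covers only the string and null types and the required array used above, not a full JSON Schema validator; the helper names matches_type and validate and the sample payloads are hypothetical.

```python
from typing import Any

SCHEMA = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "order_id": {"type": "string"},
        "warranty_expiry": {"type": ["string", "null"]},
    },
    "required": ["customer_name", "order_id"],
}


def matches_type(value: Any, type_spec: Any) -> bool:
    """Check a value against a JSON Schema "type" (one name or a list of names).

    Only "string" and "null" are supported in this sketch.
    """
    names = type_spec if isinstance(type_spec, list) else [type_spec]
    checks = {
        "string": lambda v: isinstance(v, str),
        "null": lambda v: v is None,
    }
    return any(checks[name](value) for name in names)


def validate(payload: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload is accepted."""
    errors = [
        f"missing required field: {name}"
        for name in schema.get("required", [])
        if name not in payload
    ]
    for name, spec in schema["properties"].items():
        if name in payload and not matches_type(payload[name], spec["type"]):
            errors.append(f"wrong type for field: {name}")
    return errors


# Document with no warranty information: null is accepted.
no_warranty = {"customer_name": "Ada Lovelace", "order_id": "A-1001", "warranty_expiry": None}
# The field is not in required, so omitting it is accepted as well.
omitted = {"customer_name": "Ada Lovelace", "order_id": "A-1001"}
# A required string field cannot be null.
null_name = {"customer_name": None, "order_id": "A-1001"}

print(validate(no_warranty, SCHEMA))
print(validate(omitted, SCHEMA))
print(validate(null_name, SCHEMA))
```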

Schema Constraints Override Everything

Prompt instructions (“return N/A for missing fields”), few-shot examples showing null, and confidence scoring CANNOT override the structural requirement of tool_use. If a field is required with type: "string", the model generates a string — period.

The schema change must come first. No other intervention works until the structural requirement is relaxed.
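One way to apply the schema change to an existing schema is a small helper that widens the type of each optional field and removes it from required. This is a sketch under the assumption that schemas are plain Python dicts with a flat properties map; make_nullable is a hypothetical name, not part of any library.

```python
import copy


def make_nullable(schema: dict, optional_fields: list[str]) -> dict:
    """Return a copy of the schema in which the given fields accept null
    and are no longer listed in "required". The input is left untouched."""
    relaxed = copy.deepcopy(schema)
    for name in optional_fields:
        spec = relaxed["properties"][name]
        types = spec["type"] if isinstance(spec["type"], list) else [spec["type"]]
        if "null" not in types:
            types = types + ["null"]
        spec["type"] = types
    relaxed["required"] = [
        field for field in relaxed.get("required", []) if field not in optional_fields
    ]
    return relaxed


# A schema that forces a warranty date even when the document has none.
strict = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "warranty_expiry": {"type": "string"},
    },
    "required": ["order_id", "warranty_expiry"],
}

relaxed = make_nullable(strict, ["warranty_expiry"])
print(relaxed["properties"]["warranty_expiry"])
print(relaxed["required"])
```

Prompt instructions and few-shot examples that show null only become useful once the relaxed schema is the one passed to the tool.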

Null vs Sentinel Values

Using 0, "N/A", or empty string as “absent” markers conflates “no data” with potentially valid values. 0 might be a real quantity. "N/A" might appear in actual text. Null is the semantically correct representation of absent data.
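A short illustration of the ambiguity, using made-up quantities: once 0 doubles as the "absent" marker, downstream code cannot tell a real zero from missing data, whereas null keeps the two apart. The function total_quantity is a hypothetical helper.

```python
from typing import Optional


def total_quantity(quantities: list[Optional[int]]) -> tuple[int, int]:
    """Return (sum of known quantities, number of unknown quantities)."""
    known = [q for q in quantities if q is not None]
    return sum(known), len(quantities) - len(known)


# With a sentinel, each 0 could mean "zero items" or "not stated".
with_sentinel = [3, 0, 0]
# With null, 0 is a real quantity and None marks missing data.
with_null = [3, 0, None]

sentinel_result = total_quantity(with_sentinel)
null_result = total_quantity(with_null)
print(sentinel_result)
print(null_result)
```

Both lists sum to 3, but only the nullable version can report that one quantity is unknown.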

The Enum + “Other” Escape

Closed enums without an “other” option force misclassification when the input matches none of the listed values. Add "other" to the enum plus a nullable companion field such as category_detail:

"category": {"type": "string", "enum": ["bug", "feature", "docs", "other"]},
"category_detail": {"type": ["string", "null"]}

This absorbs values the enum did not anticipate, at the cost of one extra nullable field.
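The schema alone does not tie the two fields together, so a post-extraction check can help. The sketch below assumes two conventions that the note does not state: the detail field should be filled when the category is "other", and null otherwise. The function name check_category is hypothetical.

```python
from typing import Optional

CATEGORIES = {"bug", "feature", "docs", "other"}


def check_category(category: str, category_detail: Optional[str]) -> list[str]:
    """Return a list of problems with a category / category_detail pair."""
    problems = []
    if category not in CATEGORIES:
        problems.append(f"unknown category: {category}")
    # Assumed convention: "other" needs a free-text description.
    if category == "other" and not (category_detail and category_detail.strip()):
        problems.append("category is 'other' but category_detail is empty")
    # Assumed convention: the detail field stays null for known categories.
    if category != "other" and category_detail is not None:
        problems.append("category_detail should be null unless category is 'other'")
    return problems


print(check_category("bug", None))
print(check_category("other", "security report"))
print(check_category("other", None))
```

Logging the category_detail values that show up most often is one way to decide when a new enum value is worth adding.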

One Schema with Nullable > Multiple Schemas

Creating separate schemas per document type adds routing complexity and misclassification risk. One schema with nullable fields handles all document variants — invoices with warranty fields, invoices without, receipts, quotes — in a single extraction tool.
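On the consuming side, a single record type with optional fields can mirror the single schema. This is a sketch: the class ExtractedDocument, the field quote_valid_until, and the sample payloads are hypothetical and only illustrate that one shape can carry several document variants without a routing step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedDocument:
    # Present in every document variant, so required in the schema.
    customer_name: str
    order_id: str
    # Present only in some variants, so nullable and not required.
    warranty_expiry: Optional[str] = None
    quote_valid_until: Optional[str] = None  # hypothetical quote-only field

    @classmethod
    def from_payload(cls, payload: dict) -> "ExtractedDocument":
        """Build a record from a tool payload; absent optional keys become None."""
        return cls(
            customer_name=payload["customer_name"],
            order_id=payload["order_id"],
            warranty_expiry=payload.get("warranty_expiry"),
            quote_valid_until=payload.get("quote_valid_until"),
        )


invoice = ExtractedDocument.from_payload(
    {"customer_name": "Ada Lovelace", "order_id": "A-1001", "warranty_expiry": "2027-05-01"}
)
receipt = ExtractedDocument.from_payload(
    {"customer_name": "Ada Lovelace", "order_id": "A-1002", "warranty_expiry": None}
)
quote = ExtractedDocument.from_payload(
    {"customer_name": "Ada Lovelace", "order_id": "Q-0007", "quote_valid_until": "2026-03-31"}
)
print(invoice, receipt, quote, sep="\n")
```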


One-liner: Make rarely-present fields nullable in the schema — required fields force the model to fabricate values for absent data, causing 93% hallucination on fields that do not exist in the source.