When tool_use enforces a JSON schema with a required string field, the model MUST return a string — it structurally cannot return null. If the data does not exist in the source document, the model fabricates a plausible value. This is not a model limitation. It is a schema design problem.
The Data
| Field availability | Required field accuracy | Nullable field accuracy |
|---|---|---|
| Always present | 97-98% | 97-98% |
| Sometimes present | 74% (82% hallucinated) | 90% |
| Rarely present | 51% (93% hallucinated) | ~90% |
For always-present fields, required versus nullable makes no difference. For rarely-present fields, marking the field required drops accuracy to 51%, with 93% hallucinated, while the nullable version stays near 90%. The schema constraint forces fabrication.
The Rule
Only place fields in the required array if they genuinely exist in EVERY possible input document. Fields that are sometimes or rarely present should use type: ["string", "null"] and be excluded from required.
    {
      "type": "object",
      "properties": {
        "customer_name": {"type": "string"},
        "order_id": {"type": "string"},
        "warranty_expiry": {"type": ["string", "null"]}
      },
      "required": ["customer_name", "order_id"]
    }
warranty_expiry can now be null when the document has no warranty information. No fabrication needed.
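The rule above can be applied mechanically. The following is a minimal sketch, assuming you already know for each field whether it appears in every input document. The helper name `build_extraction_schema` and the boolean flags are hypothetical, introduced only for illustration.

```python
import json


def build_extraction_schema(always_present: dict[str, bool]) -> dict:
    """Build a schema from a mapping of field name -> "present in every document?".

    Hypothetical helper for illustration; not part of any library.
    """
    properties = {}
    required = []
    for name, is_always_present in always_present.items():
        if is_always_present:
            # Safe to require: the source always contains this value.
            properties[name] = {"type": "string"}
            required.append(name)
        else:
            # Nullable and left out of "required", so the model can return null.
            properties[name] = {"type": ["string", "null"]}
    return {"type": "object", "properties": properties, "required": required}


schema = build_extraction_schema(
    {"customer_name": True, "order_id": True, "warranty_expiry": False}
)
print(json.dumps(schema, indent=2))
```

With these three flags the helper reproduces the schema shown above. The resulting dictionary would typically be passed as the tool's input schema.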
Schema Constraints Override Everything
Prompt instructions (“return N/A for missing fields”), few-shot examples showing null, and confidence scoring CANNOT override the structural requirement of tool_use. If a field is required with type: "string", the model generates a string — period.
The schema change must come first. No other intervention works until the structural requirement is relaxed.
Null vs Sentinel Values
Using 0, "N/A", or empty string as “absent” markers conflates “no data” with potentially valid values. 0 might be a real quantity. "N/A" might appear in actual text. Null is the semantically correct representation of absent data.
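A small sketch of how a sentinel distorts downstream results. The quantities are made-up sample data, and `average_known` is a hypothetical helper; the point is only that null lets you separate a real zero from an absent value.

```python
from statistics import mean
from typing import Optional

# Hypothetical quantities extracted from four documents. The second and
# fourth documents did not state a quantity; the third stated a real 0.
with_sentinel = [3, 0, 0, 0]  # 0 means both "absent" and "zero"
with_null: list[Optional[int]] = [3, None, 0, None]


def average_known(values):
    """Average only the values that were actually present in the source."""
    known = [v for v in values if v is not None]
    return mean(known) if known else None


print(mean(with_sentinel))       # absent values are counted as zeros
print(average_known(with_null))  # real zero is kept, absent values are skipped
```

With the sentinel, the two absent values are indistinguishable from the real zero, so the average is computed over four values instead of two.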
The Enum + “Other” Escape
Closed enums without an “other” option force misclassification when the input does not match any listed value. Add "other" to the enum plus a nullable _detail field:

    "category": {"type": "string", "enum": ["bug", "feature", "docs", "other"]},
    "category_detail": {"type": ["string", "null"]}
This handles values the enum did not anticipate, with minimal schema complexity.
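One way to consume this pattern after extraction is a small consistency check. This is a sketch under an assumed policy that is not stated in the schema itself: `category_detail` should be filled when the category is "other" and null otherwise. The function name `check_category` is hypothetical.

```python
ALLOWED_CATEGORIES = {"bug", "feature", "docs", "other"}


def check_category(result: dict) -> list[str]:
    """Return a list of problems with the category fields (empty if none).

    Assumed policy: detail is required for "other" and null for everything else.
    """
    problems = []
    category = result.get("category")
    detail = result.get("category_detail")
    if category not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {category!r}")
    elif category == "other" and not detail:
        problems.append("category is 'other' but category_detail is empty")
    elif category != "other" and detail is not None:
        problems.append("category_detail should be null unless category is 'other'")
    return problems


print(check_category({"category": "other", "category_detail": "billing question"}))
print(check_category({"category": "other", "category_detail": None}))
```

Reviewing the collected `category_detail` values over time can show which new enum values are worth adding.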
One Schema with Nullable > Multiple Schemas
Creating separate schemas per document type adds routing complexity and misclassification risk. One schema with nullable fields handles all document variants — invoices with warranty fields, invoices without, receipts, quotes — in a single extraction tool.
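To illustrate, here is a sketch in which one schema accepts several document variants. The `conforms` function is a deliberately tiny, hypothetical check that understands only string/null types and `required`; it is not a full JSON Schema validator, and the sample records are invented.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "order_id": {"type": "string"},
        "warranty_expiry": {"type": ["string", "null"]},
    },
    "required": ["customer_name", "order_id"],
}


def conforms(record: dict, schema: dict) -> bool:
    """Check required keys and string/null types only (illustrative subset)."""
    for name in schema["required"]:
        if name not in record:
            return False
    for name, value in record.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # fields not described by the schema are ignored here
        allowed = spec["type"] if isinstance(spec["type"], list) else [spec["type"]]
        if value is None:
            if "null" not in allowed:
                return False
        elif not (isinstance(value, str) and "string" in allowed):
            return False
    return True


# Hypothetical extraction results from different document variants.
invoice_with_warranty = {
    "customer_name": "Ada Lovelace",
    "order_id": "A-100",
    "warranty_expiry": "2026-01-31",
}
receipt_without_warranty = {
    "customer_name": "Alan Turing",
    "order_id": "R-200",
    "warranty_expiry": None,
}
null_in_required_field = {
    "customer_name": None,
    "order_id": "Q-300",
    "warranty_expiry": None,
}

for record in (invoice_with_warranty, receipt_without_warranty, null_in_required_field):
    print(record["order_id"], conforms(record, SCHEMA))
```

The first two records pass against the same schema with no routing step. The third fails because `customer_name` is a required, non-nullable string.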
One-liner: Make rarely-present fields nullable in the schema — required fields force the model to fabricate values for absent data, causing 93% hallucination on fields that do not exist in the source.