K4.3.4 Task 4.3

Without Format Rules, the Same Date Comes Out Three Different Ways

Without explicit format standardization rules, the model reproduces input formats inconsistently. A date entered as “March 15, 2025” might output as “March 15, 2025”, “3/15/2025”, or “2025-03-15” depending on the run. Downstream parsers break.

Specify Output Standards Explicitly

| Data type | Standard | Rule |
| --- | --- | --- |
| Dates | ISO 8601 | YYYY-MM-DD |
| Phone numbers | E.164 | +[country code][number] |
| Currency | Numeric + code | Amount as number, currency as separate field |
| Addresses | Structured fields | street, city, state, zip (not a single string) |

The model handles input-to-standard conversion when told to. It can read “March 15th, 2025” and output “2025-03-15”. It can read “$1,234.56 USD” and output {"amount": 1234.56, "currency": "USD"}.
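
As a minimal sketch of what "conforms to the standard" can mean on the receiving side, the Python checks below test a value against the date, phone, and currency rules. The function names are hypothetical, the E.164 pattern is the commonly used "plus sign, then up to 15 digits" form, and treating the currency code as three uppercase letters is an assumption the rules above do not spell out.

```python
import re
from datetime import date

# Assumption: "+" followed by 2 to 15 digits, first digit non-zero.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_iso_date(value):
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2025-02-30
    except ValueError:
        return False
    return True


def is_e164(value):
    """True for strings shaped like +[country code][number]."""
    return isinstance(value, str) and E164_PATTERN.fullmatch(value) is not None


def is_money(value):
    """True for {"amount": <number>, "currency": <three-letter code>}."""
    if not isinstance(value, dict):
        return False
    amount = value.get("amount")
    currency = value.get("currency")
    return (
        isinstance(amount, (int, float))
        and not isinstance(amount, bool)  # bool is a subclass of int in Python
        and isinstance(currency, str)
        and re.fullmatch(r"[A-Z]{3}", currency) is not None
    )
```

Checks like these do not replace the prompt rules; they only confirm that the output a parser receives already follows them.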

Schema Types Are Necessary but Not Sufficient

type: number rejects strings but says nothing about decimal conventions. type: string for a date field accepts “12/31/24” and “2025-12-31” equally. JSON Schema format keywords are hints that many validators do not enforce by default. Prompt-level rules define the exact standard within each type.
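
The gap can be illustrated with a small sketch. The two records and the field names invoice_date and total are made up for illustration, and the type check is a hand-written stand-in for what type: string and type: number guarantee, not a call into a real schema validator.

```python
import re

# Hypothetical records: both satisfy {"invoice_date": string, "total": number}.
record_a = {"invoice_date": "12/31/24", "total": 1234.56}
record_b = {"invoice_date": "2025-12-31", "total": 1234.56}


def passes_type_check(record):
    """Mimics what type: string and type: number guarantee, and nothing more."""
    return isinstance(record["invoice_date"], str) and isinstance(
        record["total"], (int, float)
    )


def follows_date_rule(record):
    """Checks the prompt-level rule: dates are written as YYYY-MM-DD."""
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", record["invoice_date"]) is not None


for record in (record_a, record_b):
    print(record["invoice_date"], passes_type_check(record), follows_date_rule(record))
```

Both records pass the type check, but only the second follows the date rule, which is the difference the prompt-level rule is meant to close.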

Measured Impact

Adding format standardization rules to an extraction prompt:

| Category | Before | After |
| --- | --- | --- |
| Date parsing errors | 12% | 0.5% |
| Currency parsing errors | 8% | 0.2% |
| Phone parsing errors | 3% | 0.1% |
| Total downstream errors | 23% | 0.8% |

That is roughly a 97% reduction in total downstream errors (23% to 0.8%). Each rule targeted a specific format category, and the per-category improvements are measurable and add up to the total.
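
The arithmetic behind these figures can be checked with a few lines of Python. The numbers are the ones reported in the table; the variable names are only illustrative.

```python
# Error rates in percent, taken from the table above.
before = {"date": 12.0, "currency": 8.0, "phone": 3.0}
after = {"date": 0.5, "currency": 0.2, "phone": 0.1}

total_before = sum(before.values())
total_after = sum(after.values())

# Relative reduction = (before - after) / before.
relative_reduction = (total_before - total_after) / total_before
print(f"total: {relative_reduction:.1%}")

# The same calculation per category.
for category in before:
    reduction = (before[category] - after[category]) / before[category]
    print(f"{category}: {reduction:.1%}")
```

The category rates sum to the totals in the last row (23% and 0.8%), and the overall relative reduction comes out at about 96.5%, which rounds to the 97% quoted.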

One Set of Rules for All Countries

For multi-country pipelines (15+ countries), define a single set of international output standards. Do not create country-specific prompts or downstream parsers. The model semantically understands format variations across countries and converts to the specified standard.
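
One way to picture "a single set of rules" is a shared rules block that every extraction prompt includes, whatever country the document comes from. This is a sketch under assumptions: FORMAT_RULES, build_extraction_prompt, and the sample inputs are hypothetical, and the call to the model itself is left out.

```python
# One rules block, reused for every country.
FORMAT_RULES = """\
Output format rules (apply to every document, regardless of country):
- Dates: ISO 8601, YYYY-MM-DD.
- Phone numbers: E.164, +[country code][number].
- Currency: amount as a number, currency code as a separate field.
- Addresses: separate fields for street, city, state, zip.
"""


def build_extraction_prompt(document_text: str) -> str:
    """Builds the same prompt shape for any input; only the document changes."""
    return (
        "Extract the fields from the document below.\n\n"
        f"{FORMAT_RULES}\n"
        f"Document:\n{document_text}"
    )


# Made-up inputs in two local date conventions; the rules do not change.
prompt_de = build_extraction_prompt("Rechnung vom 15.03.2025")
prompt_us = build_extraction_prompt("Invoice dated 3/15/2025")
```

Because the rules live in one place, adding a country means adding documents, not prompts or parsers.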

Structured Fields Beat Single Strings

For complex data like addresses, define separate schema fields instead of a single string. Combined with prompt decomposition rules, this produces consistently parseable output that does not require regex splitting.
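
A short sketch of what this buys downstream: when the model returns the address as separate fields, the consumer reads them by name. The parse_address helper and the sample address are made up for illustration; the field names follow the street, city, state, zip breakdown used earlier.

```python
import json

ADDRESS_FIELDS = ("street", "city", "state", "zip")


def parse_address(raw_json: str) -> dict:
    """Loads a structured address and fails loudly if a field is missing."""
    data = json.loads(raw_json)
    missing = [field for field in ADDRESS_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing address fields: {missing}")
    return {field: data[field] for field in ADDRESS_FIELDS}


# Illustrative model output, not real data.
model_output = (
    '{"street": "123 Main St", "city": "Springfield", '
    '"state": "IL", "zip": "62701"}'
)
address = parse_address(model_output)
print(address["city"], address["zip"])
```

No regex is involved: each component is addressed by key, and a missing component surfaces as an explicit error instead of a silently wrong split.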


One-liner: Add explicit format rules to your extraction prompt (dates as ISO 8601, currency as a numeric amount plus code, phones as E.164) and downstream parsing errors drop from 23% to 0.8%.