Without explicit format standardization rules, the model reproduces input formats inconsistently. A date entered as “March 15, 2025” might output as “March 15, 2025”, “3/15/2025”, or “2025-03-15” depending on the run. Downstream parsers break.
Specify Output Standards Explicitly
| Data type | Standard | Rule |
|---|---|---|
| Dates | ISO 8601 | YYYY-MM-DD |
| Phone numbers | E.164 | +[country code][number] |
| Currency | Numeric + code | Amount as number, currency as separate field |
| Addresses | Structured fields | street, city, state, zip — not a single string |
The model handles input-to-standard conversion. It can read “March 15th” and output “2025-03-15” when told to. It can read “$1,234.56 USD” and output {"amount": 1234.56, "currency": "USD"}.
Schema Types Are Necessary but Not Sufficient
type: number prevents strings but does not specify decimal conventions. type: string for dates allows “12/31/24” and “2025-12-31” equally. JSON Schema format keywords are hints, not strict enforcers. Prompt-level rules define the exact standard within each type.
Measured Impact
Adding format standardization rules to an extraction prompt:
| Category | Before | After |
|---|---|---|
| Date parsing errors | 12% | 0.5% |
| Currency parsing errors | 8% | 0.2% |
| Phone parsing errors | 3% | 0.1% |
| Total downstream errors | 23% | 0.8% |
97% reduction. Each rule targeted a specific format category with measurable, additive improvement.
One Set of Rules for All Countries
For multi-country pipelines (15+ countries), define a single set of international output standards. Do not create country-specific prompts or downstream parsers. The model semantically understands format variations across countries and converts to the specified standard.
Structured Fields Beat Single Strings
For complex data like addresses, define separate schema fields instead of a single string. Combined with prompt decomposition rules, this produces consistently parseable output that does not require regex splitting.
One-liner: Add explicit format rules to your extraction prompt — dates as ISO 8601, currency as numeric, phones as E.164 — and downstream parsing errors drop from 23% to 0.8%.