Without Format Rules, the Same Date Comes Out Three Different Ways

Without explicit format standardization rules, the model formats the same value inconsistently from run to run. A date entered as “March 15, 2025” might come out as “March 15, 2025”, “3/15/2025”, or “2025-03-15” depending on the run. Downstream parsers that expect a single format then break.

Specify Output Standards Explicitly

Data type	Standard	Rule
Dates	ISO 8601	`YYYY-MM-DD`
Phone numbers	E.164	`+[country code][number]`
Currency	Numeric + code	Amount as number, currency as separate field
Addresses	Structured fields	street, city, state, zip — not a single string
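
One way to carry these standards into a prompt is to keep them as data and render them into a rules section. The sketch below is illustrative only: `FORMAT_RULES`, `build_format_section`, and the exact rule wording are hypothetical names and phrasing, not taken from any particular prompt.

```python
# Hypothetical rules table mirroring the standards listed above.
FORMAT_RULES = {
    "dates": "ISO 8601, YYYY-MM-DD",
    "phone numbers": "E.164, +[country code][number], no spaces or punctuation",
    "currency": "amount as a JSON number, currency code as a separate field",
    "addresses": "separate street, city, state, zip fields, never a single string",
}


def build_format_section(rules: dict[str, str]) -> str:
    """Render the rules as a bulleted block to append to an extraction prompt."""
    lines = ["Output format rules:"]
    lines += [f"- {name.capitalize()}: {rule}" for name, rule in rules.items()]
    return "\n".join(lines)


prompt_section = build_format_section(FORMAT_RULES)
print(prompt_section)
```

Keeping the rules in one structure means each rule can be added or removed individually, which makes it easier to attribute an error-rate change to a specific rule.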

The model handles input-to-standard conversion. It can read “March 15th, 2025” and output “2025-03-15” when told to. It can read “$1,234.56 USD” and output `{"amount": 1234.56, "currency": "USD"}`.
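
Since the model does the conversion, the downstream side only needs to confirm that each value matches the target standard. The following is a minimal sketch of such checks using only the standard library. The function names are hypothetical, and the three-letter uppercase currency code is an assumption based on the “USD” example above.

```python
import re
from datetime import date

# E.164: '+' followed by at most 15 digits, first digit non-zero.
E164_RE = re.compile(r"\+[1-9]\d{1,14}")


def is_iso_date(value: str) -> bool:
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2025-02-30
        return True
    except ValueError:
        return False


def is_e164(value: str) -> bool:
    """True for strings like +14155552671 with no spaces or punctuation."""
    return E164_RE.fullmatch(value) is not None


def is_money(value: object) -> bool:
    """Amount must be a number, not a string; currency a 3-letter code (assumed)."""
    return (
        isinstance(value, dict)
        and isinstance(value.get("amount"), (int, float))
        and not isinstance(value.get("amount"), bool)
        and isinstance(value.get("currency"), str)
        and re.fullmatch(r"[A-Z]{3}", value["currency"]) is not None
    )


print(is_iso_date("2025-03-15"), is_iso_date("3/15/2025"))
print(is_money({"amount": 1234.56, "currency": "USD"}))
```

A record that fails any of these checks can be rejected or retried before it reaches a parser.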

Schema Types Are Necessary but Not Sufficient

`type: number` rejects strings but says nothing about decimal conventions such as precision or rounding. `type: string` for dates accepts “12/31/24” and “2025-12-31” equally. The JSON Schema `format` keyword is often treated as an annotation rather than an assertion, so it works as a hint, not a strict enforcer. Prompt-level rules define the exact standard within each type.
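
To illustrate the gap, here is a deliberately minimal stand-in for a type-only schema check. It is not a real JSON Schema validator, and `SCHEMA`, `PY_TYPES`, and `passes_type_check` are hypothetical names. Two records with different date spellings both pass.

```python
# Type-only "schema": field name -> JSON type name (illustrative, not real JSON Schema).
SCHEMA = {
    "invoice_date": "string",
    "total": "number",
}
PY_TYPES = {"string": str, "number": (int, float)}


def passes_type_check(record: dict) -> bool:
    """Check that every schema field is present with the right basic type."""
    return all(
        isinstance(record.get(field), PY_TYPES[json_type])
        for field, json_type in SCHEMA.items()
    )


us_style = {"invoice_date": "12/31/24", "total": 1234.5}
iso_style = {"invoice_date": "2025-12-31", "total": 1234.50}

# Both spellings are valid strings, so the type check cannot tell them apart.
print(passes_type_check(us_style), passes_type_check(iso_style))  # True True
```

The type check still catches a total sent as the string “1,234.50”, which is why schema types remain necessary even though they do not pin down the format.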

Measured Impact

Adding format standardization rules to an extraction prompt:

Category	Before	After
Date parsing errors	12%	0.5%
Currency parsing errors	8%	0.2%
Phone parsing errors	3%	0.1%
Total downstream errors	23%	0.8%

That is roughly a 97% relative reduction in total downstream errors (23% to 0.8%). Each rule targeted a specific format category, and the per-category improvements add up to the total.
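
The headline figure can be checked directly from the table. The short calculation below uses only the numbers reported above; the variable names are illustrative.

```python
# Error rates in percent, copied from the table.
before = {"date": 12.0, "currency": 8.0, "phone": 3.0}
after = {"date": 0.5, "currency": 0.2, "phone": 0.1}

total_before = sum(before.values())  # matches the 23% total row
total_after = sum(after.values())    # matches the 0.8% total row (up to float rounding)

# Relative reduction: share of the original errors that disappeared.
relative_reduction = (total_before - total_after) / total_before
print(f"{relative_reduction:.1%}")  # 96.5%
```

The category rows sum to the totals in both columns, which is what “additive” means here: no category's improvement is double-counted.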

One Set of Rules for All Countries

For multi-country pipelines (15+ countries), define a single set of international output standards. Do not create country-specific prompts or downstream parsers. The model semantically understands format variations across countries and converts to the specified standard.

Structured Fields Beat Single Strings

For complex data like addresses, define separate schema fields instead of a single string. Combined with prompt decomposition rules, this produces consistently parseable output that does not require regex splitting.
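
As a sketch of what separate fields buy downstream, the snippet below loads a decomposed address into a typed record with no string splitting. The `Address` class, the `address_from_model_output` helper, and the sample values are hypothetical. The field names follow the street, city, state, zip layout from the standards table.

```python
from dataclasses import dataclass

ADDRESS_FIELDS = ("street", "city", "state", "zip")


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str
    zip: str  # kept as a string so leading zeros survive


def address_from_model_output(payload: object) -> Address:
    """Build an Address from decomposed fields; reject single strings and gaps."""
    if not isinstance(payload, dict):
        raise ValueError("expected an object with separate address fields")
    missing = [
        name
        for name in ADDRESS_FIELDS
        if not isinstance(payload.get(name), str) or not payload[name].strip()
    ]
    if missing:
        raise ValueError(f"missing or non-string address fields: {missing}")
    return Address(**{name: payload[name].strip() for name in ADDRESS_FIELDS})


address = address_from_model_output(
    {"street": "1 Main St", "city": "Cambridge", "state": "MA", "zip": "02139"}
)
print(address.city, address.zip)  # Cambridge 02139
```

A single-string address would have to be split with regexes that differ per country; with separate fields the loader is the same for every input.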

One-liner: Add explicit format rules to your extraction prompt (dates as ISO 8601, currency as a numeric amount plus a separate code, phones as E.164) and downstream parsing errors drop from 23% to 0.8%.