Three approaches for extracting structured data from research documents, tested over 100 runs:
| Approach | Schema compliance | Extra fields |
|---|---|---|
| Prompt: “output as JSON” | 73% | 15% |
--output-format json | 89% | 8% |
--output-format json + --json-schema | 99% | 0% |
Each layer of enforcement adds measurable reliability. Prompts alone leave 27% of runs with structural issues. The format flag brings it to 89%. The schema flag reaches 99% with zero unexpected fields.
The Two Flags
--output-format json — Ensures the output is valid JSON. Does not control field names, nesting, or structure. Claude may use “urgency” instead of “priority” — both are valid JSON, but the downstream dashboard breaks.
--json-schema '{...}' — Enforces exact field names, nesting, required/optional fields, and types at generation time. Claude cannot produce “urgency” when the schema says “priority.” Supports nested objects, conditional optionals, and all standard JSON Schema features.
Both flags require -p (print mode). The --json-schema flag specifically is only available in print mode — using it without -p produces a “flag not recognized” error.
The Complete CI Command
# Basic JSON output
claude -p "Extract findings" --output-format json > findings.json
# Schema-constrained JSON output
claude -p "Extract ticket metadata" \
--output-format json \
--json-schema '{"type":"object","properties":{"ticket_id":{"type":"string"},"priority":{"type":"string"},"category":{"type":"string"},"summary":{"type":"string"}},"required":["ticket_id","priority","category","summary"]}' \
> metadata.json
Why Prompt Instructions Are Not Enough
Over 100 CI runs with --output-format json only: 87 produced correct JSON with expected field names. 13 used “urgency” instead of “priority.” Same prompt, same input data.
This is not a bug. Language models naturally substitute synonyms. “Priority” and “urgency” are semantically equivalent — the model sees no difference. But the dashboard API expects “priority” exactly.
--json-schema eliminates this variance by enforcing exact property names at the structural level. 100/100 runs produced “priority” as specified.
Detailed prompt instructions (“use the field name ‘priority’, not ‘urgency’”) reduce variance but cannot guarantee it. Structural enforcement is the only guarantee.
Two Outputs, Two Calls
When a pipeline needs both structured JSON for an API and human-readable text for a PR comment:
# Structured JSON for bug tracker API
claude -p "Analyze for bugs" --output-format json --json-schema '{...}' > bugs.json
# Human-readable text for PR comment
claude -p "Summarize analysis for PR comment" > comment.txt
Do not combine both formats in one call with delimiters. A delimiter-separated mixed output requires fragile splitting logic and risks JSON corruption. Separate calls with appropriate format flags produce clean, predictable outputs.
Progressive Refinement
Start simple, add constraints as needed:
--output-format json— Confirms JSON output works in the pipeline--json-schema— Adds exact structural enforcement when field precision matters- Schema refinement — Expand the schema with nested objects, optional fields, and conditionals as requirements evolve
The schema can express complex structures: nested objects for grouped findings, optional “suggestion” fields only for critical severity, arrays for multiple results. JSON Schema’s full feature set is available.
Things That Do Not Exist
--format json— Wrong flag name. Use--output-format json.CLAUDE_FORMAT=json— Environment variable does not exist.CLAUDE_OUTPUT_FORMAT=json— Environment variable does not exist.--json-schemain interactive mode — Only available with-p.
One-liner: --output-format json ensures valid JSON; --json-schema ensures the exact field names, nesting, and structure — together they reach 99% schema compliance where prompts alone achieve 73%.