Different Errors Need Different Fixes — Don't Resubmit the Whole Batch

Error Distribution Pattern

Typical batch failure breakdown:

60% context exceeded → chunk the documents
25% malformed input → fix the request data
15% transient/expired → retry as-is

Blind retry treats every error the same, so it reproduces the 60% context-exceeded failures and the 25% malformed-input failures unchanged. Only the 15% of transient or expired failures benefit from an unmodified retry.
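A minimal sketch of routing failures by type rather than resubmitting everything. The error labels and action names below are hypothetical placeholders; a real batch API reports failures in its own format, so the mapping would need to be adapted.

```python
from collections import defaultdict

# Hypothetical error labels mapped to the fix each one needs.
# These names are assumptions for illustration, not a real API's error codes.
ACTIONS = {
    "context_exceeded": "chunk_document",
    "malformed_input": "fix_request",
    "transient": "retry_as_is",
    "expired": "retry_as_is",
}


def route_failures(failures):
    """Group failed request IDs by the corrective action they need.

    `failures` is an iterable of (request_id, error_type) pairs.
    """
    plan = defaultdict(list)
    for request_id, error_type in failures:
        # Unknown error types go to manual review instead of a blind retry.
        action = ACTIONS.get(error_type, "manual_review")
        plan[action].append(request_id)
    return dict(plan)


failures = [
    ("doc-1", "context_exceeded"),
    ("doc-2", "malformed_input"),
    ("doc-3", "expired"),
    ("doc-4", "transient"),
]
plan = route_failures(failures)
```

Only the IDs under `retry_as_is` would be resubmitted unchanged; the other groups need their documents chunked or their request data repaired first.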

Cancellation Saves Money

If early results from a batch reveal a systematic prompt error (e.g., the wrong extraction schema), cancel the running batch before it completes. Letting the remaining requests run with a known-broken prompt means paying for output that has to be discarded: if the problem shows up in the first 10% of results, the spend on the other 90% is wasted.
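One way to make the cancellation decision explicit is a simple check on early results. This is a sketch under stated assumptions: the function name, the minimum sample size, and the failure-rate threshold are all hypothetical and should be tuned to the workload. The actual cancel call depends on the batch API in use and is not shown.

```python
def should_cancel(results, min_sample=20, max_failure_rate=0.5):
    """Return True when early results suggest a systematic error.

    `results` is a list of booleans, True meaning the output passed
    validation (e.g., matched the expected extraction schema).
    Both thresholds are illustrative assumptions.
    """
    if len(results) < min_sample:
        # Too few results to distinguish a systematic error from noise.
        return False
    failure_rate = results.count(False) / len(results)
    return failure_rate >= max_failure_rate


early_results = [False] * 19 + [True]  # 19 of the first 20 outputs failed
cancel_now = should_cancel(early_results)
```

A high failure rate across a diverse early sample points at the prompt rather than at individual documents, which is the case where cancelling pays off.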

Time Budget for Recovery

With a 30-hour SLA:

Batch processing: up to 24 hours (worst case)
Recovery window: 6 hours
Strategy: submit batches every 4-6 hours, leaving time for one full recovery round within the SLA
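The budget above is simple subtraction, sketched here with `datetime.timedelta`. The function names and the example deadline are hypothetical; the 30-hour SLA and 24-hour worst case come from the figures above.

```python
from datetime import datetime, timedelta


def recovery_window(sla, worst_case_batch):
    """Time left for recovery after a worst-case batch run."""
    return sla - worst_case_batch


def latest_submit_time(deadline, worst_case_batch, recovery):
    """Latest submission time that still leaves the recovery window."""
    return deadline - worst_case_batch - recovery


window = recovery_window(timedelta(hours=30), timedelta(hours=24))

# Hypothetical deadline, used only to illustrate the calculation.
deadline = datetime(2025, 1, 2, 6, 0)
submit_by = latest_submit_time(deadline, timedelta(hours=24), window)
```

Here `window` is 6 hours, and a batch due at 06:00 on 2 January would need to be submitted by 00:00 on 1 January to keep that window intact.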

Sample Testing Before Full Submission

Test prompts on a diverse 20-50 document sample before submitting thousands. One team’s comparison:

Without sample testing: 18% failure rate, $740 total cost
With sample testing: 3% failure rate, $519 total cost (30% savings)

The $8/month spent on sample testing saved $300/month in reprocessing, roughly a 37x return (300 / 8 = 37.5).
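The percentages quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; the dollar figures are the ones reported in the comparison.

```python
# Total cost per run, from the comparison above.
cost_without_sampling = 740
cost_with_sampling = 519

# Fractional savings from sample testing.
savings_fraction = (cost_without_sampling - cost_with_sampling) / cost_without_sampling

# Monthly figures: sample-testing spend versus reprocessing avoided.
sample_cost_per_month = 8
reprocessing_saved_per_month = 300
roi_multiple = reprocessing_saved_per_month / sample_cost_per_month

print(f"Savings: {savings_fraction:.1%}")
print(f"ROI: {roi_multiple:.1f}x")
```

The savings come to just under 30% and the return multiple to 37.5, matching the rounded figures quoted in the text.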

One-liner: Test on a 20-50 document sample first (37x ROI), cancel batches with systematic errors early, fix different error types differently, and budget time for one recovery round within your SLA.