K4.5.5 Task 4.5

Different Errors Need Different Fixes — Don't Resubmit the Whole Batch

Error Distribution Pattern

Typical batch failure breakdown:

  • 60% context exceeded → chunk the documents
  • 25% malformed input → fix the request data
  • 15% transient/expired → retry as-is

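Treating all errors identically (blindly resubmitting the whole batch) reproduces the 60% context errors and the 25% malformed-input errors, because nothing about those requests has changed. Only the 15% transient failures benefit from an unmodified retry, and requests that already succeeded should not be resubmitted at all.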
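A minimal sketch of routing failures by type instead of blindly resubmitting, assuming each failed request has already been labeled with one of three hypothetical categories (`ErrorKind`, `FailedRequest`, `chunk_document`, and `plan_recovery` are illustrative names, not a provider's API). Context-exceeded documents are split into smaller requests, malformed requests are set aside for repair, and only transient or expired requests are retried unchanged. Character counts stand in for real token counts.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    # Hypothetical categories matching the breakdown above.
    CONTEXT_EXCEEDED = "context_exceeded"  # fix: chunk the document
    MALFORMED_INPUT = "malformed_input"    # fix: repair the request data
    TRANSIENT = "transient"                # transient or expired: retry as-is


@dataclass
class FailedRequest:
    custom_id: str  # hypothetical ID linking a result back to its request
    kind: ErrorKind
    document: str


def chunk_document(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Packs whole paragraphs together where possible and hard-splits any
    paragraph longer than max_chars. Characters stand in for tokens here.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        if not para.strip():
            continue
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def plan_recovery(
    failures: list[FailedRequest], max_chars: int
) -> tuple[list[tuple[str, str]], list[str]]:
    """Route each failed request to the fix its error type needs.

    Returns (requests to resubmit, custom_ids whose input needs repair).
    Succeeded requests are never passed in, so they are never resubmitted.
    """
    to_resubmit: list[tuple[str, str]] = []
    needs_fix: list[str] = []
    for f in failures:
        if f.kind is ErrorKind.CONTEXT_EXCEEDED:
            # Resubmit as several smaller requests with derived IDs.
            for i, chunk in enumerate(chunk_document(f.document, max_chars)):
                to_resubmit.append((f"{f.custom_id}-part{i}", chunk))
        elif f.kind is ErrorKind.MALFORMED_INPUT:
            # Retrying unchanged would fail again; set aside for repair.
            needs_fix.append(f.custom_id)
        else:
            # Transient or expired: the request itself was fine, retry as-is.
            to_resubmit.append((f.custom_id, f.document))
    return to_resubmit, needs_fix


failures = [
    FailedRequest("doc-1", ErrorKind.CONTEXT_EXCEEDED, "a" * 10 + "\n\n" + "b" * 10),
    FailedRequest("doc-2", ErrorKind.MALFORMED_INPUT, ""),
    FailedRequest("doc-3", ErrorKind.TRANSIENT, "short doc"),
]
resubmit, fix = plan_recovery(failures, max_chars=15)
print([cid for cid, _ in resubmit])  # ['doc-1-part0', 'doc-1-part1', 'doc-3']
print(fix)                           # ['doc-2']
```

Deriving chunk IDs from the original `custom_id` keeps the pieces traceable so their results can be merged back afterwards.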

Cancellation Saves Money

If early results from a batch reveal a systematic prompt error (e.g., a wrong extraction schema), cancel the running batch before it completes. If the first 10% of results already show the problem, letting the remaining 90% of requests run with a known-broken prompt means paying for every one of them and then reprocessing them anyway.
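A minimal sketch of the "check early results, then decide" step, assuming the batch outputs are JSON objects that must contain a set of required keys. The thresholds (at least 20 results checked, more than 50% failing), the example schema, and the per-request cost are hypothetical. `avoidable_cost` assumes requests not yet processed are not billed once the batch is canceled; the cancel call itself is provider-specific and not shown.

```python
import json


def matches_schema(raw_output: str, required_keys: set[str]) -> bool:
    """True if the output parses as a JSON object containing every required key."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data)


def should_cancel(
    early_outputs: list[str],
    required_keys: set[str],
    min_checked: int = 20,
    max_failure_rate: float = 0.5,
) -> bool:
    """Cancel once enough early results are in and most fail the schema check."""
    if len(early_outputs) < min_checked:
        return False  # too few results to call the error systematic
    failures = sum(not matches_schema(o, required_keys) for o in early_outputs)
    return failures / len(early_outputs) > max_failure_rate


def avoidable_cost(total_requests: int, processed: int, cost_per_request: float) -> float:
    """Spend avoided by canceling now instead of letting the batch finish."""
    return (total_requests - processed) * cost_per_request


early = ['{"name": "Acme"}'] * 25  # every early result is missing "date"
print(should_cancel(early, {"name", "date"}))  # True
print(avoidable_cost(total_requests=10_000, processed=1_000, cost_per_request=0.01))  # 90.0
```

The `min_checked` floor keeps a handful of unlucky early results from triggering a cancellation on its own.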

Time Budget for Recovery

With a 30-hour SLA:

  • Batch processing: up to 24 hours (worst case)
  • Recovery window: 6 hours
  • Strategy: submit batches every 4-6 hours, leaving time for one full recovery round within the SLA
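The budget above is simple arithmetic (30 − 24 = 6 hours). The sketch below makes it concrete for a single batch, assuming the SLA clock starts at submission; it models only the 24-hour worst case and the 30-hour SLA, not the submission cadence. The function names and the example start time are hypothetical.

```python
from datetime import datetime, timedelta


def recovery_window(
    submitted_at: datetime,
    sla_hours: float = 30.0,
    worst_case_batch_hours: float = 24.0,
) -> tuple[datetime, datetime, timedelta]:
    """Return (time results are guaranteed, SLA deadline, recovery window)."""
    results_by = submitted_at + timedelta(hours=worst_case_batch_hours)
    deadline = submitted_at + timedelta(hours=sla_hours)
    return results_by, deadline, deadline - results_by


def recovery_fits(submitted_at: datetime, recovery_hours: float, **kwargs) -> bool:
    """True if a recovery round of the given length fits inside the window."""
    _, _, window = recovery_window(submitted_at, **kwargs)
    return timedelta(hours=recovery_hours) <= window


start = datetime(2024, 1, 1, 8, 0)  # illustrative submission time
results_by, deadline, window = recovery_window(start)
print(results_by)  # 2024-01-02 08:00:00
print(deadline)    # 2024-01-02 14:00:00
print(window)      # 6:00:00
print(recovery_fits(start, recovery_hours=4))  # True
```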

Sample Testing Before Full Submission

Test prompts on a diverse 20-50 document sample before submitting thousands. One team’s comparison:

  • Without sample testing: 18% failure rate, $740 total cost
  • With sample testing: 3% failure rate, $519 total cost (30% savings)
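One possible way to draw a "diverse" sample is to stratify by document length, since context-exceeded errors were the largest failure class above. This is a sketch under that assumption: the length buckets, their boundaries, and the names `length_bucket` and `diverse_sample` are hypothetical, and other axes (source, format, language) may matter more for a given corpus.

```python
import random
from collections import defaultdict


def length_bucket(doc: str) -> str:
    # Hypothetical bucket boundaries, chosen for illustration only.
    n = len(doc)
    if n < 2_000:
        return "short"
    if n < 20_000:
        return "medium"
    return "long"


def diverse_sample(docs: list[str], sample_size: int = 30, seed: int = 0) -> list[str]:
    """Draw round-robin across length buckets so rare long documents are included."""
    rng = random.Random(seed)
    buckets: dict[str, list[str]] = defaultdict(list)
    for doc in docs:
        buckets[length_bucket(doc)].append(doc)
    for bucket in buckets.values():
        rng.shuffle(bucket)
    sample: list[str] = []
    # Take one document per bucket in turn until full or every bucket is empty.
    while len(sample) < sample_size and any(buckets.values()):
        for name in sorted(buckets):
            if buckets[name] and len(sample) < sample_size:
                sample.append(buckets[name].pop())
    return sample


docs = ["x" * 100] * 50 + ["y" * 5_000] * 5 + ["z" * 50_000] * 2
sample = diverse_sample(docs, sample_size=30)
print(len(sample))                               # 30
print(sum(len(d) >= 20_000 for d in sample))     # 2
```

A purely random 30-document draw from this corpus would often contain no long documents at all, which is exactly where context-exceeded failures show up.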

The $8/month sample investment saved $300/month in reprocessing, roughly a 37x return ($300 / $8 = 37.5).
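The percentages and multiples quoted in this section can be checked with a few lines of arithmetic. This sketch uses only the figures stated above; the function names are hypothetical.

```python
def savings_pct(cost_without: float, cost_with: float) -> float:
    """Percentage reduction in total cost."""
    return (cost_without - cost_with) / cost_without * 100


def return_multiple(monthly_savings: float, monthly_cost: float) -> float:
    """Savings per dollar spent on sample testing."""
    return monthly_savings / monthly_cost


def net_roi(monthly_savings: float, monthly_cost: float) -> float:
    """ROI in the strict (gain - cost) / cost sense."""
    return (monthly_savings - monthly_cost) / monthly_cost


print(round(savings_pct(740, 519), 1))  # 29.9
print(return_multiple(300, 8))          # 37.5
print(net_roi(300, 8))                  # 36.5
```

The 37x figure is the savings-to-cost ratio; measured as net ROI the return is 36.5x, which does not change the conclusion.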


One-liner: Test on a 20-50 document sample first (37x ROI), cancel batches with systematic errors early, fix different error types differently, and budget time for one recovery round within your SLA.