K4.5.1 Task 4.5

50% Cost Savings, but Up to a 24-Hour Wait

The Message Batches API provides 50% cost savings for non-urgent, large-volume processing. The tradeoff: a processing window of up to 24 hours with no guaranteed latency SLA. Results remain available for 29 days after the batch is created.

When to Use Batch

Batch: Non-urgent bulk processing — nightly report generation, weekly document classification, batch data extraction. Latency tolerance measured in hours.

Sync: Real-time interactions, multi-turn tool-calling workflows, anything requiring immediate response. Batch does NOT support multi-turn tool calling — each request is a single message-response pair.
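The batch/sync decision above can be sketched as a small routing function. This is a minimal illustration, not part of any SDK: the Workload type, its fields, and choose_api are hypothetical names, and the latency figures in the example list are made-up placeholders.

```python
from dataclasses import dataclass

BATCH_MAX_WAIT_HOURS = 24  # maximum processing window stated above


@dataclass
class Workload:
    name: str
    latency_tolerance_hours: float  # how long the caller can wait for a result
    needs_multi_turn_tools: bool    # needs tool-call round trips within one task


def choose_api(w: Workload) -> str:
    """Return "batch" only if the workload can absorb the full 24-hour window."""
    if w.needs_multi_turn_tools:
        return "sync"  # each batch request is a single message-response pair
    if w.latency_tolerance_hours < BATCH_MAX_WAIT_HOURS:
        return "sync"  # cannot wait out the worst case
    return "batch"


# Placeholder workloads; the tolerance values are illustrative assumptions.
workloads = [
    Workload("real-time chat", 0.0, False),
    Workload("nightly extraction", 24, False),
    Workload("weekly reports", 168, False),
]
routing = {w.name: choose_api(w) for w in workloads}
print(routing)
```

Checking the multi-turn requirement first keeps the rule simple: a workload with generous latency tolerance still goes to sync if it depends on tool-call round trips.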

One team that classified its 3 workflows correctly (sync for real-time chat, batch for nightly extraction and weekly reports) cut a $9,000/month bill by about 33%.
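The arithmetic behind that figure can be sketched as follows. It assumes the entire saving came from the 50% batch discount and that sync pricing was unchanged; the function names are hypothetical, and the implied batch share is derived here, not stated in the example itself.

```python
BATCH_DISCOUNT = 0.50  # batch requests cost 50% less


def monthly_savings(total_bill: float, batch_share: float) -> float:
    """Dollars saved when batch_share of spend moves from sync to batch."""
    return total_bill * batch_share * BATCH_DISCOUNT


def implied_batch_share(savings_rate: float) -> float:
    """Share of spend that must move to batch to reach a given savings rate."""
    return savings_rate / BATCH_DISCOUNT


share = implied_batch_share(0.33)     # roughly two thirds of spend
saved = monthly_savings(9000, share)  # roughly $3,000 per month
print(f"batch-eligible share: {share:.0%}, saved: ${saved:,.0f}/month")
```

Under those assumptions, a 33% saving means about two thirds of the original spend was batch-eligible. The ceiling is 50%, reached only if every request can wait.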

Always Budget 24 Hours

Processing time varies widely: in one dataset, runs took 2 to 19 hours, with a 6.2-hour average across 30 runs. There is no SLA guarantee, so budget the full 24-hour maximum, not the average. If your SLA requires a 30-hour turnaround, a 24-hour batch leaves only 6 hours for failure recovery.
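The budgeting rule can be written as a quick check. This is a sketch with hypothetical helper names; the only inputs taken from the text are the 24-hour maximum and the 30-hour SLA example.

```python
BATCH_MAX_WAIT_HOURS = 24.0  # plan for the maximum, not the 6.2-hour average


def recovery_window_hours(sla_hours: float) -> float:
    """Hours left for failure recovery after a worst-case batch run."""
    return sla_hours - BATCH_MAX_WAIT_HOURS


def fits_batch(sla_hours: float, min_recovery_hours: float = 0.0) -> bool:
    """True if the SLA covers the full batch window plus the recovery time."""
    return recovery_window_hours(sla_hours) >= min_recovery_hours


print(recovery_window_hours(30))              # the 30-hour SLA example
print(fits_batch(30, min_recovery_hours=6))   # enough room for 6 hours of recovery
print(fits_batch(12))                         # shorter than the batch window
```

A negative recovery window means the workload should not go to batch at all. Note also that failures resubmitted as a second batch would need their own 24-hour budget, so a short recovery window only covers retries that run synchronously.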

custom_id Is Everything

Batch results return in arbitrary order. custom_id is the only reliable correlation mechanism. Design meaningful IDs:

batch-20260327-003_doc-INV-4521

This enables result-to-request matching, selective failure retry, and audit traceability, all from the ID alone.
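A sketch of how such IDs might be built, parsed, and used to correlate results. It assumes the example ID reads as batch date, a three-digit sequence number, and a document ID. The helper names are hypothetical, and the result dictionaries are simplified stand-ins with a made-up "status" field, not the real API response shape.

```python
import re
from datetime import date

# Assumed layout: batch-<YYYYMMDD>-<seq>_doc-<document id>
ID_PATTERN = re.compile(r"^batch-(\d{8})-(\d{3})_doc-(.+)$")


def make_custom_id(run_date: date, seq: int, doc_id: str) -> str:
    return f"batch-{run_date:%Y%m%d}-{seq:03d}_doc-{doc_id}"


def parse_custom_id(custom_id: str) -> dict:
    m = ID_PATTERN.match(custom_id)
    if m is None:
        raise ValueError(f"unrecognized custom_id: {custom_id!r}")
    return {"date": m.group(1), "seq": int(m.group(2)), "doc_id": m.group(3)}


# Requests we submitted, keyed by custom_id.
submitted = {
    make_custom_id(date(2026, 3, 27), 3, doc): {"doc_id": doc}
    for doc in ["INV-4521", "INV-4522", "INV-4523"]
}

# Simulated results arriving in a different order than the requests.
results = [
    {"custom_id": "batch-20260327-003_doc-INV-4523", "status": "succeeded"},
    {"custom_id": "batch-20260327-003_doc-INV-4521", "status": "errored"},
    {"custom_id": "batch-20260327-003_doc-INV-4522", "status": "succeeded"},
]

# Match by ID, never by position.
by_id = {r["custom_id"]: r for r in results}

# Selective retry: only the documents whose requests did not succeed.
retry_docs = sorted(
    parse_custom_id(cid)["doc_id"]
    for cid, r in by_id.items()
    if r["status"] != "succeeded"
)

# Audit: any submitted request with no result at all.
missing = sorted(set(submitted) - set(by_id))

print(retry_docs, missing)
```

Because the date, batch sequence, and document ID are all recoverable from the string, the retry list and the audit check need no separate lookup table.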


One-liner: Batch API saves 50% on non-urgent bulk work but takes up to 24 hours and does not support multi-turn tool calling — match API to latency needs.