Batch results are not guaranteed to come back in request order, so position-based matching (first result = first request) is unsafe. custom_id is the only reliable mechanism to match results to requests.
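A minimal sketch of matching by custom_id rather than by position. The request and result shapes here (plain dicts with "custom_id", "prompt", and "output" keys) and the helper name match_results are assumptions for illustration, not a specific SDK's types.

```python
from typing import Any


def match_results(
    requests: list[dict[str, Any]],
    results: list[dict[str, Any]],
) -> dict[str, tuple[dict[str, Any], dict[str, Any]]]:
    """Pair each result with its originating request by custom_id."""
    requests_by_id = {req["custom_id"]: req for req in requests}
    if len(requests_by_id) != len(requests):
        raise ValueError("custom_id values must be unique within a batch")

    matched = {}
    for res in results:
        custom_id = res["custom_id"]
        if custom_id not in requests_by_id:
            raise KeyError(f"result has unknown custom_id: {custom_id}")
        matched[custom_id] = (requests_by_id[custom_id], res)
    return matched


# Hypothetical data: results arrive in a different order than the requests.
requests = [
    {"custom_id": "doc-A", "prompt": "Summarize A"},
    {"custom_id": "doc-B", "prompt": "Summarize B"},
    {"custom_id": "doc-C", "prompt": "Summarize C"},
]
results = [
    {"custom_id": "doc-C", "output": "summary of C"},
    {"custom_id": "doc-A", "output": "summary of A"},
    {"custom_id": "doc-B", "output": "summary of B"},
]

matched = match_results(requests, results)
```

With this data, zipping requests and results by position would attach the summary of C to the request for A; the lookup by custom_id pairs them correctly regardless of arrival order.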
Design for Traceability
A good custom_id is self-contained and informative:
batch-20260327-003_doc-INV-4521
This encodes: batch date, batch number, document type, and document ID. When a failure occurs months later, the custom_id alone tells you what to investigate.
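One way to keep such IDs consistent is to build and parse them in a single place. The sketch below assumes that, in the example above, INV is the document type and 4521 is the document ID; the names BatchItemId, make_custom_id, and parse_custom_id are hypothetical. Providers may also restrict the length and character set of custom_id, so check those limits before adopting a format.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class BatchItemId(NamedTuple):
    batch_date: date
    batch_number: int
    doc_type: str
    doc_id: str


# Assumed layout: batch-<YYYYMMDD>-<NNN>_doc-<TYPE>-<ID>
_CUSTOM_ID = re.compile(r"batch-(\d{8})-(\d{3})_doc-([A-Z]+)-([A-Za-z0-9]+)")


def make_custom_id(batch_date: date, batch_number: int, doc_type: str, doc_id: str) -> str:
    """Build a custom_id that carries its own batch and document context."""
    return f"batch-{batch_date:%Y%m%d}-{batch_number:03d}_doc-{doc_type}-{doc_id}"


def parse_custom_id(custom_id: str) -> BatchItemId:
    """Recover the fields from a custom_id; raise if it does not fit the layout."""
    match = _CUSTOM_ID.fullmatch(custom_id)
    if match is None:
        raise ValueError(f"unrecognized custom_id format: {custom_id!r}")
    raw_date, number, doc_type, doc_id = match.groups()
    return BatchItemId(
        batch_date=datetime.strptime(raw_date, "%Y%m%d").date(),
        batch_number=int(number),
        doc_type=doc_type,
        doc_id=doc_id,
    )


example_id = make_custom_id(date(2026, 3, 27), 3, "INV", "4521")
parsed = parse_custom_id(example_id)
```

Parsing at read time means a failure report can be grouped by batch or document type without consulting any other lookup table.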
Selective Failure Recovery
When 60 of 1,000 requests fail, resubmit only the 60 failures, not the entire batch. Use custom_id to identify which requests failed, fix the root cause for each error type, and create a recovery batch containing only the fixed requests.
Resubmitting all 1,000 pays a second time for the 940 requests that already succeeded.
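The recovery flow can be sketched as follows. The result shape ("status" and "error_type" keys), the error type names, and the per-error fixer functions are all assumptions for illustration; a real batch API will report failures in its own format.

```python
from collections import defaultdict
from typing import Any, Callable

Request = dict[str, Any]


def group_failures(results: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each error type to the custom_ids that failed with it."""
    failures: dict[str, list[str]] = defaultdict(list)
    for res in results:
        if res["status"] != "succeeded":
            failures[res.get("error_type", "unknown")].append(res["custom_id"])
    return dict(failures)


def build_recovery_batch(
    original_requests: list[Request],
    failures: dict[str, list[str]],
    fixers: dict[str, Callable[[Request], Request]],
) -> tuple[list[Request], list[str]]:
    """Return (fixed requests to resubmit, custom_ids with no known fix)."""
    by_id = {req["custom_id"]: req for req in original_requests}
    recovery, unresolved = [], []
    for error_type, custom_ids in failures.items():
        fix = fixers.get(error_type)
        for custom_id in custom_ids:
            if fix is None:
                unresolved.append(custom_id)  # leave for manual review
            else:
                recovery.append(fix(by_id[custom_id]))
    return recovery, unresolved


# Hypothetical data: 5 requests, 2 failures with different root causes.
original_requests = [{"custom_id": f"doc-{i}", "max_tokens": 256} for i in range(5)]
results = [
    {"custom_id": "doc-0", "status": "succeeded"},
    {"custom_id": "doc-1", "status": "errored", "error_type": "invalid_request"},
    {"custom_id": "doc-2", "status": "succeeded"},
    {"custom_id": "doc-3", "status": "errored", "error_type": "expired"},
    {"custom_id": "doc-4", "status": "succeeded"},
]
fixers = {
    "invalid_request": lambda req: {**req, "max_tokens": 1024},  # illustrative fix
    "expired": lambda req: dict(req),  # resubmit unchanged
}

failures = group_failures(results)
recovery, unresolved = build_recovery_batch(original_requests, failures, fixers)
```

Only the two failed requests end up in the recovery batch; the three successes are never touched. Error types without a registered fixer are returned separately rather than being resubmitted blindly.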
Batch Does Not Support Multi-Turn Tool Calling
Each batch request is a single message-response pair. The model may return tool_use content blocks, but those tool calls can never be executed within the batch — there is no mechanism to feed tool results back. Multi-turn agentic workflows must use the sync API.
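Because a batch cannot continue a conversation, it can help to flag any batch result that ended in a tool call so it can be rerun on the sync API. This is a sketch only: the result shape (a "message" dict whose "content" is a list of blocks with a "type" field) is an assumed simplification, and both function names are hypothetical.

```python
from typing import Any


def needs_sync_followup(message: dict[str, Any]) -> bool:
    """True if the response contains a tool_use block that a batch cannot resolve."""
    return any(block.get("type") == "tool_use" for block in message.get("content", []))


def split_by_followup(results: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Return (custom_ids that are complete, custom_ids to rerun on the sync API)."""
    complete, followup = [], []
    for res in results:
        target = followup if needs_sync_followup(res["message"]) else complete
        target.append(res["custom_id"])
    return complete, followup


# Hypothetical results: one plain-text answer, one that stopped on a tool call.
results = [
    {
        "custom_id": "q-1",
        "message": {"content": [{"type": "text", "text": "Extracted total: 42.00"}]},
    },
    {
        "custom_id": "q-2",
        "message": {
            "content": [
                {"type": "text", "text": "I need to look that up."},
                {"type": "tool_use", "name": "lookup_invoice", "input": {"id": "4521"}},
            ]
        },
    },
]

complete, followup = split_by_followup(results)
```

A large followup list is a sign that the stage is really multi-turn and belongs on the sync API from the start.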
A research pipeline saved 50% on two-thirds of its workload (roughly a third of total cost, if cost scales with workload share) by batching its single-turn extraction stages while keeping the multi-turn tool-calling orchestration on sync.
One-liner: Design custom_ids for self-contained traceability, use them for selective failure retry, and never assume result ordering — position-based matching will silently corrupt your data.