S1.3.2 Task 1.3

Parallel Execution: 80% Wall-Clock Reduction, Zero Extra Cost

When the coordinator emits multiple Task tool calls in a single response, the SDK executes them in parallel. This is the highest-impact, lowest-effort optimization in multi-agent systems — and it’s free. Same API calls, same tokens, same cost. Just faster.

The mechanism and the math

Multiple Task tool_use blocks in one coordinator response → parallel execution. Total latency = max(agent times), not sum.

  • Serial: 3 agents × 30s = 90s
  • Parallel: max(30s, 30s, 30s) = 30s
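The max-vs-sum math can be demonstrated with a minimal asyncio sketch. The `agent` coroutine below only simulates a sub-agent with a sleep; it is illustrative, not an SDK API:

```python
import asyncio
import time

async def agent(name: str, seconds: float) -> str:
    # Stand-in for a sub-agent handling one Task call.
    await asyncio.sleep(seconds)
    return f"{name} done"

async def main() -> None:
    start = time.perf_counter()
    # Three "Task calls" launched together -> they run concurrently.
    results = await asyncio.gather(
        agent("a", 0.3), agent("b", 0.3), agent("c", 0.3)
    )
    elapsed = time.perf_counter() - start
    print(results)  # ['a done', 'b done', 'c done']
    print(f"{elapsed:.1f}s")  # ~0.3s, not 0.9s: max of the times, not the sum

asyncio.run(main())
```

Awaiting the agents one by one instead of gathering them would take the sum of the sleeps, which is exactly the serial anti-pattern described below.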

Production data at scale: 200 queries/day, 5 agents per query at ~25s each:

  • Serial: 200 × 125s = 25,000s ≈ 7 hours/day
  • Parallel: 200 × 25s = 5,000s ≈ 1.4 hours/day
  • Savings: 5.6 hours/day (80% reduction), identical API cost

The serial anti-pattern: diagnostic signature

If 3 independent agents each take ~30s and the total is ~90s, you have serial execution. The signature is unmistakable: the coordinator is issuing one Task per turn instead of all three in one response. Fix: emit all independent Task calls in a single response.
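A quick timing heuristic makes the signature checkable in code. This is my own diagnostic sketch, not an SDK feature; the tolerance is an arbitrary assumption:

```python
def execution_mode(agent_times: list[float], total: float, tol: float = 0.15) -> str:
    """Classify observed wall-clock time against serial/parallel expectations."""
    serial = sum(agent_times)    # one Task per turn
    parallel = max(agent_times)  # all Tasks in one response
    if abs(total - serial) <= tol * serial:
        return "serial"
    if abs(total - parallel) <= tol * parallel:
        return "parallel"
    return "mixed"

print(execution_mode([30, 30, 30], 90))  # serial
print(execution_mode([30, 30, 30], 31))  # parallel
```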

Parallel only works for independent tasks

A linear dependency chain (OCR → NLP → Validation → Enrichment) CANNOT be parallelized. Each stage needs the previous stage’s output as input. Running NLP in parallel with OCR means NLP starts without OCR text — it fails or produces garbage.

The rule: parallel within independence, sequential across dependencies. Apply parallelism across documents (process document A and document B simultaneously), not across dependent pipeline stages.
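The rule can be sketched as code: a sequential per-document pipeline, fanned out in parallel across documents. The stage functions are simulated placeholders, not real OCR/NLP libraries:

```python
import asyncio

async def ocr(doc: str) -> str:
    await asyncio.sleep(0.01)
    return f"text({doc})"

async def nlp(text: str) -> str:
    await asyncio.sleep(0.01)
    return f"entities({text})"

async def validate(entities: str) -> str:
    await asyncio.sleep(0.01)
    return f"valid({entities})"

async def pipeline(doc: str) -> str:
    # Sequential across dependent stages: each needs the previous output.
    text = await ocr(doc)
    entities = await nlp(text)
    return await validate(entities)

async def main() -> None:
    # Parallel across independent documents.
    results = await asyncio.gather(*(pipeline(d) for d in ["A", "B", "C"]))
    print(results)  # ['valid(entities(text(A)))', 'valid(entities(text(B)))', 'valid(entities(text(C)))']

asyncio.run(main())
```

Swapping the two axes (parallel stages, sequential documents) would hand `nlp` an input that doesn't exist yet, which is exactly the failure mode described above.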

Dependency graph scheduling

Complex task graphs with mixed dependencies require multi-phase scheduling:

Tasks: A(independent), B(independent), C(depends A),
       D(depends B), E(depends C+D), F(independent)

Phase 1: A + B + F  (all independent → parallel)
Phase 2: C + D      (dependencies met → parallel)
Phase 3: E          (all dependencies met → run)

Key insights:

  • F starts in phase 1, not later — it’s independent and has no reason to wait
  • D doesn’t wait for A — D depends on B, not A. Start it as soon as B completes
  • E waits for C AND D — start only when both dependencies are met
  • Total time: max(A,B,F) + max(C,D) + E — the critical-path optimum
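The phase computation above can be sketched as a small scheduler. This is my own sketch, not part of the SDK, and it uses phase barriers, a simplification: a finer scheduler could start D the moment B finishes rather than waiting for all of phase 1:

```python
def schedule_phases(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group tasks into phases: a task is ready once all its deps have run."""
    done: set[str] = set()
    phases: list[set[str]] = []
    pending = dict(deps)
    while pending:
        ready = {t for t, d in pending.items() if d <= done}
        if not ready:
            raise ValueError("dependency cycle")
        phases.append(ready)
        done |= ready
        for t in ready:
            del pending[t]
    return phases

deps = {"A": set(), "B": set(), "C": {"A"},
        "D": {"B"}, "E": {"C", "D"}, "F": set()}
print([sorted(p) for p in schedule_phases(deps)])
# [['A', 'B', 'F'], ['C', 'D'], ['E']]
```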

The coordinator handles this naturally through its agentic loop: emit parallel Task calls for independent tasks (phase 1), receive results, emit parallel Task calls for newly-unblocked tasks (phase 2), and so on.

Mixed independence and dependency

A customer request needs: billing check (independent), shipping status (independent), refund calculation (depends on both).

Turn 1: emit billing + shipping Task calls in one response → parallel execution
Turn 2: receive both results → emit refund Task call with combined results

Total time: max(billing, shipping) + refund. Better than serial (billing + shipping + refund).
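The two-turn shape looks like this in a minimal asyncio sketch; the three coroutines and their payloads are invented stand-ins for sub-agents, not real billing APIs:

```python
import asyncio

async def billing() -> dict:
    await asyncio.sleep(0.02)
    return {"balance": 12.5}

async def shipping() -> dict:
    await asyncio.sleep(0.02)
    return {"status": "delivered"}

async def refund(billing_r: dict, shipping_r: dict) -> dict:
    await asyncio.sleep(0.01)
    return {"refund": billing_r["balance"], "reason": shipping_r["status"]}

async def main() -> None:
    # Turn 1: both independent checks in one response -> parallel.
    b, s = await asyncio.gather(billing(), shipping())
    # Turn 2: refund depends on both results -> sequential.
    print(await refund(b, s))  # {'refund': 12.5, 'reason': 'delivered'}

asyncio.run(main())
```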

Partial failure in parallel execution

Three agents run in parallel. Two complete successfully. One times out. What now?

Correct: collect the 2 successful results, record the timeout with structured error context, then decide — retry the failed agent alone (if time permits) or proceed with partial results and an explicit gap annotation.

Wrong: discard all 3 results (wastes completed work), wait indefinitely (blocks everything), retry all 3 (wastes compute on already-successful agents).

The coordinator preserves the value of completed work while handling the failure. This is the same partial-failure principle from K1.2.1 applied to parallel execution.
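The correct handling above can be sketched with `asyncio.gather(..., return_exceptions=True)`, which keeps successful results instead of discarding everything on the first failure. The agents are simulated; this is an illustration of the principle, not SDK behavior:

```python
import asyncio

async def agent(name: str, fail: bool = False) -> str:
    await asyncio.sleep(0.01)
    if fail:
        raise TimeoutError(f"{name} timed out")
    return f"{name}: ok"

async def main() -> None:
    results = await asyncio.gather(
        agent("billing"), agent("shipping", fail=True), agent("history"),
        return_exceptions=True,  # collect exceptions instead of cancelling the rest
    )
    successes = [r for r in results if not isinstance(r, BaseException)]
    failures = [r for r in results if isinstance(r, BaseException)]
    print(successes)                    # ['billing: ok', 'history: ok']
    print([str(f) for f in failures])  # ['shipping timed out']
    # Next step: retry only the failed agent, or proceed with an explicit gap.

asyncio.run(main())
```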

No external infrastructure needed

Parallel execution doesn’t require message queues, thread pools, or async frameworks. The SDK handles it natively when the coordinator emits multiple tool calls in one response. No developer-managed concurrency.


One-liner: Emit all independent Task calls in one coordinator response for parallel execution — 80% wall-clock reduction at zero extra cost, but only works for independent tasks. Respect dependency chains with multi-phase scheduling and handle partial failures by preserving completed results.