K1.3.4 Task 1.3

Resume Saves 85% Tokens and Chains Across Multiple Follow-Ups

When a sub-agent finishes its first task, it may have accumulated valuable context — 50+ files explored, 15 papers found, dependency graphs built. Resume preserves this context for follow-ups at a fraction of the cost of starting fresh. And it’s not limited to one follow-up — resume supports multiple sequential cycles.

The resume flow

  1. First query: coordinator calls Task tool → sub-agent runs → returns ResultMessage
  2. Capture: extract session_id from ResultMessage, agentId from response content
  3. Follow-up: call query() with resume=session_id and a prompt referencing the agentId

The resumed session picks up exactly where it left off — all tool results, findings, reasoning, and intermediate context intact.

The data: resume vs new session

A production comparison for multi-turn sub-agent workflows:

Metric                      | New session (re-inject context) | Resume
----------------------------|---------------------------------|-------
Prompt tokens per follow-up | 8,000                           | 1,200
Latency per follow-up       | 15 sec                          | 4 sec
Answer quality              | Same                            | Same

85% token reduction, 73% latency reduction, identical quality. Over 3 follow-ups per task: ~20,400 tokens and ~33 seconds saved per task.
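The headline numbers follow directly from the table; a quick arithmetic check:

```python
# Per-follow-up figures from the comparison table above.
new_tokens, resume_tokens = 8_000, 1_200
new_latency, resume_latency = 15, 4  # seconds

token_saving = 1 - resume_tokens / new_tokens      # 0.85 -> 85%
latency_saving = 1 - resume_latency / new_latency  # ~0.733 -> 73%

follow_ups = 3
tokens_saved = (new_tokens - resume_tokens) * follow_ups     # 20,400
seconds_saved = (new_latency - resume_latency) * follow_ups  # 33

print(f"{token_saving:.0%} tokens, {latency_saving:.0%} latency")
print(f"Over {follow_ups} follow-ups: {tokens_saved} tokens, {seconds_saved} s saved")
```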

Why the difference? New sessions must re-inject context as prompt text (expensive, lossy). Resume already has the context in the session state (free, lossless). Fewer input tokens don't mean less information — the resumed agent has MORE context because it retains the full session, not just a summary.

Multi-resume chaining

Resume is NOT limited to a single follow-up. The same session_id supports multiple sequential resumes:

  1. Initial audit → finds vulnerabilities
  2. Resume #1 → coordinator asks for fix suggestions for top issues
  3. Resume #2 → coordinator asks agent to verify fixes address the findings
  4. Resume #3 → coordinator asks for final summary

Each resume accumulates context. After 3 cycles, the sub-agent has the full history: initial findings, fix suggestions, verification results. Starting a new session at any step would lose this accumulated understanding.
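A chaining loop can be simulated without the SDK — each cycle reuses the same session_id, standing in for repeated resume calls (names illustrative):

```python
# Simulated multi-resume chaining: one session_id, multiple follow-up cycles.
session_id = "sess-audit-42"
session_history: list[str] = ["initial audit: 3 vulnerabilities found"]

follow_ups = [
    "suggest fixes for the top issues",
    "verify the fixes address the findings",
    "write a final summary",
]

for prompt in follow_ups:
    # A real cycle would call query(prompt=prompt, options=...(resume=session_id));
    # here we just record that the session keeps accumulating turns.
    session_history.append(prompt)

# After 3 resumes the session holds the full history, not just the last turn.
assert len(session_history) == 4
```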

Cache all session_ids

When a coordinator spawns 5 sub-agents per research task, cache all 5 session_ids — not just the one you expect to follow up with. Session_ids are lightweight string identifiers with no ongoing storage cost. Unused sessions remain idle at zero cost.

The SDK supports resuming any valid session_id, not just the most recently completed one. If an unexpected question requires returning to sub-agent #2 (out of 5), the cached session_id enables it without re-analysis.
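Since session_ids are plain strings, a dict keyed by sub-agent is all the cache needs (names illustrative):

```python
# Cache every sub-agent's session_id at spawn time; the cache itself is
# just strings, so keeping unused entries costs nothing.
session_cache: dict[str, str] = {}

for agent_num in range(1, 6):
    # In the real flow, session_id comes from each sub-agent's ResultMessage.
    session_cache[f"sub-agent-{agent_num}"] = f"sess-{agent_num:04d}"

# Later, an unexpected question for sub-agent #2 is a dict lookup away:
sid = session_cache["sub-agent-2"]
assert sid == "sess-0002"
```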

The most common resume bug

# First query
async for message in query(prompt="Audit the auth module", options=options):
    ...  # BUG: the ResultMessage passes through here, but session_id is never captured

# Follow-up attempt — resume was never set
async for message in query(
    prompt="Suggest fixes",
    options=ClaudeAgentOptions(resume=None),
):
    ...
# Error: "Cannot resume session: session_id is required"

The coordinator completed the first query but never captured session_id from the ResultMessage. The resume call has resume=None. Fix: always capture session metadata during initial queries, even if you’re not sure you’ll need follow-ups.

The error message is specific: “session_id is required” — not “session expired” or “topic mismatch.” The SDK does not enforce topic similarity between queries; any follow-up prompt works as long as the session_id is valid.

When NOT to resume

Independent tasks: if the follow-up has nothing to do with the first task, a fresh session is cleaner. The resumed session carries all prior context, consuming context budget on irrelevant history.

Stale context: if the underlying data has changed significantly since the first query (files modified, databases updated), the sub-agent’s cached tool results reference outdated information. Start fresh with an injected summary of prior findings.
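For the stale-context case, one way to carry prior findings into a fresh session is to inject a short summary into the new prompt (a sketch; names and wording illustrative):

```python
def fresh_prompt_with_summary(task: str, prior_summary: str) -> str:
    """Build a fresh-session prompt that carries over prior findings as text."""
    return (
        "Context from a previous audit (underlying files have since changed):\n"
        f"{prior_summary}\n\n"
        f"Task: {task}"
    )

prompt = fresh_prompt_with_summary(
    task="Re-audit the auth module",
    prior_summary="3 vulnerabilities found: SQL injection in login, ...",
)
assert prompt.startswith("Context from a previous audit")
```

This trades the lossless session state for freshness: the new agent re-reads current data but still knows what was found before.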


One-liner: Resume saves 85% tokens and 73% latency over new sessions with identical quality, supports chaining multiple follow-ups on the same session_id, and requires capturing session_id from the first ResultMessage — cache all session_ids, they’re cheap.