K1.1.2 Task 1.1

The API Is Stateless: Send Full History Every Time

The Claude API remembers nothing between requests. Every API call is a fresh start. If you don’t send the full conversation history — including all prior tool results — the model has no memory of what it’s done, what it’s found, or why it called the tools in the first place.

The correct format

When the model requests a tool call, you execute the tool and append the result to the conversation history:

  • Role: "user" (not "assistant" — tool results come from the client side)
  • Content type: "tool_result" with the matching tool_use_id
  • Then: send the FULL message array (all prior messages + this new tool_result) in the next API request
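The three steps above can be sketched as plain message dicts. The block shapes follow the Messages API; the IDs and the order lookup are illustrative, and the final API call is shown as a comment since it needs a real client:

```python
messages = [
    {"role": "user", "content": "Look up order #1234."},
    # The model's reply, containing its tool request:
    {
        "role": "assistant",
        "content": [
            {"type": "tool_use", "id": "toolu_01", "name": "get_order",
             "input": {"order_id": "1234"}},
        ],
    },
]

# Execute the tool locally, then append the result as a *user* message
# with a tool_result block whose tool_use_id matches the request.
messages.append({
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": "Order 1234: shipped 2024-05-01"},
    ],
})

# The next request must carry the FULL array — all three messages:
# response = client.messages.create(model=..., messages=messages, ...)
```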

If the model makes multiple tool calls in one response (e.g., get_customer and get_order), return all results in a single user message with multiple tool_result content blocks, each carrying its matching tool_use_id. Not separate user messages — one message, multiple content blocks.
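For the parallel-call case, the shape looks like this — two tool_use blocks in one assistant turn answered by one user message with two tool_result blocks (IDs and tool outputs here are made up):

```python
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "tool_use", "id": "toolu_a", "name": "get_customer",
         "input": {"customer_id": "c-9"}},
        {"type": "tool_use", "id": "toolu_b", "name": "get_order",
         "input": {"order_id": "1234"}},
    ],
}

# Execute both tools, then reply with ONE user message carrying
# one tool_result block per tool_use, matched by tool_use_id:
tool_turn = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_a",
         "content": "Customer c-9: Ada Lovelace"},
        {"type": "tool_result", "tool_use_id": "toolu_b",
         "content": "Order 1234: shipped"},
    ],
}
```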

The “model forgets everything” bug

Classic symptom: the model ignores tool results and repeats the exact same tool call it already made. This happens consistently, not intermittently.

Root cause: the tool_result was never appended to the messages array. The API is stateless — from the model’s perspective, it requested the tool but never received the output. So it requests again. And again. Infinite loop.

This is the most common implementation bug in agentic loops. First diagnostic step for any “model loses context after tool calls” issue: verify the full conversation history is being sent with each request.
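A loop skeleton makes the fix concrete — the two appends are the part that matters. The `client`, `run_tool`, and model name are placeholders, not a real SDK surface:

```python
def agent_loop(client, messages, run_tool):
    """Minimal agentic loop sketch; client.create and run_tool are hypothetical."""
    while True:
        response = client.create(messages=messages)
        # Append the assistant turn to the history first.
        messages.append({"role": "assistant", "content": response.content})
        tool_uses = [b for b in response.content if b["type"] == "tool_use"]
        if not tool_uses:
            return response  # no tool requests: the model gave a final answer
        # Every tool_use gets a matching tool_result appended to the
        # SAME messages array. Skipping this append is the bug: the
        # model never sees the output and repeats the call forever.
        messages.append({
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": b["id"],
                 "content": run_tool(b["name"], b["input"])}
                for b in tool_uses
            ],
        })
```

Because `messages` is mutated in place, every `client.create` call automatically carries the full history.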

Tool errors are tool results too

When a tool execution fails (non-zero exit code, API timeout, permission denied), pass the error back as a tool_result containing the error details, with the is_error flag set on the block. Continue the loop. The model needs to see failures — it can retry with different parameters, try an alternative tool, or report the issue.

Terminating the loop on tool failure removes the model’s ability to adapt. Silently swallowing the error leaves the model waiting for a result that never comes.
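One way to package both outcomes uniformly — a small helper that wraps a tool call and produces a tool_result block either way (the helper itself and its error message format are illustrative):

```python
def tool_result_block(tool_use_id, run):
    """Run a zero-arg tool callable; package success or failure as a tool_result."""
    try:
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": run()}
    except Exception as exc:
        # The model sees the error text and can retry, switch tools,
        # or surface the problem — the loop keeps going either way.
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": f"Tool failed: {exc}", "is_error": True}
```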

Managing growing history

A session with 30+ tool calls, each returning ~2,000 tokens, accumulates massive history. At some point it exceeds the context window. Three strategies:

Condense before appending: extract only relevant fields from verbose tool results. A 3,000-token customer record might yield ~200 tokens of relevant data (name, order ID, status). This reduces re-transmission costs across all subsequent requests. One production system cut tool-result token costs by 65% this way.
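A condensing step can be as simple as projecting the record onto the fields later turns will need. The field names below are made up for illustration:

```python
def condense_customer(record: dict) -> str:
    """Keep only the fields subsequent turns will need (hypothetical schema)."""
    keep = ("name", "order_id", "status")
    return "; ".join(f"{k}={record[k]}" for k in keep if k in record)
```

The condensed string goes into the tool_result block in place of the raw record, so every subsequent request re-transmits ~200 tokens instead of ~3,000.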

Progressive summarization: older tool results get condensed into compact summaries while recent ones keep full detail. The model retains awareness of earlier findings at lower token cost.
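A sketch of this, assuming a history of message dicts as above: tool_results older than the last few turns get their content replaced by a summary. The default `summarize` here is naive truncation — in practice it might be a cheap model call:

```python
def compact_history(messages, keep_recent=4,
                    summarize=lambda s: s[:80] + "..."):
    """Summarize tool_result content in all but the most recent turns (in place)."""
    cutoff = max(len(messages) - keep_recent, 0)
    for msg in messages[:cutoff]:
        if msg["role"] != "user" or not isinstance(msg["content"], list):
            continue
        for block in msg["content"]:
            if (block.get("type") == "tool_result"
                    and isinstance(block.get("content"), str)
                    and len(block["content"]) > 100):
                block["content"] = "[summarized] " + summarize(block["content"])
    return messages
```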

Persistent findings block: maintain a “Key Findings” or “Analysis Results” section that accumulates critical discoveries. Even as older tool results get summarized, the key conclusions survive. This prevents the model from contradicting earlier findings when context gets compressed.
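A minimal shape for such a block — an accumulator whose rendered text gets re-injected into the prompt each turn, so conclusions outlive the tool results they came from. The class and its injection point are one possible design, not a fixed API:

```python
class Findings:
    """Accumulates key conclusions that must survive history compaction."""
    def __init__(self):
        self.items = []

    def add(self, note: str):
        self.items.append(note)

    def render(self) -> str:
        # Re-inject this text (e.g. into the system prompt or the latest
        # user message) on every request, even after older tool results
        # have been summarized away.
        return "Key Findings:\n" + "\n".join(f"- {n}" for n in self.items)
```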

The anti-pattern: dropping old tool results entirely. The model loses awareness of earlier findings, may re-request the same tools, and risks contradicting its own previous conclusions.


One-liner: Tool results go in role: "user" with matching tool_use_id, the full history must be sent every request (stateless API), errors are results too — and condense verbose outputs before they bloat your history.