Setting isError: true is only the starting point. A generic “Operation failed” message yields just 15% agent recovery. Adding structured metadata (error category, retryability, a customer-facing message, and a suggested action) raises recovery to 78-95%, depending on the error type.
The recovery rate data
| Error type | Metadata provided | Recovery rate |
|---|---|---|
| Transient (isRetryable: true) | Category + retryability | 92% auto-recovered |
| Validation (field-level details) | Category + specific errors | 78% auto-corrected |
| Business (isRetryable: false) | Category + customer message | 95% correctly escalated |
| Generic (“Operation failed”) | None | 15% recovery |
The difference between 15% and 78-95% is structured metadata, not model capability.
The basic pattern
```json
{
  "content": [{"type": "text", "text": "Database query timed out after 30s"}],
  "isError": true
}
```
The agent knows the tool failed and sees why. But it doesn’t know: should it retry? Is this temporary? What should it tell the user?
The full structured pattern
```json
{
  "content": [{"type": "text", "text": "Refund of $750 exceeds $500 policy limit"}],
  "isError": true,
  "structuredContent": {
    "errorCategory": "business",
    "isRetryable": false,
    "customerMessage": "Refunds over $500 require manager approval",
    "suggestedAction": "escalate_to_human"
  }
}
```
Now the agent knows: it’s a business rule (not transient), retrying won’t help, it can tell the customer why, and it should escalate.
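A minimal Python sketch of both sides of this pattern, using plain dicts rather than any particular MCP SDK. The server builds the structured error, and the agent picks a recovery step from it. The helper names tool_error and next_step, and the action strings "retry", "fix_arguments", "use_result", and "report_failure", are hypothetical; only the field names come from the JSON above.

```python
from typing import Any, Dict, Optional


def tool_error(
    text: str,
    category: str,
    retryable: bool,
    customer_message: Optional[str] = None,
    suggested_action: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a CallToolResult-shaped dict: isError plus structured metadata."""
    metadata: Dict[str, Any] = {"errorCategory": category, "isRetryable": retryable}
    if customer_message is not None:
        metadata["customerMessage"] = customer_message
    if suggested_action is not None:
        metadata["suggestedAction"] = suggested_action
    return {
        "content": [{"type": "text", "text": text}],
        "isError": True,
        "structuredContent": metadata,
    }


def next_step(result: Dict[str, Any]) -> str:
    """Agent-side sketch (hypothetical policy): choose a recovery step."""
    if not result.get("isError"):
        return "use_result"
    meta = result.get("structuredContent") or {}
    if meta.get("isRetryable"):
        return "retry"  # transient: calling again may succeed
    if "suggestedAction" in meta:
        return meta["suggestedAction"]  # e.g. "escalate_to_human"
    if meta.get("errorCategory") == "validation":
        return "fix_arguments"  # correct the input, then call again
    return "report_failure"  # no metadata: the generic-message case


refund = tool_error(
    "Refund of $750 exceeds $500 policy limit",
    category="business",
    retryable=False,
    customer_message="Refunds over $500 require manager approval",
    suggested_action="escalate_to_human",
)
print(next_step(refund))  # escalate_to_human
```

The ordering in next_step is one possible policy: retryability is checked first because it is the cheapest recovery, and a result with no metadata falls through to the same dead end the generic-message row in the table describes.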
Four anti-patterns
Silent swallow: returning isError: false with empty content when the database is unreachable. The agent tells the researcher “no papers found” when thousands exist; the database was down, not empty.
Success with error text: isError: false with content “error occurred.” The agent treats this as the data returned.
Unhandled exception: tool errors propagate as exceptions and surface as JSON-RPC protocol errors instead of tool results, or can crash the connection outright. This converts recoverable tool errors into non-recoverable protocol errors.
Generic message: “Operation failed” for every error type. The agent retries everything identically, wasting attempts on non-retryable errors.
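One way to rule out the first three anti-patterns at once is to wrap every tool so that any outcome becomes a result dict. The sketch below assumes hypothetical exception classes (TransientError, ValidationError, BusinessRuleError), a hypothetical "internal" category for unexpected failures, and a hypothetical search_papers tool; it is plain Python, not an SDK API.

```python
import functools
import json
from typing import Any, Callable, Dict


# Hypothetical exception types a tool implementation might raise.
class TransientError(Exception):
    pass


class ValidationError(Exception):
    pass


class BusinessRuleError(Exception):
    pass


def _error(text: str, category: str, retryable: bool) -> Dict[str, Any]:
    return {
        "content": [{"type": "text", "text": text}],
        "isError": True,
        "structuredContent": {"errorCategory": category, "isRetryable": retryable},
    }


def safe_tool(fn: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so every outcome becomes a result dict, never an exception."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            data = fn(*args, **kwargs)
        except TransientError as exc:
            return _error(str(exc), "transient", True)
        except ValidationError as exc:
            return _error(str(exc), "validation", False)
        except BusinessRuleError as exc:
            return _error(str(exc), "business", False)
        except Exception:
            # Unknown failure: still a tool error, with no internals exposed.
            return _error("Unexpected internal error", "internal", False)
        return {"content": [{"type": "text", "text": json.dumps(data)}], "isError": False}

    return wrapper


@safe_tool
def search_papers(query: str) -> list:
    database_reachable = False  # simulate an outage
    if not database_reachable:
        # Raise instead of returning [], so the outage is not read as "no papers found".
        raise TransientError("Paper database temporarily unavailable")
    return []


result = search_papers("protein folding")
print(result["isError"], result["structuredContent"])
# True {'errorCategory': 'transient', 'isRetryable': True}
```

Mapping exception types to categories keeps the decision in one place, so individual tools only need to raise the right exception rather than assemble metadata themselves.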
Protocol errors vs tool execution errors
| Type | Example | LLM recoverability |
|---|---|---|
| Protocol error | Misspelled tool name | Low (code fix needed) |
| Tool execution error | API timeout | High (LLM can retry/adapt) |
The MCP spec says tool execution errors SHOULD be provided to the LLM for self-correction. Letting exceptions propagate converts recoverable errors into non-recoverable ones.
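A minimal sketch of a tools/call handler that keeps the two kinds apart, written against plain JSON-RPC dicts rather than a real SDK. The handler, the run_query tool, and its simulated timeout are hypothetical. The code -32602 is JSON-RPC 2.0's "Invalid params" error code, used here for the unknown-tool case.

```python
import json
from typing import Any, Callable, Dict

INVALID_PARAMS = -32602  # JSON-RPC 2.0 "Invalid params" error code


def handle_tools_call(
    request: Dict[str, Any], tools: Dict[str, Callable[..., Any]]
) -> Dict[str, Any]:
    """Answer a tools/call request, keeping protocol and tool errors separate."""
    req_id = request.get("id")
    params = request.get("params", {})
    name = params.get("name")
    tool = tools.get(name)
    if tool is None:
        # Protocol error: the request itself is wrong; calling code must change.
        return {
            "jsonrpc": "2.0",
            "id": req_id,
            "error": {"code": INVALID_PARAMS, "message": f"Unknown tool: {name}"},
        }
    try:
        output = tool(**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": json.dumps(output)}], "isError": False}
    except TimeoutError:
        # Tool execution error: a normal result the LLM can read and act on.
        result = {
            "content": [{"type": "text", "text": "Database query timed out after 30s"}],
            "isError": True,
            "structuredContent": {"errorCategory": "transient", "isRetryable": True},
        }
    except Exception:
        result = {
            "content": [{"type": "text", "text": "Unexpected internal error"}],
            "isError": True,
            "structuredContent": {"errorCategory": "internal", "isRetryable": False},
        }
    return {"jsonrpc": "2.0", "id": req_id, "result": result}


def slow_query(sql: str) -> list:
    raise TimeoutError  # simulate a query that exceeds its deadline


tools = {"run_query": slow_query}
bad_name = handle_tools_call(
    {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
     "params": {"name": "run_qurey", "arguments": {}}}, tools)
timed_out = handle_tools_call(
    {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
     "params": {"name": "run_query", "arguments": {"sql": "SELECT 1"}}}, tools)
print("error" in bad_name, timed_out["result"]["isError"])  # True True
```

The misspelled name never reaches a tool, so nothing the model does at runtime can fix it; the timeout comes back inside result, where the LLM sees isRetryable and can try again.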
Security: sanitized error content
Wrong: “Connection to db-prod-3.internal:5432 refused.” Right: “Database temporarily unavailable.” Include error CATEGORY and RECOVERABILITY. Log full technical details server-side. The agent gets recovery metadata without sensitive internals.
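A minimal sketch of that split using Python's standard logging module: the full exception, including hostname and port, goes to the server log, while the agent receives only the safe text, category, and retryability. The helper name sanitized_error and the logger name "tool_server" are hypothetical.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("tool_server")  # hypothetical logger name


def sanitized_error(
    exc: Exception, category: str, retryable: bool, public_text: str
) -> Dict[str, Any]:
    """Log full details server-side; return only safe text and recovery metadata."""
    # Hostnames, ports, and the traceback go to the server log, not to the agent.
    logger.error("tool failure [%s]", category, exc_info=exc)
    return {
        "content": [{"type": "text", "text": public_text}],
        "isError": True,
        "structuredContent": {"errorCategory": category, "isRetryable": retryable},
    }


try:
    raise ConnectionRefusedError("Connection to db-prod-3.internal:5432 refused")
except ConnectionRefusedError as exc:
    result = sanitized_error(exc, "transient", True, "Database temporarily unavailable")

print(result["content"][0]["text"])  # Database temporarily unavailable
```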
One-liner: isError: true with structured metadata (errorCategory, isRetryable, customerMessage, suggestedAction) enables 78-95% agent recovery vs 15% for generic messages. Catch all tool errors and return them as CallToolResult, never let them become protocol exceptions, and sanitize error content for security.