Without Classification, the Agent Wastes 5 Retries on a Permission Error | Tool Design & MCP Integration

An agent retries an “Insufficient permissions” error 5 times in 30 seconds, and every retry fails identically. Why? The tool returns isError: true with no errorCategory or isRetryable field. Without classification, the agent applies its default retry strategy to all errors, including permanent ones.

The four error types and their recovery paths

Type	Examples	isRetryable	Agent action
Transient	Timeout, rate limit, service overload	`true`	Wait briefly, retry with backoff
Validation	Wrong format, missing field, invalid parameter	`true` (after fix)	Fix specific input, retry
Business	Refund exceeds limit, account suspended, policy violation	`false`	Explain to user, escalate
Permission	Access denied, insufficient privileges, expired credentials	`false`	Escalate, request authorization
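
The table can be read as a lookup from error category to recovery policy. The following is a minimal Python sketch of that idea, not a reference implementation: the category strings, the action labels, and the `recovery_action` helper are all illustrative assumptions rather than names from any particular SDK.

```python
# Hypothetical policy table mirroring the four rows above.
RECOVERY_POLICY = {
    "transient":  {"isRetryable": True,  "action": "retry_with_backoff"},
    "validation": {"isRetryable": True,  "action": "fix_input_then_retry"},
    "business":   {"isRetryable": False, "action": "explain_and_escalate"},
    "permission": {"isRetryable": False, "action": "escalate"},
}


def recovery_action(error: dict) -> str:
    """Pick the agent's next step from a tool error's errorCategory."""
    policy = RECOVERY_POLICY.get(error.get("errorCategory"))
    if policy is None:
        # No category: the agent cannot tell permanent from temporary failures.
        return "unclassified"
    return policy["action"]


print(recovery_action({"isError": True, "errorCategory": "permission"}))
print(recovery_action({"isError": True}))
```

The second call shows the failure mode this section is about: with no errorCategory, nothing distinguishes the error from a transient one.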

Why the agent retries permission errors

Without errorCategory and isRetryable, the agent has no signal that the error is permanent. Its default strategy (retry with backoff) is designed for transient errors — the most common type. So it retries everything, including:

Permission errors (will never succeed without credential changes)
Business rule violations (will never succeed without policy exceptions)
Validation errors with the same bad input (will fail identically every time)

The fix: errorCategory: "permission" with isRetryable: false → agent immediately stops retrying and escalates.
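
To make the difference concrete, here is a small Python sketch of an agent-side retry loop run against the same permission failure with and without classification. The loop, the 5-attempt budget constant, and both stub tools are hypothetical stand-ins for illustration; a real agent would also sleep between attempts.

```python
from typing import Callable

MAX_ATTEMPTS = 5  # default retry budget, matching the scenario above


def call_with_retries(tool: Callable[[], dict]) -> int:
    """Call `tool` until it succeeds, is marked non-retryable, or the budget runs out.

    Returns the number of attempts made. Backoff sleeps are omitted for brevity.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        result = tool()
        if not result.get("isError"):
            return attempt
        # A missing isRetryable field is treated as "retry", which is the
        # default behavior this section warns about.
        if result.get("isRetryable", True) is False:
            return attempt  # stop here and escalate
    return MAX_ATTEMPTS


def unclassified_tool() -> dict:
    # Stub: error with no classification metadata.
    return {"isError": True, "message": "Insufficient permissions"}


def classified_tool() -> dict:
    # Stub: the same error, classified as permanent.
    return {
        "isError": True,
        "message": "Insufficient permissions",
        "errorCategory": "permission",
        "isRetryable": False,
    }


print(call_with_retries(unclassified_tool))  # 5
print(call_with_retries(classified_tool))    # 1
```

The only difference between the two stubs is the two extra fields, yet the classified version stops after a single call.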

Transient: retry with backoff

Database timeouts, rate limits, service overload. The service is temporarily unavailable but will recover. Mark as transient + retryable → agent waits briefly and retries. 92% of transient errors auto-recover with structured metadata (vs 15% with generic errors).

Don’t confuse these with validation errors (bad input, not a service issue) or permission errors (access denied, not a temporary overload).
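
"Wait briefly and retry" is commonly implemented as exponential backoff. The sketch below assumes a base delay of 0.5 seconds, a cap of 8 seconds, and up to 4 retries; none of these numbers come from the text, and `backoff_delays` and `retry_transient` are hypothetical helper names. The `sleep` function is injectable so the logic can be exercised without real waiting.

```python
import time
from typing import Callable, List


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Delay before each retry: base * 2**n seconds, never more than `cap`."""
    return [min(cap, base * 2 ** n) for n in range(max_retries)]


def retry_transient(
    tool: Callable[[], dict],
    max_retries: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry only while the tool reports a transient error; return the last result."""
    result = tool()
    for delay in backoff_delays(max_retries):
        is_transient = result.get("isError") and result.get("errorCategory") == "transient"
        if not is_transient:
            break  # success, or an error type that backoff cannot fix
        sleep(delay)
        result = tool()
    return result


print(backoff_delays(4))  # [0.5, 1.0, 2.0, 4.0]
```

Production code would usually add random jitter to each delay so that many clients do not retry in lockstep; it is left out here to keep the delays predictable.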

Validation: fix input, then retry

Wrong date format (“15th of March, 2024” instead of ISO 8601), missing required field, invalid parameter value. The input is wrong. Mark as validation + retryable, but the agent must fix the input before it retries; resending the same input fails identically.

The best validation errors include field-level details: which field failed, what format is expected, what was received, and an example of the correct format. “date field must be ISO 8601 (YYYY-MM-DD), received ‘15th of March, 2024’, please convert to ‘2024-03-15’” enables single-retry self-correction.
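
A field-level validation error of this kind might be built as follows. This is a sketch under assumptions: the `details` key and its sub-fields (`field`, `expected`, `received`, `example`) are illustrative names, and `normalize_date` is a hypothetical stand-in for the correction the agent itself would make after reading the error.

```python
import re
from datetime import datetime
from typing import Optional

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_date(value: str) -> Optional[dict]:
    """Return None for a valid YYYY-MM-DD date, else a structured validation error."""
    if ISO_DATE.fullmatch(value):
        try:
            datetime.strptime(value, "%Y-%m-%d")
            return None
        except ValueError:
            pass  # right shape but impossible date, e.g. 2024-02-30
    return {
        "isError": True,
        "errorCategory": "validation",
        "isRetryable": True,  # retryable only after the input is fixed
        "details": {
            "field": "date",
            "expected": "ISO 8601 (YYYY-MM-DD)",
            "received": value,
            "example": "2024-03-15",
        },
    }


def normalize_date(raw: str) -> str:
    """Stand-in for the agent's fix: '15th of March, 2024' -> '2024-03-15'."""
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", raw)      # drop ordinal suffix
    cleaned = cleaned.replace(" of ", " ").replace(",", "")  # '15 March 2024'
    return datetime.strptime(cleaned, "%d %B %Y").strftime("%Y-%m-%d")


first = validate_date("15th of March, 2024")
print(first["details"]["expected"])
second = validate_date(normalize_date(first["details"]["received"]))
print(second)  # None: the single corrected retry passes validation
```

Because the error names the field, the expected format, and the received value, one corrected retry is enough.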

Business: explain and escalate

Refund exceeds $500 limit, account suspended, product discontinued. Business rules prevent the operation. Mark as business + non-retryable → agent explains the constraint to the user and escalates if needed.

Include a customerMessage for the user-facing explanation and a suggestedAction for the agent’s next step.
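
As an illustration, a refund tool might return a business error shaped like the one below. The $500 limit comes from the example above; the `check_refund` helper, the message wording, and the `"escalate_to_human"` value are assumptions made for this sketch.

```python
from typing import Optional

REFUND_LIMIT = 500  # dollars, from the example above


def check_refund(amount: float) -> Optional[dict]:
    """Return a structured business error if the refund exceeds the limit, else None."""
    if amount <= REFUND_LIMIT:
        return None
    return {
        "isError": True,
        "errorCategory": "business",
        "isRetryable": False,  # retrying cannot change a policy decision
        "message": f"Refund of ${amount:.2f} exceeds the ${REFUND_LIMIT} limit",
        # Hypothetical wording the agent can relay to the user as-is.
        "customerMessage": (
            "This refund is above the amount I can approve directly, "
            "so I'm passing it to a specialist."
        ),
        # Hypothetical hint for the agent's next step.
        "suggestedAction": "escalate_to_human",
    }


error = check_refund(750)
print(error["customerMessage"])
print(error["suggestedAction"])
```

Keeping the internal `message` separate from `customerMessage` lets the tool log policy details without exposing them to the user.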

Permission: escalate immediately

Insufficient privileges, expired API key, unauthorized operation. Mark as permission + non-retryable → agent escalates to someone with the required authorization. Zero retries — the same credentials will fail every time.

One-liner: Without error classification, agents default to retrying everything and waste 5 attempts on permanent errors; classify each error as transient (retry), validation (fix input), business (explain/escalate), or permission (escalate immediately), and set isRetryable to enable intelligent recovery.