K2.2.3 Task 2.2

"Operation Failed" × 5 Retries × 30 Seconds = Customer Waiting for Nothing

Every error returns {"content": [{"type": "text", "text": "Operation failed"}], "isError": true}. A suspended account, a timeout, a malformed date: all produce the identical response. The agent retries a permanent permission error 5 times over 30 seconds, and the customer waits for nothing.

The anti-pattern

{"content": [{"type": "text", "text": "Operation failed"}], "isError": true}

This is returned for:

  • Database timeout (should retry)
  • Invalid date format (should fix input)
  • Account suspended (should escalate)
  • Refund exceeds limit (should inform customer)

The agent sees four identical responses and applies one strategy: retry. Only the database timeout can succeed on an unchanged retry; the other three error types never will.

Error codes are equally useless

A bare “Error code 4003” is meaningful only to a developer who can look it up in documentation. The LLM has no error code table to consult, so it needs a natural language description with actionable context.
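One way to close that gap is to translate internal codes into descriptive text on the server before the response reaches the agent. The sketch below is only an illustration: the table `ERROR_DESCRIPTIONS`, the helper `describe_error`, and the meaning assigned to code 4003 are all hypothetical, since the source does not say what 4003 stands for.

```python
from typing import Any

# Hypothetical lookup table. The meaning given to 4003 is invented for
# illustration; the source does not define it.
ERROR_DESCRIPTIONS: dict[int, tuple[str, bool, str]] = {
    4003: (
        "permission",
        False,
        "The account is suspended, so this operation is not allowed. "
        "Do not retry; tell the customer and offer to escalate.",
    ),
}


def describe_error(code: int) -> dict[str, Any]:
    """Turn an internal numeric code into a response the agent can act on."""
    category, retryable, text = ERROR_DESCRIPTIONS.get(
        code,
        # Assumed fallback: unknown codes are treated as non-retryable.
        ("unknown", False, f"Unrecognized internal error {code}. Do not retry; escalate."),
    )
    return {
        "content": [{"type": "text", "text": text}],
        "isError": True,
        "structuredContent": {"errorCategory": category, "isRetryable": retryable},
    }
```

The numeric code stays useful for developers and logs; the agent only ever sees the description and the structured fields.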

The structured alternative

A good error response for a validation failure:

{
  "content": [{"type": "text", "text": "Validation error: 'date' field must be ISO 8601 (YYYY-MM-DD). Received: '15th of March, 2024'. Convert to '2024-03-15' and retry."}],
  "isError": true,
  "structuredContent": {
    "errorCategory": "validation",
    "isRetryable": true,
    "invalidField": "date",
    "expectedFormat": "YYYY-MM-DD",
    "receivedValue": "15th of March, 2024"
  }
}

The agent knows: what failed (date field), why (wrong format), exactly how to fix it (convert to ISO 8601), and that retrying with the fix will work (isRetryable: true). One retry cycle resolves the issue.
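A tool could produce a response of this shape with a small helper. This is a minimal sketch, not a definitive implementation: `build_error_result` and `validate_date` are hypothetical names, and the message wording is an assumption modeled on the example above.

```python
import re
from datetime import date
from typing import Any, Optional

# Strict YYYY-MM-DD shape; calendar validity is checked separately below.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_error_result(message: str, category: str, retryable: bool,
                       **details: Any) -> dict[str, Any]:
    """Assemble an error result with both readable text and structured fields."""
    return {
        "content": [{"type": "text", "text": message}],
        "isError": True,
        "structuredContent": {
            "errorCategory": category,
            "isRetryable": retryable,
            **details,
        },
    }


def validate_date(value: str) -> Optional[dict[str, Any]]:
    """Return None for a valid ISO 8601 date, else a structured validation error."""
    if ISO_DATE.fullmatch(value):
        try:
            date.fromisoformat(value)  # rejects impossible dates such as 2024-02-30
            return None
        except ValueError:
            pass
    return build_error_result(
        f"Validation error: 'date' field must be ISO 8601 (YYYY-MM-DD). "
        f"Received: '{value}'. Reformat the date and retry.",
        category="validation",
        retryable=True,
        invalidField="date",
        expectedFormat="YYYY-MM-DD",
        receivedValue=value,
    )
```

Calling `validate_date("15th of March, 2024")` yields the same structured fields as the example, while a well-formed date returns `None` and the tool proceeds normally.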

The impact on agent behavior

With generic errors: the agent retries everything identically. Permission errors get retried futilely. Validation errors get retried with the same bad input. Business errors get retried when escalation is needed. The result is a 15% recovery rate.

With structured errors: the agent retries transient failures, fixes validation inputs, explains business rules to the customer, and escalates permission problems. Each error type gets appropriate handling. The result is a 78-95% recovery rate.
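The per-category handling described above can be sketched as a simple dispatch on `errorCategory`. The category names "transient" and "business" and the action labels are assumptions chosen to mirror the prose; only "validation" and "permission" appear as category values in the source. The fallback to escalation for unrecognized errors is likewise a design assumption.

```python
from typing import Any

# Assumed mapping from error category to the agent's next step.
ACTIONS: dict[str, str] = {
    "transient": "retry",
    "validation": "fix_input_and_retry",
    "business": "explain_to_customer",
    "permission": "escalate",
}


def next_action(result: dict[str, Any]) -> str:
    """Pick the agent's next step from a tool result."""
    if not result.get("isError"):
        return "continue"
    info = result.get("structuredContent") or {}
    # Assumption: with no recognizable category, escalate rather than retry blindly.
    return ACTIONS.get(info.get("errorCategory"), "escalate")
```

With a generic "Operation failed" response there is no `structuredContent` to inspect, so every failure lands in the same fallback branch, which is exactly the loss of information the generic error causes.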

The 30-second incident

Account suspended. Generic “Operation failed.” Agent retries 5 times, each returning the same error. 30 seconds of customer wait time. If the error had included errorCategory: "permission", isRetryable: false, the agent would have immediately told the customer about the suspension and offered to escalate — 2 seconds instead of 30.
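The incident can be reproduced with a small retry loop that honors `isRetryable`. This is a sketch under stated assumptions: `call_with_retry` is a hypothetical helper, the cap of five attempts mirrors the incident, and an error with no `isRetryable` flag is treated as retryable to model the blind-retry behavior. Delays between attempts are omitted.

```python
from typing import Any, Callable

MAX_ATTEMPTS = 5  # mirrors the five attempts in the incident


def call_with_retry(tool: Callable[[], dict[str, Any]],
                    max_attempts: int = MAX_ATTEMPTS) -> tuple[dict[str, Any], int]:
    """Call tool until it succeeds, reports a non-retryable error, or the cap is hit."""
    result: dict[str, Any] = {}
    attempts = 0
    while attempts < max_attempts:
        result = tool()
        attempts += 1
        if not result.get("isError"):
            break
        info = result.get("structuredContent") or {}
        # A generic error has no flag, so it defaults to "retry" (the anti-pattern).
        if info.get("isRetryable", True) is False:
            break
    return result, attempts


def generic_suspended() -> dict[str, Any]:
    return {"content": [{"type": "text", "text": "Operation failed"}], "isError": True}


def structured_suspended() -> dict[str, Any]:
    return {
        "content": [{"type": "text", "text": "Account suspended. Do not retry; escalate."}],
        "isError": True,
        "structuredContent": {"errorCategory": "permission", "isRetryable": False},
    }
```

Here `call_with_retry(generic_suspended)` uses all five attempts, while `call_with_retry(structured_suspended)` stops after the first and leaves the agent free to inform the customer right away.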


One-liner: Generic “Operation failed” causes blind retrying (5 attempts on permanent errors, 30-second customer waits) — structured errors with errorCategory, isRetryable, field-level validation details, and suggested actions enable 78-95% recovery vs 15%.