Context Management & Reliability
"$127.50 Refund" Became "Customer Requested a Refund" — The Agent Processed $50
Progressive summarization & Case Facts
Sources 1-2: 96%. Sources 5-6: 52%. Sources 9-10: 94%. The U-Shaped Attention Curve.
Lost-in-the-middle effect
40 Fields Returned, 5 Needed — Context Full After 4 Calls Instead of 11
Tool output bloat and filtering
The API Does Not Remember Your Conversation — You Must Send It Every Time
Stateless API & history management
Turns 1-10: 96% Accuracy. Turns 31+: 58%. No Persistent Facts Block.
Case Facts & issue tracker pattern
28% of Issues Dropped Without a Tracker. 3% With One.
Multi-issue tracking
2,800 Tokens of Reasoning Chain → 280 Tokens of Structured Facts. Same Findings, 10x Less Context.
Sub-agent structured output efficiency
Sentiment-Based Escalation: 40% Volume, 30% Needed Human. Replace It.
Escalation trigger design
"I Want a Person" → Attempt Resolution First → CSAT 2.1. Immediate Escalation → 3.8.
Immediate escalation on human request
High Confidence (0.9+): 12% Errors. Low Confidence (<0.5): 68% Correct. The Signal Is Broken.
Confidence scores unreliable
Auto-Select "Most Recent": 27% Wrong Customer. Ask for Email: 2%.
Multi-match disambiguation
Policy Silent on Competitor Matching → Agent Decides → 52% Approve, 48% Deny. Same Request.
Policy gap detection & escalation
Generic "Failed" → 18% Recovery. Structured Error → 71%.
Structured error context for recovery
"Database Error" — Orchestrator Retried 5 Times. The Database Was Permanently Decommissioned.
Generic error status anti-pattern
Two Anti-Patterns That Compound: 35% of Reports Have Hidden Gaps, 25% of Queries Are Killed
Silent swallow + termination anti-patterns
"No Peer-Reviewed Studies Exist" — Actually, 47 Papers Were Found After the Outage Ended
Access failure vs valid empty result
"AI Has Minimal Impact on Performing Arts" — Actually, the Search Just Timed Out
Coverage annotation in synthesis
Minute 0: "src/auth/jwt.ts:12 → verifyJWT()". Minute 45: "Typical JWT Validation Pattern."
Context degradation over time
The Scratchpad: Writing Findings Down Before the Context Forgets Them
Scratchpad for persistent findings
Sub-Agent Delegation: Let Someone Else Hold the Data
Sub-agent delegation for context isolation
Crash Recovery: A Manifest File So You Don't Start Over
Crash recovery with manifest
Save, Compact, Restore: The Three-Step Workflow for /compact
/compact workflow: save → compact → restore
96% Accuracy, 40% More Escalations: Why Aggregate Metrics Lie
Aggregate accuracy masks failures
Stratified Sampling: Why Random Monitoring Misses the Problems That Matter
Stratified sampling for monitoring
Field-Level Confidence: Review the Uncertain Fields, Not the Entire Document
Field-level confidence routing
Confidence Calibration: The Model Says 0.9 — What Does That Actually Mean?
Confidence threshold calibration
Claim-Source Mapping: Every Fact Needs a Return Address
Claim-source mapping provenance
Conflicting Data: Present Both, Fabricate Neither
Conflicting data: preserve both
Temporal Metadata: Not Every Difference Is a Contradiction
Temporal metadata prevents false contradictions
Established vs Contested: Structure Reports by Evidence Strength
Well-established vs contested structure
Content-Type-to-Format Matching: Tables for Numbers, Prose for Analysis
Content-type-to-format matching