K5.4.1 Task 5.4

Minute 0: "src/auth/jwt.ts:12 → verifyJWT()". Minute 45: "Typical JWT Validation Pattern."

An agent explores a 500-file codebase for 60 minutes. At the start, it references src/auth/jwt.ts:12 → verifyJWT() with the exact function signature. By minute 45, it describes “the typical JWT validation pattern” — generic, sourceless, interchangeable with any codebase. The specific details discovered early in the session have been diluted as subsequent information accumulated and displaced them.

This is context degradation: a progressive loss of specificity that correlates with session duration, not task complexity.

The Specificity Decline

An exploration agent’s output measured over 60 minutes:

Time        Specific references (file:line, function names)
0-15 min    92%
15-30 min   74%
30-45 min   48%
45-60 min   23%

The decline is steady and independent of what is being explored. It correlates with time (accumulated context), not with code characteristics.
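A specificity rate like the one above can be approximated mechanically. The sketch below is a hypothetical metric, not the measurement method used for the table: it counts how many claims in an output carry a concrete reference (a file:line locator or a named function call), and the regex patterns and `specificity` helper are illustrative assumptions.

```python
import re

# Hypothetical patterns for "specific" references. Illustrative only.
FILE_LINE = re.compile(r"\b[\w./-]+\.\w+:\d+\b")  # e.g. src/auth/jwt.ts:12
FUNC_REF = re.compile(r"\b\w+\(\)")               # e.g. verifyJWT()

def specificity(output: str, total_claims: int) -> float:
    """Share of claims backed by a concrete code reference."""
    if total_claims == 0:
        return 0.0
    specific = len(FILE_LINE.findall(output)) + len(FUNC_REF.findall(output))
    return min(specific / total_claims, 1.0)

early = "src/auth/jwt.ts:12 -> verifyJWT() checks the RS256 signature"
late = "the typical JWT validation pattern checks the signature"
```

Scored this way, the early output is fully specific while the late output scores zero: exactly the minute-0 versus minute-45 contrast described above.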

Controlled Evidence: Complexity Is Not the Cause

Customer support accuracy tracked across conversation lengths, controlling for case complexity:

                Short (5-10 msgs)   Long (25+ msgs)
Simple cases    98%                 91%
Complex cases   94%                 82%

Accuracy drops roughly 3% per 10 additional messages, regardless of complexity. The decline is uniform: it affects simple and complex cases alike. Error types include wrong dollar amounts, wrong policy references, and misattributed order details. These are specificity losses from degraded context, not reasoning failures.
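The per-message decline can be projected with simple arithmetic. This is a linear sketch of the observed trend, not a model: the real decline need not stay linear, and the 3%-per-10-messages rate is the approximate figure from the table above.

```python
def projected_accuracy(baseline: float, messages: int,
                       drop_per_10: float = 0.03) -> float:
    """Linear projection: baseline accuracy minus ~3% per 10 messages.
    An approximation of the observed trend, not a fitted model."""
    return baseline - drop_per_10 * (messages / 10)

# Simple cases start near 98%; by ~25 messages this projects to ~90.5%,
# in line with the observed 91% for long conversations.
projected_accuracy(0.98, 25)
```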

The Self-Contradiction Signal

After examining 80+ files, the agent starts contradicting its own earlier findings — suggesting refactors to code it previously identified as well-structured. The agent has lost access to its own prior assessment because that assessment has been diluted by subsequent information.

This is the clearest diagnostic signal: when the agent’s current output conflicts with its earlier output, context degradation is the cause.

Mitigation Strategies

For independent items (invoice extraction, batch processing): Process each item in its own session with fresh context. Invoice #47 has no dependency on invoice #46 — sharing a session only introduces degradation. Accuracy stays at 97% per item instead of dropping to 89% by item 50.
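The per-item isolation can be sketched in a few lines. `run_session` here is a hypothetical callable standing in for whatever starts a fresh agent context, sends one prompt, and returns the result; the pattern, not the API, is the point.

```python
from typing import Callable, List

def process_invoices(invoices: List[str],
                     run_session: Callable[[str], str]) -> List[str]:
    """Process each independent item in its own fresh session.
    `run_session` (hypothetical) opens a brand-new context per call,
    so item 50 sees exactly as much context as item 1."""
    return [run_session(f"Extract fields from invoice: {inv}")
            for inv in invoices]
```

Because no state is shared across calls, accuracy per item is independent of batch position.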

For continuous exploration (codebase mapping, research): Write key findings to a scratchpad file. When context fills, compact the conversation history — the agent references the external file instead of degraded context. This externalizes critical knowledge and makes it immune to degradation.
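The scratchpad pattern is simple to implement. A minimal sketch, assuming findings are appended to a local file the moment they are made; the filename and helper names are illustrative.

```python
from pathlib import Path

SCRATCHPAD = Path("findings.md")  # external file, immune to context decay

def note(finding: str) -> None:
    """Append a specific finding (file:line, function name) immediately,
    before later context can dilute it."""
    with SCRATCHPAD.open("a") as f:
        f.write(f"- {finding}\n")

def reload_notes() -> str:
    """After compacting the conversation, re-read the scratchpad
    instead of relying on the degraded history."""
    return SCRATCHPAD.read_text()
```

The key property: specificity written at minute 0 survives verbatim to minute 60, because it lives outside the context window.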

For large code reviews (20+ files in CI): Split into per-file review passes with independent context. Each file gets full attention. A final integration pass checks cross-file interactions. This is the K4.6.3 multi-pass pattern applied to degradation prevention.
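The multi-pass split can be expressed as two phases. Both callables below are hypothetical stand-ins for agent sessions: `review_one` runs with a fresh context per file, and `integrate` sees only the per-file summaries, never the raw diffs.

```python
from typing import Callable, Dict, List

def review_files(files: List[str],
                 review_one: Callable[[str], str],
                 integrate: Callable[[Dict[str, str]], str]) -> str:
    """Pass 1: each file reviewed in isolation, full attention per file.
    Pass 2: a fresh integration session checks cross-file interactions
    using only the compact summaries."""
    summaries = {path: review_one(path) for path in files}
    return integrate(summaries)
```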

For multi-module security audits (400+ files): Scan each module in an independent sub-agent session. Write findings to a file per module. A fresh coordinator compiles the report from all finding files. No single session processes more than one module’s worth of context.
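The audit fan-out follows the same shape, with the file system as the hand-off. A sketch under the stated assumptions: `scan_module` is a hypothetical sub-agent session, each scan writes to its own findings file, and `compile_report` is the fresh coordinator that reads only those files.

```python
from pathlib import Path
from typing import Callable, Dict, List

def audit_modules(modules: List[str],
                  scan_module: Callable[[str], str],
                  compile_report: Callable[[Dict[str, str]], str],
                  outdir: str = "audit") -> str:
    """Each module is scanned in an isolated session (hypothetical
    `scan_module`); findings land in one file per module; a fresh
    coordinator compiles the report from the files alone."""
    out = Path(outdir)
    out.mkdir(exist_ok=True)
    for mod in modules:
        (out / f"{mod}.md").write_text(scan_module(mod))  # isolated scan
    reports = {p.stem: p.read_text() for p in out.glob("*.md")}
    return compile_report(reports)
```

No single session ever holds more than one module's worth of context, which is the invariant that prevents degradation at the 400-file scale.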

What Does Not Work

Larger context windows. Delay degradation but do not prevent it. Attention quality degrades even within large windows when processing hundreds of files sequentially. The issue is attention dilution, not raw capacity.

“Be accurate” instructions. Cannot restore information already lost from context. If the order number was diluted during summarization, no instruction can make the agent reference it.

Re-reading earlier outputs. Consumes additional context space, accelerating the exhaustion problem. Information should be externalized to a file, not re-injected into a strained context.

Transferring to a larger model mid-session. Preserves the degraded state. Already-summarized early findings remain vague in the new context.


One-liner: Context degradation causes a steady 3% accuracy drop per 10 messages regardless of complexity — mitigate with scratchpads for continuous sessions, per-item sessions for independent tasks, and per-module sub-agents for large codebases.