S1.2.2 Task 1.2

Evaluate, Identify Gaps, Fill Them: The Iterative Refinement Loop

Accepting first-round results without evaluation is an anti-pattern. The coordinator’s job includes quality assessment: are all topics covered? Are findings specific enough? Do severity ratings align across agents? If not, delegate targeted follow-up — don’t restart, don’t accept.

The three-step loop

  1. Evaluate: assess sub-agent results against quality criteria (coverage, depth, consistency)
  2. Identify gaps: which topics are missing? Which findings are vague? Which ratings conflict?
  3. Fill gaps: delegate targeted follow-up queries for the specific gaps, then re-synthesize with expanded results
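The three steps above can be sketched as a simple control loop. All helper names here (`evaluate`, `identify_gaps`, `delegate_followup`, `synthesize`) are hypothetical placeholders for your own coordinator logic, not a fixed API:

```python
# Minimal sketch of the evaluate -> identify gaps -> fill gaps loop.
# The four callables are hypothetical stand-ins for coordinator logic.

def refine(results, evaluate, identify_gaps, delegate_followup, synthesize,
           quality_target=85, max_rounds=6):
    for _ in range(max_rounds):
        score = evaluate(results)              # step 1: assess quality
        if score >= quality_target:
            break
        gaps = identify_gaps(results)          # step 2: find missing/vague items
        if not gaps:
            break
        new_results = delegate_followup(gaps)  # step 3: targeted follow-up only
        results = synthesize(results, new_results)
    return results
```

Note that the follow-up delegation receives only the identified gaps, never the full task: first-round work flows through `synthesize` untouched.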

This is more efficient than restarting the entire pipeline (preserves valid first-round work) and more thorough than accepting incomplete results.

Quality criteria that matter

For a code review coordinator:

  • Coverage: are all changed files reviewed?
  • Depth: are findings specific (line numbers, code evidence) or vague?
  • Consistency: do severity ratings across files use compatible scales?

Evaluate in priority order (cheapest first). If coverage passes but depth fails on 3 files, re-review only those 3 files for depth — don’t re-review everything.
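A priority-ordered check for these three criteria might look like the following sketch. The review schema (`line` and `severity` keys on each finding) is an illustrative assumption; the point is that each check returns only the files that need targeted re-review:

```python
# Sketch: evaluate criteria cheapest-first; return the first failing
# criterion and only the files that need re-review for it.
# Finding fields ('line', 'severity') are assumed, not a fixed schema.

def assess(changed_files, reviews):
    """reviews: dict mapping filename -> list of finding dicts."""
    # 1. Coverage (cheapest): every changed file has a review
    missing = [f for f in changed_files if f not in reviews]
    if missing:
        return {"criterion": "coverage", "targets": missing}

    # 2. Depth: findings must cite specific line numbers
    shallow = [f for f, findings in reviews.items()
               if any("line" not in fnd for fnd in findings)]
    if shallow:
        return {"criterion": "depth", "targets": shallow}

    # 3. Consistency: severity ratings drawn from one known scale
    scale = {"low", "medium", "high"}
    off_scale = [f for f, findings in reviews.items()
                 if any(fnd.get("severity") not in scale for fnd in findings)]
    if off_scale:
        return {"criterion": "consistency", "targets": off_scale}

    return {"criterion": None, "targets": []}
```

Returning early on the first failing criterion keeps evaluation cheap: there is no point scoring depth on files that were never reviewed at all.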

When to stop iterating

Production data from 200 queries:

  • Round 1: 65/100 quality
  • Round 2: 82/100 (+17)
  • Round 3: 88/100 (+6)
  • Round 4: 89/100 (+1)

Diminishing returns are clear. Stop when: quality target is met (≥85) OR improvement between rounds falls below a threshold (<2 points). This prevents both premature stopping (quality too low) and wasteful over-iteration (marginal gains at full cost).
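The stopping rule reduces to a few lines. The thresholds (≥85 target, <2-point minimum gain) come from the section above; a sketch:

```python
# Sketch of the stopping rule: stop when the quality target is met
# or when per-round improvement drops below a minimum gain.

def should_stop(scores, target=85, min_gain=2):
    """scores: quality score for each completed round, in order."""
    if not scores:
        return False
    if scores[-1] >= target:
        return True                                  # quality target met
    if len(scores) >= 2 and scores[-1] - scores[-2] < min_gain:
        return True                                  # improvement plateaued
    return False
```

Against the production numbers above, this stops after round 3 (88 ≥ 85), skipping the round-4 iteration that bought only 1 point.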

Convergence guards

2% of queries in one system entered 10+ round loops, consuming 40% of the compute budget. The coordinator kept finding minor gaps and re-delegating, never satisfied.

Fix: convergence guards — a maximum iteration limit (e.g., 6 rounds) combined with an improvement threshold. When guards trigger, return the best available output with a coverage annotation noting remaining gaps. This protects the budget while maintaining quality for the 98% that converges normally.
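Both guards can wrap the refinement loop as follows. `run_round` is a hypothetical callable standing in for one full evaluate-and-follow-up cycle; the `converged`/`gaps` return shape is an illustrative choice for the coverage annotation:

```python
# Sketch: convergence guards around the refinement loop. On guard
# trigger, return the best output seen so far plus remaining gaps.
# run_round() is an assumed helper: one cycle -> (output, score, gaps).

def refine_with_guards(run_round, max_rounds=6, target=85, min_gain=2):
    best_output, best_score, gaps = None, -1, []
    prev_score = None
    for _ in range(max_rounds):
        output, score, gaps = run_round()
        if score > best_score:
            best_output, best_score = output, score
        if score >= target:
            return {"output": best_output, "converged": True, "gaps": []}
        if prev_score is not None and score - prev_score < min_gain:
            break                         # improvement plateaued: stop early
        prev_score = score
    # Guards triggered: best available output, annotated with open gaps
    return {"output": best_output, "converged": False, "gaps": gaps}
```

The never-satisfied 2% now terminate by round 6 at the latest (earlier if they plateau), while the 98% that converge normally are unaffected.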

Targeted follow-up, not full restart

When gaps are found, fill only the specific gaps. A full restart wastes all valid first-round work, costs 2x, and may introduce new gaps of its own. Targeted follow-up keeps the good work and adds what’s missing. Even for major gaps (50%+ missing), targeted follow-up is more efficient — you preserve the valid 50% and research only the missing half.
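The merge itself is trivial, which is part of the argument for it. In this sketch, `research` is a hypothetical stand-in for delegating one follow-up query to a sub-agent:

```python
# Sketch: fill only the specific gaps, preserving valid first-round work.
# research(topic) is an assumed helper that delegates one sub-agent query.

def fill_gaps(first_round, gap_topics, research):
    """first_round: dict topic -> findings. gap_topics: topics that
    are missing or too vague. Returns the merged result set."""
    merged = dict(first_round)            # keep all valid existing work
    for topic in gap_topics:
        merged[topic] = research(topic)   # re-research only the gaps
    return merged
```

A topic listed in `gap_topics` is replaced even if a vague first-round entry exists, so the same path handles both missing and shallow coverage.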

The first step

If your coordinator currently accepts first-round results without evaluation, the first step is adding the evaluation step itself. Before the coordinator can iterate, it needs the ability to assess quality against criteria. This is the foundation that enables the entire iterative loop.


One-liner: Evaluate first-round results against quality criteria, delegate targeted follow-up for specific gaps (not full restart), and stop when quality is met or improvement plateaus — with convergence guards to prevent runaway loops.