A research synthesis reports “AI has minimal impact on the performing arts sector.” A reviewer discovers the web search sub-agent timed out for all performing arts queries — no data was actually collected. The synthesis model interpreted the absence of data as absence of impact. Without coverage annotations marking performing arts as “unavailable — search timed out,” the model could not distinguish “no evidence found” from “evidence collection failed.”
## The Three Coverage Categories
| Category | Meaning | Example annotation |
|---|---|---|
| well_supported | Multiple sources, full data | Arts (3 sources), Music (4 sources) |
| limited_data | Few sources, partial coverage | Dance (1 source) |
| unavailable | Search failed, no data collected | Theater (timeout — 0 sources) |
Each category tells the decision-maker something different:
- Well-supported: Act on these findings with confidence
- Limited: Consider with caution, may need supplementary research
- Unavailable: Do not draw conclusions — data collection failed, not evidence absence
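The three categories can be carried as a small structured record. A minimal sketch, assuming a hypothetical `CoverageAnnotation` type (the field names are illustrative, not a schema from the source):

```python
from dataclasses import dataclass

@dataclass
class CoverageAnnotation:
    topic: str
    category: str       # "well_supported" | "limited_data" | "unavailable"
    source_count: int
    reason: str = ""    # populated for gaps, e.g. "search timed out"

annotations = [
    CoverageAnnotation("Arts", "well_supported", 3),
    CoverageAnnotation("Dance", "limited_data", 1),
    CoverageAnnotation("Theater", "unavailable", 0, "search timed out"),
]

# Only well-supported topics are safe to act on without caveats.
actionable = [a.topic for a in annotations if a.category == "well_supported"]
```

Keeping `reason` on the record is what lets the final report say *why* a topic is a gap, not just that it is one.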
## The Decision Quality Data
| Metric | Without annotations | With annotations |
|---|---|---|
| Decisions based on unrecognized gaps | 35% | 4% |
| Decision revision rate (post-gap discovery) | 22% | 3% |
| Decision-maker confidence in research | 65% | 89% |
Coverage annotations cut unrecognized data gaps from 35% of decisions to 4%, an 89% relative reduction. Decision-makers trust the research more, not less, when they can see its completeness: transparency builds confidence, while hidden gaps destroy it when discovered later.
## The Pipeline: Sub-Agent Metadata → Aggregation → Output
Step 1: Sub-agents report coverage metadata. Each sub-agent includes structured metadata alongside results: which queries succeeded, which failed, which produced partial results. This is the only reliable source — the synthesis agent cannot infer coverage from data volume alone.
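A sketch of the per-query metadata a sub-agent might attach to its results. The field names (`query_status`, `result_count`) are assumptions for illustration, not a fixed schema:

```python
def report_metadata(query: str, status: str, results: list) -> dict:
    """Hypothetical per-query report a sub-agent returns alongside results."""
    return {
        "query": query,
        "query_status": status,   # "success" | "timeout" | "error"
        "result_count": len(results),
    }

reports = [
    report_metadata("AI impact on music", "success", ["src1", "src2"]),
    report_metadata("AI impact on theater", "timeout", []),
]
```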
Step 2: Synthesis aggregates per-topic coverage. The synthesis agent categorizes each topic based on sub-agent metadata: well_supported, limited_data, or unavailable. Multiple successful sources for a topic → well_supported; a single source → limited_data; all queries timed out → unavailable.
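The categorization rule above can be sketched as a small function. The source-count threshold (two or more sources → well_supported) is an illustrative assumption:

```python
def categorize(reports: list[dict]) -> str:
    """Assign a coverage category to one topic from its sub-agent reports."""
    successes = [
        r for r in reports
        if r["query_status"] == "success" and r["result_count"] > 0
    ]
    if not successes:
        return "unavailable"       # every query failed or returned nothing
    total_sources = sum(r["result_count"] for r in successes)
    return "well_supported" if total_sources >= 2 else "limited_data"
```

Note that the function never looks at the result *content*, only at the explicit status metadata, which is the point of Step 1.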
Step 3: Coverage section in final output. The synthesis prominently includes the coverage assessment. Not a generic disclaimer (“some sources may have been unavailable”) but specific, per-topic annotations with reasons for each gap.
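A minimal renderer for such a per-topic coverage section; the exact formatting is an illustrative choice, not the source's output:

```python
def render_coverage(coverage: dict) -> str:
    """Render {topic: (category, reason)} as a markdown coverage section."""
    lines = ["## Coverage"]
    for topic, (category, reason) in coverage.items():
        note = f" ({reason})" if reason else ""
        lines.append(f"- {topic}: {category}{note}")
    return "\n".join(lines)
```

The `reason` string is what turns a generic disclaimer into a specific, per-topic annotation such as "Theater: unavailable (search timed out)".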
## Why Silent Omission Is Worse Than Annotation
A product manager argues: “Coverage annotations clutter the output. Just omit topics with limited data — cleaner output builds trust.”
For stakeholders making investment decisions, silent omission is deceptive. If a topic was researched but data was unavailable, omitting it makes the stakeholder think it was never considered. When they discover the gap, trust in the entire report collapses — far worse than the “clutter” of an honest annotation.
## Do Not Infer Coverage — Require Explicit Reporting
Inferring coverage from data volume or consistency is unreliable. A timeout returning zero results looks identical to a successful search that found nothing. Only explicit sub-agent reporting (`query_status: timeout` vs. `query_status: success` with `results: 0`) can distinguish these cases.
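The distinction can be made concrete: both cases carry an identical payload (zero results) and are separable only by the explicit status field. Field names follow the hypothetical report shape used above:

```python
def interpret(report: dict) -> str:
    """Separate collection failure from a genuine negative finding."""
    if report["query_status"] != "success":
        return "collection_failed"    # do NOT conclude absence of impact
    if report["result_count"] == 0:
        return "no_evidence_found"    # a genuine negative finding
    return "evidence_found"
```

Without the status field, the first two branches would collapse into one, which is exactly the failure in the performing-arts example.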
Training the synthesis model to “generate appropriate coverage sections” without metadata means it would fabricate coverage assessments. Coverage annotations must be grounded in actual sub-agent success/failure data.
One-liner: Include per-topic coverage annotations (well_supported/limited_data/unavailable) in every synthesis — this prevents the model from interpreting search failures as “no evidence exists” and reduces unrecognized decision gaps from 35% to 4%.