S5.3.1 Task 5.3

"AI Has Minimal Impact on Performing Arts" — Actually, the Search Just Timed Out

A research synthesis reports “AI has minimal impact on the performing arts sector.” A reviewer discovers the web search sub-agent timed out for all performing arts queries — no data was actually collected. The synthesis model interpreted the absence of data as absence of impact. Without coverage annotations marking performing arts as “unavailable — search timed out,” the model could not distinguish “no evidence found” from “evidence collection failed.”

The Three Coverage Categories

Category       | Meaning                          | Annotation example
---------------|----------------------------------|-------------------------------------
well_supported | Multiple sources, full data      | Arts (3 sources), Music (4 sources)
limited_data   | Few sources, partial coverage    | Dance (1 source)
unavailable    | Search failed, no data collected | Theater (timeout, 0 sources)

Each category tells the decision-maker something different:

  • Well-supported: Act on these findings with confidence
  • Limited: Consider with caution, may need supplementary research
  • Unavailable: Do not draw conclusions — data collection failed, not evidence absence
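The three categories and their decision guidance can be sketched as a small enum; the `Coverage` class and `GUIDANCE` mapping below are hypothetical names, and the guidance strings simply restate the bullets above.

```python
from enum import Enum

class Coverage(Enum):
    """Per-topic coverage categories (names taken from the table above)."""
    WELL_SUPPORTED = "well_supported"
    LIMITED_DATA = "limited_data"
    UNAVAILABLE = "unavailable"

# Hypothetical guidance a decision-maker would see next to each finding.
GUIDANCE = {
    Coverage.WELL_SUPPORTED: "Act on these findings with confidence.",
    Coverage.LIMITED_DATA: "Consider with caution; may need supplementary research.",
    Coverage.UNAVAILABLE: "Do not draw conclusions: data collection failed.",
}
```

Keeping the category values as strings (`"well_supported"`, etc.) lets them pass unchanged through JSON metadata between sub-agents and the synthesis step.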

The Decision Quality Data

Metric                                      | Without annotations | With annotations
--------------------------------------------|---------------------|-----------------
Decisions based on unrecognized gaps        | 35%                 | 4%
Decision revision rate (post-gap discovery) | 22%                 | 3%
Decision-maker confidence in research       | 65%                 | 89%

Coverage annotations reduce unrecognized data gaps by 89%. Decision-makers trust the research more — not less — when they can see its completeness. Transparency builds confidence; hidden gaps destroy it when discovered later.

The Pipeline: Sub-Agent Metadata → Aggregation → Output

Step 1: Sub-agents report coverage metadata. Each sub-agent includes structured metadata alongside results: which queries succeeded, which failed, which produced partial results. This is the only reliable source — the synthesis agent cannot infer coverage from data volume alone.

Step 2: Synthesis aggregates per-topic coverage. The synthesis agent categorizes each topic based on sub-agent metadata: well_supported, limited_data, or unavailable. Multiple successful sub-agents for a topic → well_supported. One source → limited. All timed out → unavailable.
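The Step 2 rule (multiple successes → well_supported, one → limited_data, none → unavailable) can be sketched as a small aggregation function. The `SubAgentReport` shape and `categorize` name are assumptions for illustration; only the `query_status` field follows the terminology used later in this section.

```python
from dataclasses import dataclass

@dataclass
class SubAgentReport:
    """Hypothetical metadata one sub-agent returns alongside its results."""
    topic: str
    query_status: str  # "success" | "timeout" | "error"
    num_sources: int

def categorize(reports):
    """Map each topic's sub-agent reports to a coverage category (Step 2 rule)."""
    by_topic = {}
    for r in reports:
        by_topic.setdefault(r.topic, []).append(r)
    coverage = {}
    for topic, rs in by_topic.items():
        # Only successful queries that actually returned sources count as evidence.
        ok = [r for r in rs if r.query_status == "success" and r.num_sources > 0]
        if len(ok) >= 2:
            coverage[topic] = "well_supported"
        elif len(ok) == 1:
            coverage[topic] = "limited_data"
        else:
            coverage[topic] = "unavailable"
    return coverage
```

Note that a topic whose every sub-agent timed out ends up `unavailable`, never silently absent, because the failed reports still carry the topic name.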

Step 3: Coverage section in final output. The synthesis prominently includes the coverage assessment. Not a generic disclaimer (“some sources may have been unavailable”) but specific, per-topic annotations with reasons for each gap.
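A minimal sketch of the Step 3 output: per-topic lines with an explicit reason attached to each gap, rather than a blanket disclaimer. The function name and the two input mappings are hypothetical.

```python
def render_coverage_section(coverage, reasons):
    """Render the per-topic coverage block for the final output (Step 3).

    coverage: topic -> category string ("well_supported" | "limited_data" | "unavailable")
    reasons:  topic -> short explanation for any gap (e.g. "search timed out")
    """
    lines = ["Coverage assessment:"]
    for topic in sorted(coverage):
        cat = coverage[topic]
        line = f"  - {topic}: {cat}"
        # Attach the specific reason to every non-well-supported topic.
        if cat != "well_supported" and topic in reasons:
            line += f" ({reasons[topic]})"
        lines.append(line)
    return "\n".join(lines)
```

The key property is that every researched topic appears in the output, so "Theater: unavailable (search timed out)" is visible rather than silently missing.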

Why Silent Omission Is Worse Than Annotation

A product manager argues: “Coverage annotations clutter the output. Just omit topics with limited data — cleaner output builds trust.”

For stakeholders making investment decisions, silent omission is deceptive. If a topic was researched but data was unavailable, omitting it makes the stakeholder think it was never considered. When they discover the gap, trust in the entire report collapses — far worse than the “clutter” of an honest annotation.

Do Not Infer Coverage — Require Explicit Reporting

Inferring coverage from data volume or consistency is unreliable. A timeout returning zero results looks identical to a successful search that found nothing. Only explicit sub-agent reporting (query_status: timeout vs query_status: success, results: 0) can distinguish these cases.
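The ambiguity is concrete: the two reports below are indistinguishable by result count alone, and only the explicit status field separates them. Field names follow the `query_status` / `results` example in the text; the helper function is hypothetical.

```python
# Both reports carry zero results; only the status field tells them apart.
timeout_report = {"query_status": "timeout", "results": []}
empty_report = {"query_status": "success", "results": []}

def safe_to_conclude_absence(report):
    """Absence of evidence may only be claimed when the search itself succeeded.

    A timeout proves nothing about the topic; it is a collection failure.
    """
    return report["query_status"] == "success"
```

Any coverage inference that looks only at `len(results)` would treat these two reports identically, which is exactly the failure mode from the performing-arts example.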

Training the synthesis model to “generate appropriate coverage sections” without metadata means it would fabricate coverage assessments. Coverage annotations must be grounded in actual sub-agent success/failure data.


One-liner: Include per-topic coverage annotations (well_supported/limited_data/unavailable) in every synthesis — this prevents the model from interpreting search failures as “no evidence exists” and reduces unrecognized decision gaps from 35% to 4%.