Source A says EV market share is 35%. Source B says 42%. The synthesis agent flags this as a contradiction. A human reviews it and finds: Source A is from a 2022 industry consortium report. Source B is from 2024 government statistics. It is not a contradiction — it is a 7-point growth over two years.
Without dates, 35% and 42% look like conflicting claims. With dates, they tell a story of market growth. The difference between “contradiction” and “trend” is a single piece of metadata.
The False Contradiction Problem
One research system flagged 18 contradictions per report. Human review revealed 14 of those (78%) were false — the same metric measured at different time points showing natural evolution. Only 4 were genuine same-period conflicts.
The root cause: sub-agent output included claims and source URLs but no dates. The synthesis agent had no way to distinguish “different values because different years” from “different values because different sources disagree.”
After adding data_collection_date and publication_date to the sub-agent output schema:
| Metric | Before dates | After dates |
|---|---|---|
| Contradictions flagged per report | 18 | 5 |
| False positive rate | 78% | 20% |
| Temporal trends correctly presented | No | Yes (as progressions) |
| Synthesis time | Baseline | -15% |
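The 15% reduction in synthesis time came from no longer having to resolve false contradictions. The genuine conflicts were not lost along the way: 5 flags at a 20% false positive rate is 1 false flag and 4 genuine ones, the same count of genuine conflicts as before. The primary benefit was accuracy: temporal trends now appeared as progressions (“35% → 42%”) instead of errors.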
What to Include
Two date fields in every sub-agent output:
- data_collection_date — when the underlying data was gathered (the measurement date)
- publication_date — when the source was published or last updated
Both matter. A 2024 report might contain data collected in 2022. The publication date tells you the source’s recency; the data collection date tells you the measurement’s recency.
{
  "claim": "EV market share: 42%",
  "source": "Government Statistics Bureau Annual Report",
  "data_collection_date": "2024-Q3",
  "publication_date": "2024-12-15"
}
This is not complex to implement. Sub-agents already read the source documents — extracting dates from them is straightforward. The schema change is adding two fields to the output structure.
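As a rough illustration of how small the schema change is, the sketch below models the output record in Python and rejects any record that omits either date. The names `Claim`, `parse_claim`, and `_DATE_PATTERN` are hypothetical, and the accepted date formats (a year, a year plus quarter, or a full date) are an assumption based on the example values above, not a specification from the system described.

```python
import re
from dataclasses import dataclass

# Assumed formats, inferred from the example record: "2024", "2024-Q3", "2024-12-15".
_DATE_PATTERN = re.compile(r"\d{4}(-Q[1-4]|-\d{2}-\d{2})?")


@dataclass(frozen=True)
class Claim:
    claim: str
    source: str
    data_collection_date: str  # when the underlying data was measured
    publication_date: str      # when the source was published or last updated


def parse_claim(raw: dict) -> Claim:
    """Build a Claim, refusing output that omits or garbles either date."""
    for field in ("data_collection_date", "publication_date"):
        value = raw.get(field)
        if not isinstance(value, str) or not _DATE_PATTERN.fullmatch(value):
            raise ValueError(f"missing or malformed {field}: {value!r}")
    return Claim(
        claim=raw["claim"],
        source=raw["source"],
        data_collection_date=raw["data_collection_date"],
        publication_date=raw["publication_date"],
    )


example = parse_claim({
    "claim": "EV market share: 42%",
    "source": "Government Statistics Bureau Annual Report",
    "data_collection_date": "2024-Q3",
    "publication_date": "2024-12-15",
})
```

Failing loudly on a missing date is one possible design choice here: it keeps undated claims from reaching the synthesis agent, where they would be indistinguishable from conflicts.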
Three Scenarios the Synthesis Must Handle
With temporal metadata, the synthesis agent can distinguish three scenarios:
1. Temporal Trend (Same Metric, Different Periods)
Market share: 35% (2022) → 38% (2023) → 42% (2024)
Source: Industry consortium annual reports
Present as a progression. Not a contradiction.
2. Genuine Contradiction (Same Metric, Same Period)
2024 market share:
- Source A: 42% (government statistics)
- Source B: 38% (industry consortium)
Note: Different methodologies; values conflict for the same period.
Flag as a real conflict. Preserve both with sources.
3. Only Older Data Available
Customer satisfaction: 78% (most recent: 2021 survey)
Note: No newer measurement available.
Present the data with a clear note on its age. A 2021 measurement is more informative than no measurement, as long as the reader knows the date.
Without temporal metadata, all three scenarios look the same: “two different numbers for the same metric.” The synthesis agent cannot tell them apart.
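The three scenarios can be told apart with a few lines of logic once every value carries a measurement period. The following is a minimal sketch, not the synthesis agent's actual implementation: `DatedValue`, `classify`, and the label strings are hypothetical, periods are assumed to be normalized strings such as "2024" that sort chronologically, and values are compared for exact equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatedValue:
    source: str
    period: str   # normalized measurement period, e.g. "2024" (assumed format)
    value: float


def classify(points: list[DatedValue], current_period: str) -> str:
    """Label one metric's data points as 'contradiction', 'trend', 'stale', or 'consistent'."""
    if not points:
        raise ValueError("no data points to classify")

    # Group the reported values by the period they were measured in.
    by_period: dict[str, set[float]] = {}
    for point in points:
        by_period.setdefault(point.period, set()).add(point.value)

    # Same period, different values: the sources genuinely disagree.
    # (A metric can also have a trend at the same time; this sketch reports the conflict first.)
    if any(len(values) > 1 for values in by_period.values()):
        return "contradiction"
    # Different periods, no same-period conflict: present as a progression.
    if len(by_period) > 1:
        return "trend"
    # A single period that is older than the one being reported on.
    if max(by_period) < current_period:
        return "stale"
    return "consistent"


trend = classify(
    [DatedValue("consortium", "2022", 35.0), DatedValue("government", "2024", 42.0)],
    current_period="2024",
)
conflict = classify(
    [DatedValue("government", "2024", 42.0), DatedValue("consortium", "2024", 38.0)],
    current_period="2024",
)
stale = classify([DatedValue("survey", "2021", 78.0)], current_period="2024")
```

Here `trend` is "trend", `conflict` is "contradiction", and `stale` is "stale". Remove the `period` field and the first two inputs become identical in shape, which is exactly the ambiguity described above.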
The “Just Use the Latest” Trap
A common proposal: “Only return the most recent data point per metric. This eliminates contradictions.”
It also eliminates trend analysis. “42% market share” is a snapshot. “35% in 2022, 38% in 2023, 42% in 2024” is a story of consistent growth. Research on growth rates, cyclical patterns, and historical context requires multiple time-period data points. Discarding older values blinds the analysis.
It also fails for metrics where only older data exists. If customer satisfaction was last measured in 2021, the “latest only” rule either returns the 2021 value (defeating its own purpose, because the “latest” value is still years old) or returns nothing (losing useful data).
Temporal metadata solves the false contradiction problem without losing any data. The synthesis agent gets everything it needs to present trends correctly, flag genuine conflicts, and note data freshness.
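To make the cost of “latest only” concrete, the sketch below applies that rule to the illustrative market share series from this section and compares it with what the full series supports. The function names `latest_only` and `point_changes` are hypothetical and exist only for this example.

```python
# Illustrative series from the text: (year, market share in percent).
series = [(2022, 35.0), (2023, 38.0), (2024, 42.0)]


def latest_only(points):
    """The 'just use the latest' rule: keep one snapshot, drop the rest."""
    return max(points, key=lambda p: p[0])


def point_changes(points):
    """Year-over-year change in percentage points; needs at least two periods."""
    ordered = sorted(points)
    return [
        (later[0], later[1] - earlier[1])
        for earlier, later in zip(ordered, ordered[1:])
    ]


snapshot = latest_only(series)               # (2024, 42.0)
changes = point_changes(series)              # [(2023, 3.0), (2024, 4.0)]
changes_from_snapshot = point_changes([snapshot])  # []
```

With the full series, the year-over-year changes are available; with only the snapshot, the same calculation returns an empty list, because there is nothing left to compare against.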
Implicit Dates Are Not Enough
An alternative: “Train the synthesis agent to infer dates from phrases like ‘recent study’ or ‘latest data.’” These phrases are ambiguous. “Recent” could be 6 months or 5 years. “Latest” could mean last week or last decade. Explicit dates from the source document — the actual year printed on the report — are precise and unambiguous.
The same issue applies to inferring dates from source URLs or file names: report-2024.pdf might contain data collected in 2023 or even 2020. Only the dates extracted from the document content itself are reliable.
One-liner: Include data collection and publication dates with every claim — without temporal metadata, the synthesis agent cannot tell a two-year market growth trend from a factual contradiction.