S1.2.1 Task 1.2

Don't Run Every Agent for Every Query

Use the coordinator's intelligence to match query complexity to delegation scope. Running all agents for every request is a fixed pipeline, not an intelligent orchestrator. Dynamic selection, analyzing each query and delegating only to the agents that are needed, reduces latency, cuts costs, and preserves thoroughness for complex cases.

The fixed pipeline anti-pattern

A research coordinator with 5 sub-agents processes every query through all 5, even “When was paper X published?” — a question that only needs the search agent (~5s). Result: simple factual queries take 45 seconds, the same as complex research queries. Users complain about slow responses for simple questions.

The root cause: the coordinator uses a fixed pipeline regardless of query complexity. Equal latency for simple and complex queries is the telltale sign.
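A minimal sketch of the alternative: the coordinator classifies each query and delegates only to the agents it needs. The agent names and the keyword heuristic below are illustrative stand-ins for the coordinator's actual LLM analysis step.

```python
# Hypothetical dynamic selection: classify the query, delegate narrowly.
ALL_AGENTS = ["search", "summarize", "citation", "synthesis", "critique"]

def select_agents(query: str) -> list[str]:
    """Return only the agents this query needs (toy heuristic)."""
    q = query.lower()
    if q.startswith(("when", "who", "where")) and len(q.split()) < 10:
        return ["search"]                       # simple factual lookup (~5s)
    if "compare" in q or "versus" in q:
        return ["search", "summarize", "synthesis"]
    return list(ALL_AGENTS)                     # complex research: full roster

def handle(query: str) -> list[str]:
    agents = select_agents(query)
    # ... invoke each selected agent and merge results ...
    return agents
```

With this shape, "When was paper X published?" routes to the search agent alone instead of the full 45-second pipeline.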

The cost impact

Production data from a 500-query/month system:

  • Fixed pipeline: $0.10/query × 500 = $50/month
  • Dynamic selection: simple (45% × $0.02) + moderate (35% × $0.05) + complex (20% × $0.10) = $23.25/month
  • Savings: 53.5% — driven by routing 45% of queries to a single agent

The coordinator’s analysis step costs far less than invoking unnecessary sub-agents. It’s an investment that pays for itself many times over.
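The figures above can be reproduced directly; the 45/35/20 query-mix split is the one implied by the source numbers (45% of queries route to a single agent).

```python
# Reproducing the cost comparison for a 500-query/month system.
queries_per_month = 500

fixed_cost = 0.10 * queries_per_month              # every query runs all agents
dynamic_cost = queries_per_month * (
    0.45 * 0.02    # simple: single agent
    + 0.35 * 0.05  # moderate: subset of agents
    + 0.20 * 0.10  # complex: full pipeline
)
savings = 1 - dynamic_cost / fixed_cost
print(f"${fixed_cost:.2f} vs ${dynamic_cost:.2f}, savings {savings:.1%}")
```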

Adaptive delegation

What happens when the initial assessment underestimates complexity? A developer asks to “check my utility function” — looks like style-only, but the function contains a concurrency bug.

The solution: two-phase delegation. Start with the initially-selected agents. If first-phase results signal additional concerns (style agent flags a suspicious pattern), the coordinator escalates by invoking additional agents. This preserves fast-path efficiency while catching cases that need deeper analysis.
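Two-phase delegation can be sketched as follows. The agent names, flag strings, and escalation table are illustrative; a real system would derive them from the sub-agents' structured outputs.

```python
# Phase 1 runs the initially selected agents; phase 2 escalates only when
# a phase-1 result raises a flag suggesting additional concerns.

def run_agent(name: str, query: str) -> dict:
    """Stand-in for a real sub-agent invocation."""
    if name == "style":
        # Simulate the style agent noticing a non-style concern.
        return {"agent": name, "flags": ["possible-concurrency-bug"]}
    return {"agent": name, "flags": []}

# Which extra agents each flag warrants (illustrative mapping).
ESCALATION = {"possible-concurrency-bug": ["correctness", "concurrency"]}

def two_phase_delegate(query: str, initial: list[str]) -> list[dict]:
    results = [run_agent(a, query) for a in initial]
    extra = {agent for r in results for f in r["flags"]
             for agent in ESCALATION.get(f, [])}
    # Phase 2: invoke only the newly implicated agents.
    results += [run_agent(a, query) for a in sorted(extra - set(initial))]
    return results
```

The fast path (no flags raised) stays a single phase; the "check my utility function" case escalates from style-only to a correctness and concurrency review.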

Improving selection accuracy

After implementing dynamic selection, one system saw: 85% correct, 10% under-selection (incomplete answers), 5% over-selection (minor waste). The 10% under-selection drove customer complaints.

The fix: analyze the under-selection cases, identify patterns (e.g., multi-concern requests that look single-concern), and add few-shot examples to the coordinator's prompt demonstrating correct selection for those commonly misjudged patterns. Since the approach already works for 85%+ of cases, calibrate the edge cases rather than abandoning it.
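One way to wire those examples into the coordinator's prompt, as a sketch. The example query, agent names, and prompt wording are hypothetical.

```python
# Few-shot examples drawn from commonly misjudged under-selection cases.
FEW_SHOT_EXAMPLES = [
    # A multi-concern request that looks single-concern:
    {"query": "Clean up the formatting in auth.py",
     "agents": ["style", "security"],
     "why": "auth-related code warrants a security pass even for style requests"},
]

def build_selection_prompt(query: str) -> str:
    shots = "\n".join(
        f"Query: {ex['query']}\nAgents: {', '.join(ex['agents'])}  # {ex['why']}"
        for ex in FEW_SHOT_EXAMPLES
    )
    return (
        "Select only the agents this query needs.\n\n"
        f"Examples of commonly misjudged queries:\n{shots}\n\n"
        f"Query: {query}\nAgents:"
    )
```

New misjudged patterns found in production get appended to the example list rather than triggering a redesign.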

The coordinator does the selection

A separate classifier model for agent selection is unnecessary. The coordinator already analyzes queries for task decomposition — dynamic agent selection is a natural extension of this analysis. Adding a classifier duplicates work, requires training data, and introduces misclassification risk.
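One way this folds together, sketched below: a single coordinator call returns both the task decomposition and the agent selection, so no separate classifier exists to train or drift. `call_llm` is a placeholder for the coordinator's model call, and the canned JSON response is illustrative.

```python
import json

def call_llm(prompt: str) -> str:
    # Placeholder: a real system calls the coordinator model here.
    return json.dumps({
        "subtasks": ["find publication record for paper X"],
        "agents": ["search"],
    })

def coordinate(query: str) -> dict:
    prompt = (
        "Decompose this query into subtasks AND list only the agents needed.\n"
        'Respond as JSON: {"subtasks": [...], "agents": [...]}\n'
        f"Query: {query}"
    )
    # One analysis step yields both the plan and the selection.
    return json.loads(call_llm(prompt))
```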


One-liner: Analyze each query and delegate only to needed agents — dynamic selection saves 53%+ in costs, reduces simple-query latency from 45s to ~10s, and can adapt with two-phase delegation when initial assessment underestimates complexity.