K2.3.1 Task 2.3

3 Tools = 97% Accuracy, 18 Tools = 51%: The Tool Count Curve

Tool selection accuracy degrades sharply with tool count. The data is concrete: 97% at 3 tools, 94% at 5, 82% at 8, 68% at 12, 51% at 18. The sweet spot is 4-5 tools per agent. For systems needing 15+ tools, distribute across specialized sub-agents.

The accuracy curve

Tools per agent    Selection accuracy
3                  97%
5                  94%
8                  82%
12                 68%
18                 51%

At 18 tools, roughly half of all tool calls go to the wrong tool. At 3-5 tools, accuracy stays at 94% or higher. This data drives every tool distribution decision.
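To reason about tool counts that fall between the measured points, the curve can be sketched as a lookup with linear interpolation. Only the five data points come from the table; the interpolation, the clamping outside 3-18 tools, and the name estimate_accuracy are assumptions of this sketch, so intermediate values are rough guides rather than measurements.

```python
from bisect import bisect_left

# Measured points from the table: (tools per agent, selection accuracy).
CURVE = [(3, 0.97), (5, 0.94), (8, 0.82), (12, 0.68), (18, 0.51)]


def estimate_accuracy(tool_count: int) -> float:
    """Estimate selection accuracy by linear interpolation between measured points.

    Assumption: accuracy changes linearly between neighbouring data points.
    Counts outside the measured 3..18 range are clamped, not extrapolated.
    """
    counts = [c for c, _ in CURVE]
    if tool_count <= counts[0]:
        return CURVE[0][1]
    if tool_count >= counts[-1]:
        return CURVE[-1][1]
    i = bisect_left(counts, tool_count)
    (x0, y0), (x1, y1) = CURVE[i - 1], CURVE[i]
    return y0 + (y1 - y0) * (tool_count - x0) / (x1 - x0)
```

Under the linear assumption, estimate_accuracy(10) lands halfway between the 8-tool and 12-tool measurements.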

Role-specific tool sets

Each agent gets only the tools its role requires:

  • Billing agent: [get_customer, get_payment_history, process_refund] (3 tools)
  • Security review agent: [Read, Grep, Glob] (3 tools, read-only)
  • Test runner: [Bash, Read, Grep] (3 tools)

“Just in case” tools are an anti-pattern. Adding 7 extra tools “in case they’re needed” pushes a 3-tool agent to 10, dropping accuracy from 97% to somewhere between the 82% measured at 8 tools and the 68% measured at 12.
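The role-specific sets above can be written down as plain data and checked against a cap. This is a minimal sketch: the dict layout, the MAX_TOOLS constant, and the oversized_agents helper are illustrative and not any particular framework's configuration format.

```python
# Role-specific tool sets from the list above.
AGENT_TOOLS = {
    "billing": ["get_customer", "get_payment_history", "process_refund"],
    "security_review": ["Read", "Grep", "Glob"],
    "test_runner": ["Bash", "Read", "Grep"],
}

MAX_TOOLS = 5  # upper end of the 4-5 tool sweet spot


def oversized_agents(agent_tools: dict[str, list[str]], cap: int = MAX_TOOLS) -> list[str]:
    """Return the names of agents whose tool list exceeds the cap."""
    return [name for name, tools in agent_tools.items() if len(tools) > cap]
```

A check like this could run wherever agent definitions are reviewed, so that a batch of "just in case" additions is flagged before it ships.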

Too many tools cause role violations

A synthesis agent with 15 tools (including search and database tools) occasionally makes search queries during synthesis — adding latency and producing redundant data. It has search tools it shouldn’t have, and it uses them.

Fix: restrict to [format_output, generate_summary]. Without search tools, the synthesis agent works with existing findings instead of redundantly re-searching.

Similar violations: a security agent uses Edit when it should be read-only, a test agent uses search tools when it should only run tests, and a doc agent uses Bash, which is outside its role. All of these come from tool over-assignment. Tool restriction enforces role boundaries deterministically; prompt instructions don’t.
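One way to picture deterministic enforcement is an allowlist at the dispatch layer: a tool that is not registered for an agent cannot be called, whatever the prompt says. The ToolGate class, the ToolNotAllowed exception, and the lambda tool bodies below are hypothetical placeholders for illustration, not part of any real agent framework.

```python
class ToolNotAllowed(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


class ToolGate:
    """Dispatches tool calls for one agent, rejecting anything not allowlisted."""

    def __init__(self, agent: str, allowed: dict):
        self.agent = agent
        self.allowed = allowed  # maps tool name -> callable

    def call(self, tool: str, *args, **kwargs):
        if tool not in self.allowed:
            raise ToolNotAllowed(f"{self.agent} has no tool named {tool!r}")
        return self.allowed[tool](*args, **kwargs)


# Placeholder tool bodies so the sketch runs; real tools would do real work.
synthesis = ToolGate("synthesis", {
    "format_output": lambda text: text.strip(),
    "generate_summary": lambda findings: "; ".join(findings),
})
```

With this gate, a search call from the synthesis agent fails immediately instead of quietly adding latency.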

Multi-agent as the scaling pattern

Suppose a system needs 18 tools. A single agent holding all of them selects correctly only 51% of the time. Split the tools across 4 specialized agents with 4-5 tools each, and each agent operates at 94% or better. The coordinator routes queries to the right agent (a simple choice among 4), and each agent selects from its own focused toolkit.
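The arithmetic of the split can be sketched as a small helper. It only shows how many agents are needed and how large each toolkit is; split_tools is a hypothetical name, and a real split would group tools by role rather than by list position.

```python
import math


def split_tools(tools: list[str], cap: int = 5) -> list[list[str]]:
    """Split tools into the fewest groups of at most `cap`, sized as evenly as possible."""
    if not tools:
        return []
    n_agents = math.ceil(len(tools) / cap)
    base, extra = divmod(len(tools), n_agents)
    groups, start = [], 0
    for i in range(n_agents):
        # The first `extra` groups take one additional tool each.
        size = base + (1 if i < extra else 0)
        groups.append(tools[start:start + size])
        start += size
    return groups
```

For 18 tools and a cap of 5, this yields four groups of sizes 5, 5, 4, and 4, matching the 4-agent layout described above.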

Scaling: adding new tools

Current: 4 agents × 4 tools = 16 total. Need to add 5 new tools (21 total).

Wrong: add all 5 to one agent (9 tools, already below the 82% measured at 8). Also wrong: add all 5 to every agent (each goes from 4 to 9).

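Right: create a 5th specialized agent for the new tools, or distribute them across existing agents only if each stays at 5 or fewer. With four agents at 4 tools each, that leaves room for just four of the five new tools, so at least one still needs a new agent. The principle: no agent exceeds the reliable 4-5 tool range.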
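The capacity check behind this rule can be sketched as follows. The function name place_new_tools and the greedy fill order are assumptions of this sketch; in practice each tool would go to the agent whose role it fits, with the cap acting as a hard limit.

```python
def place_new_tools(agent_sizes: dict[str, int], new_tools: list[str], cap: int = 5):
    """Assign new tools to agents with spare capacity; overflow goes to a new agent.

    Returns (assignments, overflow): assignments maps agent name -> added tools,
    overflow lists the tools that did not fit and need a new specialized agent.
    """
    assignments = {}
    remaining = list(new_tools)
    for name, size in agent_sizes.items():
        spare = max(cap - size, 0)
        assignments[name], remaining = remaining[:spare], remaining[spare:]
    return assignments, remaining
```

With four agents at 4 tools each and five new tools, every existing agent takes one tool and one tool is left over for a new agent.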

When 8 tools are needed for one role

An analysis agent needs 8 functions (security, performance, style, dependencies, license, coverage, complexity, documentation). 8 tools = 82% accuracy.

Fix: split into 2 focused sub-agents — analysis-security (security, dependencies, license, coverage = 4 tools) and analysis-quality (performance, style, complexity, documentation = 4 tools). Each at 94%+ accuracy. The coordinator handles the routing.

Scoped verification vs full search access

A synthesis agent occasionally needs to verify a fact. Wrong: give it all 5 search tools. Right: give it one scoped verify_fact tool for simple queries. Complex queries route back to the coordinator → search agent. This provides verification capability without the full search toolkit’s accuracy penalty.
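A scoped verification tool might look like the sketch below: it answers simple lookups itself and returns an "escalate" status for anything else, which the coordinator can pass to the search agent. Everything here is hypothetical: the VerifyResult type, the KNOWN_FACTS placeholder data, the signature of verify_fact, and the word-count rule for what counts as a "simple" query.

```python
from dataclasses import dataclass


@dataclass
class VerifyResult:
    status: str       # "verified", "unverified", or "escalate"
    detail: str = ""


# Placeholder data standing in for findings the synthesis agent already holds.
KNOWN_FACTS = {"order 1042 status": "shipped"}


def verify_fact(claim_key: str, expected: str, max_words: int = 6) -> VerifyResult:
    """Scoped verification: handle simple lookups, escalate everything else.

    Assumption: "simple" means a key of at most `max_words` words.
    """
    if len(claim_key.split()) > max_words:
        return VerifyResult("escalate", "too complex; route via coordinator to search agent")
    actual = KNOWN_FACTS.get(claim_key.lower())
    if actual is None:
        return VerifyResult("escalate", "not found; route via coordinator to search agent")
    status = "verified" if actual == expected else "unverified"
    return VerifyResult(status, actual)
```

The synthesis agent gains one tool instead of five, and the escalation path keeps complex searches with the agent built for them.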


One-liner: Tool selection accuracy follows a clear curve (97% at 3 tools, 51% at 18) — keep each agent at 4-5 tools, distribute larger sets across specialized sub-agents, and use scoped single-purpose tools instead of granting full search/write access to agents that occasionally need one capability.