Tool selection accuracy degrades sharply with tool count. The data is concrete: 97% at 3 tools, 94% at 5, 82% at 8, 68% at 12, 51% at 18. The sweet spot is 4-5 tools per agent. For systems needing 15+ tools, distribute across specialized sub-agents.
The accuracy curve
| Tools per agent | Selection accuracy |
|---|---|
| 3 | 97% |
| 5 | 94% |
| 8 | 82% |
| 12 | 68% |
| 18 | 51% |
At 18 tools, roughly half of tool calls (49%) go to the wrong tool. At 3-5 tools, accuracy stays at 94% or higher. This data drives every tool distribution decision.
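The table gives only five measured points. To estimate accuracy at an intermediate count such as 9 or 10 tools, one minimal approach is to interpolate linearly between neighbouring points. The linear shape between points is an assumption, not part of the data. `MEASURED` and `estimate_accuracy` are hypothetical names, and the function refuses to extrapolate outside 3-18 tools.

```python
# Measured points from the table: (tools per agent, selection accuracy).
MEASURED: list[tuple[int, float]] = [
    (3, 0.97), (5, 0.94), (8, 0.82), (12, 0.68), (18, 0.51),
]


def estimate_accuracy(n_tools: int) -> float:
    """Estimate selection accuracy for n_tools tools.

    Assumes accuracy changes linearly between adjacent measured points.
    Counts outside the measured range are rejected, not extrapolated.
    """
    first, last = MEASURED[0][0], MEASURED[-1][0]
    if not first <= n_tools <= last:
        raise ValueError(f"no data outside {first}-{last} tools")
    # Find the segment containing n_tools and interpolate along it.
    for (n0, a0), (n1, a1) in zip(MEASURED, MEASURED[1:]):
        if n0 <= n_tools <= n1:
            return a0 + (a1 - a0) * (n_tools - n0) / (n1 - n0)
    raise AssertionError("unreachable: the segments cover the whole range")


if __name__ == "__main__":
    for n in (3, 5, 8, 9, 10, 12, 18):
        print(f"{n:>2} tools: ~{estimate_accuracy(n):.0%}")
```

Under that assumption, 9 tools comes out near 78.5% and 10 tools near 75%. Both are below the 82% measured at 8 tools.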
Role-specific tool sets
Each agent gets only the tools its role requires:
- Billing agent: [get_customer, get_payment_history, process_refund] (3 tools)
- Security review agent: [Read, Grep, Glob] (3 tools, read-only)
- Test runner: [Bash, Read, Grep] (3 tools)
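One way to keep these role boundaries explicit is to declare each role's tool set in one place and check it against the 4-5 tool budget. This is a minimal sketch assuming a plain dictionary registry. `ROLE_TOOLS`, `MAX_TOOLS_PER_AGENT`, `WRITE_TOOLS`, `check_tool_budget`, and `is_read_only` are hypothetical names, and the split into state-changing vs read-only tools is an assumed classification. Only the tool names themselves come from the list above.

```python
# Hypothetical registry: role -> the only tools that role is given.
ROLE_TOOLS: dict[str, frozenset[str]] = {
    "billing": frozenset({"get_customer", "get_payment_history", "process_refund"}),
    "security_review": frozenset({"Read", "Grep", "Glob"}),  # read-only set
    "test_runner": frozenset({"Bash", "Read", "Grep"}),
}

# Upper end of the 4-5 tool range where selection accuracy stays at 94%+.
MAX_TOOLS_PER_AGENT = 5

# Tools assumed (for this sketch) to change state; read-only roles hold none.
WRITE_TOOLS = frozenset({"Edit", "Write", "Bash", "process_refund"})


def check_tool_budget(registry: dict[str, frozenset[str]],
                      limit: int = MAX_TOOLS_PER_AGENT) -> list[str]:
    """Return the roles whose tool set exceeds the limit (empty means OK)."""
    return sorted(role for role, tools in registry.items() if len(tools) > limit)


def is_read_only(tools: frozenset[str]) -> bool:
    """True if the tool set contains no state-changing tool."""
    return not (tools & WRITE_TOOLS)


if __name__ == "__main__":
    print("over budget:", check_tool_budget(ROLE_TOOLS))
    print("security review read-only:", is_read_only(ROLE_TOOLS["security_review"]))
```

Running a check like this in CI or at agent start-up turns the "only the tools its role requires" rule into something that fails loudly instead of drifting.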
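“Just in case” tools are an anti-pattern. Adding 7 extra tools “in case they’re needed” pushes a 3-tool agent to 10, dropping accuracy from 97% to below the 82% measured at 8 tools (roughly 75% if accuracy falls linearly between the 8-tool and 12-tool points).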
Too many tools cause role violations
A synthesis agent with 15 tools (including search and database tools) occasionally makes search queries during synthesis — adding latency and producing redundant data. It has search tools it shouldn’t have, and it uses them.
Fix: restrict to [format_output, generate_summary]. Without search tools, the synthesis agent works with existing findings instead of redundantly re-searching.
The same failure shows up elsewhere: a security agent uses Edit (it should be read-only), a test agent uses search tools (it should only run tests), a doc agent uses Bash (outside its role). All of these stem from tool over-assignment. Tool restriction enforces role boundaries deterministically; prompt instructions don’t.
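Deterministic enforcement means the restriction lives in code, not in the prompt. This minimal sketch assumes tools are plain Python callables registered by name. The agent only ever receives its allowlisted tools, and any call to another name fails. `RestrictedToolbox`, `ToolNotAllowed`, and the `web_search` example tool are hypothetical, not a particular agent framework's API.

```python
from typing import Any, Callable


class ToolNotAllowed(Exception):
    """Raised when an agent calls a tool outside its role's allowlist."""


class RestrictedToolbox:
    """Exposes and dispatches only the tools on an agent's allowlist."""

    def __init__(self, all_tools: dict[str, Callable[..., Any]],
                 allowed: set[str]) -> None:
        unknown = set(allowed) - set(all_tools)
        if unknown:
            raise ValueError(f"unknown tools in allowlist: {sorted(unknown)}")
        # Tools outside the allowlist are never stored, so they cannot be reached.
        self._tools = {name: all_tools[name] for name in allowed}

    def visible_tools(self) -> list[str]:
        """Names to advertise to the model; nothing else is shown."""
        return sorted(self._tools)

    def call(self, name: str, **kwargs: Any) -> Any:
        """Run an allowlisted tool, or refuse regardless of what the prompt says."""
        if name not in self._tools:
            raise ToolNotAllowed(f"{name!r} is not available to this agent")
        return self._tools[name](**kwargs)


if __name__ == "__main__":
    # Hypothetical tool implementations shared across agents.
    registry = {
        "format_output": lambda text: text.strip(),
        "generate_summary": lambda text: text[:80],
        "web_search": lambda query: [],
    }
    synthesis = RestrictedToolbox(registry, {"format_output", "generate_summary"})
    print(synthesis.visible_tools())
    try:
        synthesis.call("web_search", query="re-check findings")
    except ToolNotAllowed as err:
        print("blocked:", err)
```

The design choice is to filter twice: the model is only shown the allowed tools, which keeps the choice small, and `call` rejects anything else, which holds even if the model names a tool it was never shown.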
Multi-agent as the scaling pattern
System needs 18 tools. One agent at 51% accuracy → split into 4 specialized agents with 4-5 tools each → each agent at 94%+ accuracy. The coordinator routes queries to the right agent (simple choice among 4), and each agent selects from its focused toolkit (high accuracy per agent).
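The arithmetic behind this split can be sketched in a few lines. Two inputs are assumptions rather than data from the table. First, the coordinator's routing accuracy is left as a parameter, because the text gives no figure for it. Second, routing and in-agent selection are assumed to fail independently, so their accuracies multiply. Under those assumptions, a hypothetical 98% routing accuracy combined with 94% selection gives about 92% end to end (0.98 × 0.94 = 0.9212), still far above 51%. `min_agents` and `end_to_end_accuracy` are hypothetical helpers.

```python
import math


def min_agents(total_tools: int, max_per_agent: int = 5) -> int:
    """Fewest agents that keep every agent at or under max_per_agent tools.

    Counts only: which tools go together is a separate, role-based decision.
    """
    return math.ceil(total_tools / max_per_agent)


def end_to_end_accuracy(routing_acc: float, selection_acc: float) -> float:
    """Chance the right tool is called after routing, assuming the two
    stages fail independently."""
    return routing_acc * selection_acc


if __name__ == "__main__":
    print(min_agents(18))
    # 0.98 is a hypothetical routing accuracy, not a measured value.
    print(f"{end_to_end_accuracy(0.98, 0.94):.1%}")
```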
Scaling: adding new tools
Current: 4 agents × 4 tools = 16 total. Need to add 5 new tools (21 total).
Wrong: add all 5 to one agent (9 tools, below the 82% measured at 8). Also wrong: add all 5 to every agent (each goes 4→9).
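Right: create a 5th specialized agent for the new tools, or distribute them to existing agents IF each stays ≤5. Here distribution alone falls short: each of the 4 agents has one free slot, so only 4 of the 5 new tools fit. The principle: no agent exceeds the 4-5 reliable range.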
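The placement rule can be written as a count-only check. In this minimal sketch, `plan_new_tools` (a hypothetical helper) spreads new tools into existing agents only when every agent stays at 5 or fewer. Otherwise it leaves existing agents untouched and gives the new tools their own agent or agents. Preferring to spread when slots are free is an assumption of the sketch. In practice tools are grouped by role, so a coherent set of new tools may deserve its own agent even when free slots exist.

```python
import math

MAX_PER_AGENT = 5  # upper end of the reliable 4-5 tool range


def plan_new_tools(agent_sizes: list[int], new_tools: int,
                   max_per_agent: int = MAX_PER_AGENT) -> list[int]:
    """Return tool counts per agent after adding new_tools tools."""
    if any(size > max_per_agent for size in agent_sizes):
        raise ValueError("existing agents already exceed the limit")
    free_slots = sum(max_per_agent - size for size in agent_sizes)
    if new_tools <= free_slots:
        # Spread into existing agents, always topping up the emptiest one.
        sizes = list(agent_sizes)
        for _ in range(new_tools):
            i = sizes.index(min(sizes))
            sizes[i] += 1
        return sizes
    # Not enough room: add new agent(s), splitting the new tools evenly.
    n_new = math.ceil(new_tools / max_per_agent)
    base, extra = divmod(new_tools, n_new)
    return list(agent_sizes) + [base + 1] * extra + [base] * (n_new - extra)


if __name__ == "__main__":
    print(plan_new_tools([4, 4, 4, 4], 5))
    print(plan_new_tools([4, 4, 4, 4], 3))
```

For the 4 agents × 4 tools case above, only 4 slots are free, so the 5 new tools land in a new 5-tool agent and no agent goes past 5.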
When 8 tools are needed for one role
An analysis agent needs 8 functions (security, performance, style, dependencies, license, coverage, complexity, documentation). 8 tools = 82% accuracy.
Fix: split into 2 focused sub-agents — analysis-security (security, dependencies, license, coverage = 4 tools) and analysis-quality (performance, style, complexity, documentation = 4 tools). Each at 94%+ accuracy. The coordinator handles the routing.
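The split can be expressed as a small routing table. This minimal sketch assumes the coordinator already knows which checks a request needs. Each check maps to exactly one owning sub-agent, so routing becomes a dictionary lookup rather than a choice among 8 tools. `SUB_AGENTS`, `OWNER`, and `route` are hypothetical names. The check names and groupings come from the text.

```python
# Two 4-tool sub-agents, grouped as in the text.
SUB_AGENTS: dict[str, tuple[str, ...]] = {
    "analysis-security": ("security", "dependencies", "license", "coverage"),
    "analysis-quality": ("performance", "style", "complexity", "documentation"),
}

# Invert once: check name -> owning sub-agent.
OWNER: dict[str, str] = {
    check: agent for agent, checks in SUB_AGENTS.items() for check in checks
}
# Every check must have exactly one owner; a duplicate would be silently overwritten.
assert len(OWNER) == sum(len(c) for c in SUB_AGENTS.values()), "check owned twice"


def route(checks: list[str]) -> dict[str, list[str]]:
    """Group the requested checks by the sub-agent that owns them."""
    plan: dict[str, list[str]] = {}
    for check in checks:
        if check not in OWNER:
            raise KeyError(f"no sub-agent owns check {check!r}")
        plan.setdefault(OWNER[check], []).append(check)
    return plan


if __name__ == "__main__":
    print(route(["style", "license", "security"]))
```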
Scoped verification vs full search access
A synthesis agent occasionally needs to verify a fact. Wrong: give it all 5 search tools. Right: give it one scoped verify_fact tool for simple queries. Complex queries route back to the coordinator → search agent. This provides verification capability without the full search toolkit’s accuracy penalty.
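A scoped tool does one narrow thing and hands everything else back. This minimal sketch assumes the facts the synthesis agent can check are available as a simple key-value index. `verify_fact` performs one exact lookup. When the index has no entry, it returns an escalation instead of searching, so the coordinator can route the query to the search agent. The signature, the `fact_index` shape, and the status strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerifyResult:
    status: str                   # "confirmed", "contradicted", or "escalate"
    detail: Optional[str] = None  # optional explanation for the caller


def verify_fact(key: str, expected: str, fact_index: dict[str, str]) -> VerifyResult:
    """Scoped check: one exact lookup, no open-ended search.

    A missing entry is not searched for; the tool reports "escalate" so the
    coordinator can send the query to the search agent instead.
    """
    if key not in fact_index:
        return VerifyResult("escalate", f"no indexed fact for {key!r}")
    actual = fact_index[key]
    if actual == expected:
        return VerifyResult("confirmed")
    return VerifyResult("contradicted", f"index has {actual!r}")


if __name__ == "__main__":
    index = {"acme.founded": "1999"}  # hypothetical fact index
    print(verify_fact("acme.founded", "1999", index))
    print(verify_fact("acme.ceo", "J. Doe", index))
```

The synthesis agent gains one tool instead of five, and the only "hard" path it can take is to hand the query back.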
One-liner: Tool selection accuracy follows a clear curve (97% at 3 tools, 51% at 18) — keep each agent at 4-5 tools, distribute larger sets across specialized sub-agents, and use scoped single-purpose tools instead of granting full search/write access to agents that occasionally need one capability.