K2.1.2 Task 2.1

Data-Driven Description Fixing: Target the High-Misrouting Pairs

When two tools have similar descriptions, the model can’t reliably distinguish them — 40% misrouting between overlapping pairs. The fix: expand descriptions, target the specific confused pairs, and use a layered approach (descriptions for 88%, few-shot examples for the remaining 12% edge cases).

The misrouting data pattern

6 tools, 1,000 calls:

  • get_customer ↔ get_account: 25% mutual misrouting
  • search_orders ↔ search_transactions: 18% mutual misrouting
  • process_refund: 2% (acceptable)
  • update_address: 1% (acceptable)

Focus description improvement on the two high-misrouting pairs. process_refund and update_address already have clear, effective descriptions — don’t waste effort on them.
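Finding these pairs can be automated from labeled call logs. A minimal sketch, assuming logs are available as (expected_tool, selected_tool) pairs (the log format and threshold are assumptions for illustration):

```python
from collections import Counter

def misrouting_pairs(calls, threshold=0.05):
    """Find tool pairs whose mutual misrouting rate exceeds a threshold.

    calls: iterable of (expected_tool, selected_tool) from labeled logs.
    Returns (tool_a, tool_b, rate) triples, worst offenders first.
    """
    confusions = Counter()
    totals = Counter()
    for expected, selected in calls:
        totals[expected] += 1
        if expected != selected:
            # Count the pair order-independently: A->B and B->A are "mutual".
            confusions[frozenset((expected, selected))] += 1
    results = []
    for pair, n in confusions.items():
        a, b = sorted(pair)
        volume = totals[a] + totals[b]
        rate = n / volume if volume else 0.0
        if rate >= threshold:
            results.append((a, b, round(rate, 3)))
    return sorted(results, key=lambda r: -r[2])

# Tiny synthetic log; a real run would use the full 1,000-call sample.
calls = [
    ("get_customer", "get_account"),
    ("get_customer", "get_customer"),
    ("get_account", "get_customer"),
    ("get_account", "get_account"),
    ("process_refund", "process_refund"),
]
print(misrouting_pairs(calls))
```

Running this over the full log surfaces the same two high-misrouting pairs, so the effort goes where the data says it should.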

Expanding descriptions: the template

Each description should address three misrouting types:

  1. Overlap (tools confused with each other): differentiate purpose, input types, return formats
  2. Scope (tool called for tasks it can’t handle): boundary conditions with “NOT for” + redirect
  3. Ordering (tools called in wrong sequence): hints about when to use relative to other tools

Template: “Purpose: [what]. Input: [format]. Returns: [what]. Use for: [specific cases]. NOT for: [common confusion] — use [alternative] instead. Use after: [prerequisite tool] for enriched data.”
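Applied to the worst pair, the template might look like this. The field names and wording below are assumptions for illustration, not the actual production descriptions:

```python
# Hypothetical expanded descriptions following the template above.
TOOL_DESCRIPTIONS = {
    "get_customer": (
        "Purpose: fetch a customer's profile (name, email, contact prefs). "
        "Input: customer_id (string). Returns: customer profile object. "
        "Use for: identity and contact questions. "
        "NOT for: balances or billing state; use get_account instead. "
        "Use after: search_orders when you only have an order ID."
    ),
    "get_account": (
        "Purpose: fetch account status and billing state. "
        "Input: account_id (string). Returns: account object with balance, "
        "plan, and payment status. "
        "Use for: balance, plan, and payment questions. "
        "NOT for: contact details; use get_customer instead. "
        "Use after: get_customer for enriched account context."
    ),
}
```

Each description covers all three misrouting types: overlap (distinct purpose/input/return), scope (the "NOT for" redirect), and ordering (the "Use after" hint).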

Names + descriptions together

Renaming alone is insufficient. A tool named csv_parser with description “Processes data” is still vague. The model primarily uses descriptions, not names, for selection. A good name provides a quick signal; a good description provides selection-critical detail. Both should be improved together.

The layered approach: descriptions + few-shot

After description expansion:

  • Before: 60% accuracy
  • After descriptions: 88% accuracy
  • Remaining 12%: legitimate edge cases where tools have genuine overlap

For the remaining 12%, add few-shot examples in the system prompt demonstrating correct tool selection for ambiguous queries. Descriptions define “what each tool does” (fixes 88%). Examples show “when tools overlap, how to decide” (fixes the remaining 12%).
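A sketch of the layering, assuming a system prompt the few-shot block is appended to (the example queries and tool names here are illustrative, not from the measured dataset):

```python
# Few-shot examples for the ambiguous cases descriptions alone can't fix.
FEW_SHOT_ROUTING = """\
Examples of correct tool selection for ambiguous queries:

Q: "What did the customer buy last week?"
Tool: search_orders  (purchases are order records, not payment records)

Q: "Show me last week's card payments."
Tool: search_transactions  (payment records, not order contents)
"""

def build_system_prompt(base_prompt: str) -> str:
    # Layer 1 (descriptions) lives in the tool definitions; layer 2
    # (few-shot routing examples) is appended to the system prompt.
    return base_prompt + "\n\n" + FEW_SHOT_ROUTING
```

The examples deliberately target the surviving confused pair rather than restating what the descriptions already cover.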

Scalability: 15+ tools at high accuracy

At 15 tools, selection accuracy sits at 70%. Research shows selection accuracy degrades as tool count grows.

Fix: expand descriptions AND distribute tools across specialized sub-agents. Each sub-agent has 4-5 tools relevant to its role. The coordinator selects the right sub-agent (fewer choices, easier), then the sub-agent selects from its focused toolkit (high accuracy per agent). Total tools remain 15+ but no single agent faces all 15 choices.
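A minimal routing sketch of this two-level structure. The sub-agent names, topic classifier, and tool groupings are assumptions for illustration:

```python
# Each sub-agent exposes only the 4-5 tools relevant to its role.
SUB_AGENTS = {
    "customer_agent": ["get_customer", "get_account", "update_address",
                       "verify_identity"],
    "orders_agent": ["search_orders", "get_order", "cancel_order",
                     "track_shipment"],
    "billing_agent": ["search_transactions", "process_refund",
                      "apply_credit", "get_invoice"],
}

def route(query_topic: str):
    """Coordinator step: pick a sub-agent (3 choices, not 15).

    The sub-agent then selects from its focused toolkit of <= 5 tools.
    """
    topic_map = {
        "customer": "customer_agent",
        "orders": "orders_agent",
        "billing": "billing_agent",
    }
    agent = topic_map[query_topic]
    return agent, SUB_AGENTS[agent]
```

The coordinator's decision is coarse and easy; the hard, fine-grained selection happens inside a sub-agent that never sees more than a handful of tools.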

When to split vs when to describe

If two tools genuinely do the same thing with minor variations, consider consolidating into one tool with a parameter. Fewer tools with clear purposes route better than many tools with overlapping descriptions.
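For instance, two near-duplicate search tools can collapse into one tool with a filter parameter. A sketch using a JSON-Schema-style tool definition; the schema shape and field values are assumptions for illustration:

```python
# One consolidated tool replacing two hypothetical near-duplicates
# (e.g. search_orders_by_date and search_orders_by_status).
SEARCH_ORDERS_TOOL = {
    "name": "search_orders",
    "description": (
        "Purpose: find orders. Input: one filter, date_range OR status. "
        "Returns: list of order summaries. "
        "NOT for: payment records; use search_transactions instead."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "date_range": {
                "type": "string",
                "description": "Inclusive range, e.g. '2024-01-01..2024-01-31'",
            },
            "status": {
                "type": "string",
                "enum": ["open", "shipped", "cancelled"],
            },
        },
    },
}
```

One clear schema with a parameter gives the model a single routing target instead of two overlapping ones.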

If two tools serve different purposes that vague descriptions don’t reveal, expand descriptions to differentiate. The tools are distinct — the descriptions just need to make that clear.


One-liner: Target description fixes at the specific high-misrouting pairs (data-driven), use the template (purpose, I/O, “NOT for” + redirect, ordering hints), and layer descriptions (88% fix) with few-shot examples (12% edge cases) — for 15+ tools, distribute across sub-agents with 4-5 focused tools each.