30% Misrouting with Minimal Descriptions, Near-Zero with Detailed Ones

The model selects tools based on their descriptions. Minimal descriptions (“Extracts data from invoice documents”) cause 30% misrouting. Detailed descriptions with purpose, input, output, examples, and boundary conditions produce 95%+ accuracy. The description is the interface between the model and the tool — treat it as the most important field.

Five elements of an effective description

  1. Purpose: what the tool does
  2. Input format: what data it expects and in what form
  3. Output format: what it returns
  4. Example use cases: when to use it
  5. Boundary conditions: what NOT to use it for, and which tool to use instead

Example: “Searches file CONTENTS for patterns like function names, error messages, or import statements. Input: regex pattern. Use for: finding all callers of a function, locating error messages, tracing imports. Do NOT use for finding files by name — use Glob instead.”

The “Do NOT use for” line directly prevents the most common confusion.
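The five elements can be carried directly in a tool schema. A minimal sketch below encodes the Grep description above in an OpenAI-style function-calling format; the `Returns:` sentence and the parameter schema are illustrative assumptions, not part of the original example.

```python
# Sketch: the Grep description above, packed into a typical
# function-calling tool schema. The "Returns:" line and the
# parameter schema are assumptions added for illustration.
grep_tool = {
    "name": "grep",
    "description": (
        "Searches file CONTENTS for patterns like function names, "   # 1. purpose
        "error messages, or import statements. "
        "Input: regex pattern. "                                      # 2. input format
        "Returns: matching lines with file path and line number. "    # 3. output format
        "Use for: finding all callers of a function, locating "       # 4. example use cases
        "error messages, tracing imports. "
        "Do NOT use for finding files by name; use Glob instead."     # 5. boundary conditions
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "pattern": {
                "type": "string",
                "description": "Regex to match against file contents.",
            }
        },
        "required": ["pattern"],
    },
}
```

All five elements live in one `description` string because that string is what the model reads when choosing a tool; the parameter schema only matters after selection.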

Minimal descriptions cause misrouting

Three extraction tools with descriptions like “Extracts data from [type] documents” differ by one word. Production data: 30% of contracts misrouted to the invoice tool, 20% of reports misrouted to the contract tool. When the correct tool is selected, accuracy is 95%+ — the problem is selection, not execution.

Fix: expand descriptions with document characteristics. An invoice has line items, quantities, totals, payment terms. A contract has parties, obligations, terms, signatures. The model needs these signals to distinguish them.
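As a before/after sketch, the same three tools with expanded descriptions might look like this. The tool names are hypothetical; the invoice and contract characteristics come from the text above, while the report characteristics are an assumption for illustration.

```python
# Before: three descriptions that differ by a single word.
minimal = {
    "extract_invoice":  "Extracts data from invoice documents",
    "extract_contract": "Extracts data from contract documents",
    "extract_report":   "Extracts data from report documents",
}

# After: each description names observable document characteristics
# plus an explicit "Do NOT use" boundary pointing at the alternatives.
detailed = {
    "extract_invoice": (
        "Extracts structured data from invoices: billing documents with "
        "line items, quantities, unit prices, totals, and payment terms. "
        "Do NOT use for contracts or narrative reports."
    ),
    "extract_contract": (
        "Extracts structured data from contracts: legal documents with "
        "named parties, obligations, terms, and signature blocks. "
        "Do NOT use for invoices or narrative reports."
    ),
    "extract_report": (
        "Extracts structured data from reports: narrative documents with "
        "sections, findings, and figures, but no line items or parties. "
        "Do NOT use for invoices or contracts."
    ),
}
```

The expanded versions give the model the same signals a human sorter would use: what the document looks like, not just what it is called.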

Vague descriptions don't mean “the model will figure it out”

Generic names (“Search tool A” / “Search tool B”) give zero information. Unix command names (“grep” / “find”) rely on implicit knowledge. “Searches the codebase” vs “Finds files” don’t clearly distinguish content search from path matching. The model needs explicit, unambiguous descriptions.

Tool splitting: one tool serving multiple operations

A generic analyze_document tool handles data extraction, summarization, AND claim verification. The result: a 35% parameter error rate, because the model can't determine which operation mode to use.

Fix: split into three purpose-specific tools (extract_data_points, summarize_content, verify_claim), each with a focused description and tailored input schema. The model doesn’t need to figure out operation modes — each tool has one clear purpose. Error rate drops to near-zero.

Split when: one tool serves multiple distinct operations with different parameter requirements. The model struggles with mode-switching within a single tool.
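A sketch of the split, again in an OpenAI-style schema: the three tool names come from the section above, but the parameter names and return formats are illustrative assumptions.

```python
# Sketch: one generic analyze_document tool split into three
# purpose-specific tools, each with a tailored input schema.
# Parameter names and "Returns:" formats are assumptions.
tools = [
    {
        "name": "extract_data_points",
        "description": (
            "Extracts specific named fields from a document. "
            "Input: document text plus the list of field names to pull. "
            "Returns: a field-to-value mapping. "
            "Do NOT use for summaries or claim checks."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "document": {"type": "string"},
                "fields": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["document", "fields"],
        },
    },
    {
        "name": "summarize_content",
        "description": (
            "Produces a prose summary of a document. "
            "Input: document text and a target length in sentences. "
            "Returns: the summary text. "
            "Do NOT use to extract specific fields or verify claims."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "document": {"type": "string"},
                "max_sentences": {"type": "integer"},
            },
            "required": ["document"],
        },
    },
    {
        "name": "verify_claim",
        "description": (
            "Checks a single claim against a document. "
            "Input: document text and the claim to verify. "
            "Returns: supported, contradicted, or not found, with evidence. "
            "Do NOT use for extraction or summarization."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "document": {"type": "string"},
                "claim": {"type": "string"},
            },
            "required": ["document", "claim"],
        },
    },
]
```

Note that the split removes the mode parameter entirely: choosing the tool IS choosing the operation, so there is no mode left to get wrong.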

The description quality → accuracy pipeline

Tool descriptions are not documentation for humans. They’re the primary input the model uses to decide which tool fits the current task. Data from a 6-tool system:

  • Tools with 40-50 word descriptions (purpose, I/O, disambiguation): 93-96% accuracy
  • Tools with 5-8 word descriptions (“Runs analysis”): 52-61% accuracy

Every word in a tool description earns its keep by improving selection reliability.
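The five-element checklist can be enforced mechanically before a tool ships. A minimal sketch, assuming descriptions follow the `Input:` / `Use for:` / `Do NOT use` conventions from the examples above; the keyword checks are heuristics, not a measure of description quality.

```python
# Sketch: a lint pass over tool descriptions. Flags descriptions
# that are too short or missing one of the checklist elements.
# The 40-word floor mirrors the 40-50 word figure from the data above.
def lint_description(desc: str, min_words: int = 40) -> list[str]:
    problems = []
    if len(desc.split()) < min_words:
        problems.append(f"under {min_words} words")
    if "Input:" not in desc:
        problems.append("missing input format")
    if "Use for:" not in desc:
        problems.append("missing example use cases")
    if "Do NOT use" not in desc:
        problems.append("missing boundary conditions")
    return problems

# A 5-word description like "Runs analysis" fails every check;
# a full five-element description returns an empty problem list.
```

Running this in CI turns “every word earns its keep” from a guideline into a gate.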


One-liner: Detailed descriptions (purpose, I/O, examples, “NOT for” boundaries) produce 93-96% selection accuracy vs 52-61% for minimal ones — and generic multi-purpose tools should be split into focused specialists when parameter error rates exceed 10%.