Write the tests before asking Claude to implement. Test failures give Claude specific, actionable feedback. Without tests, Claude iterates without clear pass/fail signals, wasting cycles and missing edge cases.
The Data
A citation extractor was built using two approaches on the same 15-test suite:
| Approach | Iterations | Tests passed | Time |
|---|---|---|---|
| Tests first | 3 | 15/15 | 45 min |
| Implement first, tests after | 7 | 12/15 | 90 min |
Tests-first converged in fewer iterations because Claude had concrete targets from the start. The implement-first approach required 7 unfocused iterations and still failed 3 edge cases — the same tests that tests-first handled on iteration 2.
The Workflow
- Write tests covering expected behavior, edge cases, and performance requirements
- Share tests with Claude and ask it to implement
- Run tests — share complete output with Claude
- Claude iterates — fixes failures while maintaining passing tests
- Repeat until all tests pass
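The workflow above can be sketched concretely. The citation extractor below is hypothetical: the tests are written first and encode the requirements, while the small regex implementation stands in for what Claude might produce on a later iteration — it is not a real library API.

```python
import re

def extract_citations(text):
    # Stand-in implementation (what Claude might produce):
    # matches "Author (Year)" patterns. Illustrative only.
    return [(m.group(1), int(m.group(2)))
            for m in re.finditer(r"([A-Z][a-z]+) \((\d{4})\)", text)]

# Tests written FIRST -- they describe requirements, not the solution,
# so they stay valid even if the implementation changes entirely.
def test_single_citation():
    assert extract_citations("Smith (2020) showed X.") == [("Smith", 2020)]

def test_no_citations():
    assert extract_citations("No references here.") == []

def test_multiple_citations():
    text = "Smith (2020) and Jones (2021) agree."
    assert extract_citations(text) == [("Smith", 2020), ("Jones", 2021)]

if __name__ == "__main__":
    test_single_citation()
    test_no_citations()
    test_multiple_citations()
    print("all tests passed")
```

Share only the test functions with Claude; the implementation is its job.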
The tests define what the implementation should do. Claude determines how. Even if Claude changes its implementation approach entirely between iterations, the tests remain valid because they describe requirements, not solutions.
Feedback Quality Determines Convergence
Two developers built the same ticket classifier against the same 20-test suite:
| Developer | Feedback shared | Iterations | Tests passed |
|---|---|---|---|
| Dev A | Full test output each round | 3 | 20/20 |
| Dev B | “X tests failed” count only | 8 | 17/20 |
Dev A shared actual vs expected values, stack traces, and assertion messages. Claude could diagnose root causes precisely.
Dev B shared only “3 tests failed.” Claude had to guess what went wrong. The same 3 tests failed from iteration 4 through 8 — a systematic pattern, not bad luck. Without specific failure details, Claude keeps making the same mistakes.
Always share complete test output, not just pass/fail counts.
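The gap between the two feedback styles can be shown side by side. The result format and helper names below are illustrative, not any real test runner's API:

```python
# Each result: (test name, passed?, failure detail). Hypothetical data.
results = [
    ("test_empty_input", True, ""),
    ("test_unicode_authors", False,
     "AssertionError: expected [('Müller', 2019)], got []"),
    ("test_nested_parens", False,
     "AssertionError: expected [('Lee', 2022)], got [('Lee (see', 2022)]"),
]

def count_only(results):
    # Dev B's feedback: a bare count. Claude must guess the cause.
    failed = sum(1 for _, ok, _ in results if not ok)
    return f"{failed} tests failed"

def full_output(results):
    # Dev A's feedback: per-test status plus actual-vs-expected detail,
    # letting Claude diagnose root causes directly.
    lines = []
    for name, ok, detail in results:
        lines.append(f"{'PASS' if ok else 'FAIL'} {name}")
        if detail:
            lines.append("  " + detail)
    return "\n".join(lines)
```

Only the second form tells Claude *which* tests failed and *how* the output diverged.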
Preventing Regressions: Share Everything Each Round
A common mistake: sharing only the latest failure with Claude. Claude fixes that case but breaks 3 previously passing tests because it never saw the full picture.
Every iteration should include the complete test suite results — all passes and all failures. This prevents Claude from optimizing for the latest test at the expense of earlier ones. The full results act as a regression guard within the iteration loop.
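One simple way to enforce this is to diff each round's results against the previous round's and flag anything that flipped from passing to failing. A minimal sketch, assuming results are tracked as name-to-boolean dicts (the representation is an assumption, not part of any test framework):

```python
def find_regressions(previous, current):
    """Return tests that passed last round but fail now.

    Both arguments map test name -> passed (bool).
    """
    return sorted(name for name, passed in previous.items()
                  if passed and not current.get(name, False))

# Hypothetical two rounds of results:
prev = {"test_a": True, "test_b": True, "test_c": False}
curr = {"test_a": True, "test_b": False, "test_c": True}
# test_b regressed: Claude fixed test_c but broke test_b,
# which only the full results would reveal.
```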
What the Test Suite Should Cover
For a code complexity analyzer measuring cyclomatic complexity, nesting depth, and function length:
- Correctness tests — Expected metric values on known code samples
- Edge cases — Empty functions, deeply nested code, one-line functions
- Performance benchmarks — Must handle 10K-line files within time limits
- Output format assertions — Required JSON structure
All of these dimensions — correctness (including edge cases), performance, and output format — should be testable from the start. Omitting performance tests means Claude may produce a correct but slow implementation. After 3 iterations, when correctness passes but performance fails at 12 seconds (target: 5 seconds), share both results and ask Claude to optimize while keeping correctness. Do not relax the benchmark — Claude should meet the target.
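All four categories can live in one suite. The `analyze` function below is a stand-in for the analyzer Claude would implement, and the metric names, sample values, and 5-second budget are illustrative assumptions:

```python
import json
import time

def analyze(source):
    # Stand-in analyzer: counts branch keywords for "complexity",
    # uses indentation for "nesting", lines for "length". Hypothetical.
    lines = source.splitlines()
    complexity = 1 + sum(line.strip().startswith(("if ", "for ", "while "))
                         for line in lines)
    nesting = max(((len(l) - len(l.lstrip())) // 4 for l in lines), default=0)
    return {"complexity": complexity, "nesting": nesting, "length": len(lines)}

SAMPLE = "def f(x):\n    if x:\n        return 1\n    return 0"

# Correctness: expected metric values on a known code sample.
assert analyze(SAMPLE)["complexity"] == 2

# Edge case: empty input.
assert analyze("")["length"] == 0

# Performance benchmark: a 10K-line file within the time budget.
big = "\n".join("x = 1" for _ in range(10_000))
start = time.monotonic()
analyze(big)
assert time.monotonic() - start < 5.0

# Output format: result serializes to the required JSON structure.
assert set(json.loads(json.dumps(analyze(SAMPLE)))) == {
    "complexity", "nesting", "length"}
```

With all four assertion groups present from iteration 1, a correct-but-slow or correct-but-malformed implementation fails visibly instead of slipping through.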
Anti-Patterns
Tests after implementation. Writing tests that validate whatever Claude built locks in potentially incorrect behavior. Tests should define expected behavior upfront, not rubber-stamp Claude’s interpretation.
Manual spot-checking instead of tests. Manual verification is inconsistent, non-repeatable, and cannot detect regressions across iterations. A test suite provides the same feedback reliably every round.
Letting Claude write both implementation and tests. Tests encode your requirements. If Claude writes them, they encode Claude’s interpretation — which is what you are trying to verify, not what you are trying to specify.
Incremental test additions only. Starting with a single test and adding one per iteration causes regression cascades. Upfront coverage of standard cases and key edge cases prevents this.
Expanding the Suite During Iteration
Adding new test cases when Claude’s implementation reveals previously unconsidered edge cases is healthy iteration. The suite grows to capture new patterns. This is different from the anti-pattern of starting with too few tests — it extends a comprehensive foundation rather than building from a minimal one.
One-liner: Write tests first, share complete test output each round, and let test failures drive iteration — this converges in 3 rounds instead of 7 and catches edge cases that implement-first misses.