The selection principle for hooks vs prompts has three axes: consequence of failure, programmatic verifiability, and subjectivity. A 99% compliance rate on a data privacy rule still needs a hook (the remaining 1% means data leaks). An 85% compliance rate on greeting by name is fine as a prompt (a miss is a minor UX hiccup).
The three-axis classification
- Consequence: does failure have legal, financial, or security impact? If yes → candidate for hook.
- Verifiability: can code deterministically check compliance? Regex, list matching, threshold comparison? If yes → hook is feasible.
- Subjectivity: does the requirement need the model’s judgment? “Be empathetic,” “present balanced viewpoints,” “use professional language”? If yes → prompt only, regardless of consequence.
| Requirement | Consequence | Verifiable | Subjective | Mechanism |
|---|---|---|---|---|
| Block refunds >$500 | Financial | Yes (amount check) | No | Hook |
| Sanctioned entity list | Legal | Yes (string match) | No | Hook |
| Phone E.164 format | Integration | Yes (regex) | No | Hook |
| Balanced viewpoints | Quality | No | Yes | Prompt |
| Professional language | Quality | No | Yes | Prompt |
| Greeting by first name | UX | Partially | No | Prompt |
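The three axes can be sketched as a small decision function. This is an illustrative sketch — the `Requirement` fields and `select_mechanism` name are assumptions, not any library's API:

```python
from dataclasses import dataclass

@dataclass
class Requirement:
    name: str
    high_consequence: bool  # legal, financial, or security impact on failure
    verifiable: bool        # code can deterministically check compliance
    subjective: bool        # needs the model's judgment

def select_mechanism(req: Requirement) -> str:
    """Subjectivity rules out hooks regardless of consequence;
    otherwise high consequence plus verifiability selects a hook."""
    if req.subjective:
        return "prompt"     # judgment calls cannot be coded
    if req.high_consequence and req.verifiable:
        return "hook"       # deterministic check, unacceptable miss
    return "prompt"         # low stakes or unverifiable: prompt suffices

print(select_mechanism(Requirement("Block refunds >$500", True, True, False)))   # hook
print(select_mechanism(Requirement("Balanced viewpoints", False, False, True)))  # prompt
```

Note the ordering: the subjectivity check comes first because it vetoes a hook even for high-consequence requirements, matching the "prompt only, regardless of consequence" rule above.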
The “deliberate override” proof
11% non-compliance on discount tier limits. Breakdown: 5% incorrect tier identification (fixable bug), but 6% are DELIBERATE overrides — the model correctly identifies the tier, then reasons “this frustrated customer deserves extra consideration” and exceeds the limit.
This proves the fundamental limitation: prompt compliance is probabilistic because the model applies contextual judgment to override its instructions. Stronger wording (“MANDATORY”, “NO EXCEPTIONS”) yields only marginal improvement because the model isn’t failing to understand — it’s choosing to deviate.
Select by consequence, not by rate
Five rules, all currently enforced by prompts:
| # | Rule | Compliance | Consequence | Should be |
|---|---|---|---|---|
| 1 | Greet by name | 96% | Cosmetic | Prompt ✓ |
| 2 | Verify identity | 91% | Security (unauthorized access) | Hook ✗ |
| 3 | Don’t share others’ data | 99% | Legal (data privacy) | Hook ✗ |
| 4 | Log modifications | 88% | Compliance (audit trail) | Hook ✗ |
| 5 | Offer survey | 93% | UX preference | Prompt ✓ |
The data-sharing rule at 99% still needs a hook — a 1% data privacy failure is legally unacceptable. Greeting by name at 96% is fine as a prompt — 4% missed greetings have zero legal or financial impact.
Cost-benefit analysis for migration
When deciding whether to migrate a rule from prompt to hook, compare expected loss vs hook cost:
| Rule | Compliance | Miss cost | Expected loss (100 interactions) | Hook cost |
|---|---|---|---|---|
| A: Suggest upsell | 89% | $0 | $0 | Not worth it |
| B: Verify shipping | 92% | $15 | $120 | Marginal |
| C: Apply loyalty discount | 85% | $8 | $120 | Marginal |
| D: Confirm email | 87% | $500 | $6,500 | Worth it |
Rule D’s (confirm email) $6,500/month expected loss far exceeds the ~$500/month hook development cost. Rules A-C don’t justify the overhead.
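The expected-loss arithmetic behind the table is just miss rate × volume × per-miss cost; a quick sketch (function name is illustrative):

```python
def expected_loss(compliance: float, miss_cost: float, interactions: int = 100) -> float:
    """Expected loss = (1 - compliance rate) x interaction volume x cost per miss."""
    return round((1 - compliance) * interactions * miss_cost, 2)

# Rules B-D from the table, per 100 interactions:
print(expected_loss(0.92, 15))   # 120.0
print(expected_loss(0.85, 8))    # 120.0
print(expected_loss(0.87, 500))  # 6500.0
```

Compare the result against the recurring cost of building and maintaining the hook, not just the one-time development effort.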
The over-hooking anti-pattern
After several incidents, a team implemented hooks for EVERYTHING: source verification, word count, formatting, citation style, paragraph length, vocabulary complexity. Result: 12 hook denials per report, 8 of them for style preferences, and 3x report production time.
A 501-word section trips the word-count denial; forced rewrites break the natural flow. The system became too rigid.
Fix: keep hooks for compliance requirements (source verification, citation accuracy). Migrate style/quality preferences (word count, paragraph length) back to prompt guidance. The agent needs judgment flexibility for style, not rigid enforcement.
Context-aware enforcement
A coding standard (“all public API functions must have input validation”) is security-critical for customer-facing code but advisory for internal tools.
Context-aware hook: check the file path. Customer-facing directory (/src/api/public/) → block writes without validation (deterministic). Internal directory → allow with prompt reminder (advisory). The consequence level depends on context, so enforcement should too.
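The path-based check fits in a few lines. This is a minimal sketch: the directory name and the `"validate("` heuristic are illustrative assumptions, and a real pre-write hook would read the tool input and signal the block through its hook protocol rather than return a tuple:

```python
PUBLIC_API_DIR = "/src/api/public/"  # assumed location of customer-facing code

def check_write(file_path: str, content: str) -> tuple[str, str]:
    """Return (decision, message): 'deny' blocks the write deterministically;
    'allow' may still carry an advisory reminder for the model."""
    # Crude illustrative heuristic: any call to a validate() helper counts.
    has_validation = "validate(" in content
    if PUBLIC_API_DIR in file_path and not has_validation:
        return "deny", "Public API functions must validate their inputs."
    if not has_validation:
        return "allow", "Advisory: consider adding input validation."
    return "allow", ""

print(check_write("/src/api/public/users.py", "def get(id): return db[id]"))
```

The same rule yields a hard block in one directory and a soft reminder in another — enforcement strength tracks the consequence level.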
Defense in depth: prompt + hook together
A hook blocks production deploys. A colleague adds a prompt: “Never deploy to production without human approval.” Another colleague calls it redundant.
Both are valuable. The hook is the safety net (deterministic blocking). The prompt is behavioral guidance (reduces ATTEMPTS to deploy). Fewer blocked attempts → smoother UX, model proactively seeks approval instead of hitting the wall. Token overhead of one instruction is negligible.
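The hook half of this pairing is a one-screen check; the command pattern and the approval flag here are illustrative assumptions, not a real deploy tool's interface:

```python
import re

def gate_deploy(command: str, human_approved: bool) -> str:
    """Deterministic safety net: block production deploys without approval,
    regardless of what the prompt said or how the model reasoned."""
    targets_prod = re.search(r"\bdeploy\b.*\bprod(uction)?\b", command) is not None
    if targets_prod and not human_approved:
        return "deny"
    return "allow"

print(gate_deploy("deploy --env production", human_approved=False))  # deny
print(gate_deploy("deploy --env staging", human_approved=False))     # allow
```

The prompt instruction then works upstream of this gate: it reduces how often the model reaches the `deny` branch at all.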
When to diagnose before migrating
Test-before-commit compliance drops from 97% to 82%. Before implementing a hook, diagnose WHY:
- If the model makes contextual judgments (“this small change doesn’t need tests”) → hook needed (fundamental prompt limitation)
- If the instruction is lost in a crowded context window → prompt restructuring may suffice (simpler fix)
The correct first step is always understanding the failure mode, not jumping to the strongest enforcement.
Deterministic conversion vs contextual judgment
Dates and currencies → PostToolUse hooks (deterministic conversion rules exist). Measurements → prompt guidance (whether to convert “30-foot yacht market” to metric depends on context — a US market report may intentionally use imperial).
The dividing line: if conversion rules are objective and universal → hook. If conversion requires contextual judgment → prompt.
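For a rule on the deterministic side of that line, a PostToolUse-style normalizer is a few lines of code. The MM/DD/YYYY-to-ISO rule below is an illustrative example of an objective, universal conversion:

```python
import re
from datetime import datetime

def normalize_dates(text: str) -> str:
    """Objective, universal rule: rewrite MM/DD/YYYY dates as ISO 8601."""
    def to_iso(m: re.Match) -> str:
        return datetime.strptime(m.group(0), "%m/%d/%Y").strftime("%Y-%m-%d")
    return re.sub(r"\b\d{2}/\d{2}/\d{4}\b", to_iso, text)

print(normalize_dates("Invoiced 03/07/2025, due 12/01/2025."))
# Invoiced 2025-03-07, due 2025-12-01.
```

A unit-conversion rule would not fit this shape: whether “30-foot yacht” should become metric is a per-document judgment, which is exactly why it stays in the prompt.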
One-liner: Select enforcement by consequence (legal/financial → hook) + verifiability (code can check → hook feasible) + subjectivity (needs judgment → prompt only) — not by compliance rate. Watch for over-hooking (3x production time from style hooks) and use cost-benefit analysis for migration decisions.