S1.5.2 Task 1.5

Select by Consequence AND Verifiability, Not by Compliance Rate

The selection principle for hooks versus prompts has three axes: consequence of failure, programmatic verifiability, and subjectivity. A 99% compliance rate on a data privacy rule still needs a hook (the remaining 1% means data leaks). An 85% compliance rate on greeting by name is fine as a prompt (a miss is a minor UX hiccup).

The three-axis classification

  1. Consequence: does failure have legal, financial, or security impact? If yes → candidate for hook.
  2. Verifiability: can code deterministically check compliance? Regex, list matching, threshold comparison? If yes → hook is feasible.
  3. Subjectivity: does the requirement need the model’s judgment? “Be empathetic,” “present balanced viewpoints,” “use professional language”? If yes → prompt only, regardless of consequence.
| Requirement | Consequence | Verifiable | Subjective | Mechanism |
| --- | --- | --- | --- | --- |
| Block refunds >$500 | Financial | Yes (amount check) | No | Hook |
| Sanctioned entity list | Legal | Yes (string match) | No | Hook |
| Phone E.164 format | Integration | Yes (regex) | No | Hook |
| Balanced viewpoints | Quality | No | Yes | Prompt |
| Professional language | Quality | No | Yes | Prompt |
| Greeting by first name | UX | Partially | No | Prompt |
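The three axes reduce to a small decision helper. A minimal sketch in Python; the `Requirement` fields are illustrative, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class Requirement:
    name: str
    high_consequence: bool   # legal, financial, or security impact on failure
    verifiable: bool         # code can deterministically check compliance
    subjective: bool         # needs the model's judgment

def select_mechanism(req: Requirement) -> str:
    """Apply the three-axis selection principle from the table above."""
    if req.subjective:
        return "prompt"      # judgment calls can't be hard-coded
    if req.high_consequence and req.verifiable:
        return "hook"        # deterministic enforcement is both needed and feasible
    return "prompt"          # low stakes or unverifiable: guidance suffices

print(select_mechanism(Requirement("Block refunds >$500", True, True, False)))   # hook
print(select_mechanism(Requirement("Balanced viewpoints", False, False, True)))  # prompt
```

Note that subjectivity is checked first: it vetoes a hook regardless of consequence, matching the rule in axis 3 above.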

The “deliberate override” proof

Consider 11% non-compliance on discount tier limits. The breakdown: 5% is incorrect tier identification (a fixable bug), but 6% is DELIBERATE overrides: the model correctly identifies the tier, then reasons "this frustrated customer deserves extra consideration" and exceeds the limit.

This proves the fundamental limitation: prompt compliance is probabilistic because the model applies contextual judgment to override its instructions. Stronger wording (“MANDATORY”, “NO EXCEPTIONS”) provides marginal improvement because the model isn’t failing to understand — it’s choosing to deviate.

Select by consequence, not by rate

Five rules, all enforced by prompts:

| Rule | Compliance | Consequence | Should be |
| --- | --- | --- | --- |
| 1. Greet by name | 96% | Cosmetic | Prompt |
| 2. Verify identity | 91% | Security (unauthorized access) | Hook |
| 3. Don't share others' data | 99% | Legal (data privacy) | Hook |
| 4. Log modifications | 88% | Compliance (audit trail) | Hook |
| 5. Offer survey | 93% | UX preference | Prompt |

Rule 3 at 99% still needs a hook — 1% data privacy failure is legally unacceptable. Rule 1 at 96% is fine as a prompt — 4% missed greetings have zero legal/financial impact.
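The hook-worthy rules here are deterministically checkable. A hedged sketch of the check function; the event shape (`tool_name`, `tool_input`, `session`) is an assumed framework payload, not a documented API:

```python
def check(event: dict) -> tuple[bool, str]:
    """Return (allow, reason) for a proposed tool call."""
    tool = event.get("tool_name")
    args = event.get("tool_input", {})
    session = event.get("session", {})

    # Rule 2, "Verify identity" (security): block lookups pre-verification.
    if tool == "lookup_account" and not session.get("identity_verified"):
        return False, "Verify customer identity before account access."

    # Rule 3, "Don't share others' data" (legal): enforce customer isolation.
    if tool == "lookup_account" and args.get("customer_id") != session.get("customer_id"):
        return False, "Cross-customer data access is not permitted."

    return True, ""

allowed, reason = check({
    "tool_name": "lookup_account",
    "tool_input": {"customer_id": "c-42"},
    "session": {"identity_verified": False, "customer_id": "c-42"},
})
print(allowed, reason)  # False Verify customer identity before account access.
```

The denial reason is returned so the hook runner can feed it back to the model, which is what turns a hard block into a recoverable step.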

Cost-benefit analysis for migration

When deciding whether to migrate a rule from prompt to hook, compare expected loss vs hook cost:

| Rule | Compliance | Miss cost | Expected loss (100 interactions/month) | Verdict |
| --- | --- | --- | --- | --- |
| A. Suggest upsell | 89% | $0 | $0 | Not worth it |
| B. Verify shipping | 92% | $15 | $120 | Marginal |
| C. Apply loyalty discount | 85% | $8 | $120 | Marginal |
| D. Confirm email | 87% | $500 | $6,500 | Worth it |

Rule D’s $6,500/month expected loss far exceeds the ~$500/month hook development cost. Rules A-C don’t justify the overhead.
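The migration decision reduces to one formula: expected loss = miss rate × cost per miss × volume. A sketch, assuming the table's dollar figures are per 100 interactions per month and a ~$500/month hook cost:

```python
def expected_monthly_loss(compliance: float, miss_cost: float, interactions: int = 100) -> float:
    """Expected loss = miss rate x cost per miss x interaction volume."""
    return (1 - compliance) * miss_cost * interactions

# Rows from the table above.
rules = {
    "A (suggest upsell)":   (0.89, 0),
    "B (verify shipping)":  (0.92, 15),
    "C (loyalty discount)": (0.85, 8),
    "D (confirm email)":    (0.87, 500),
}
HOOK_COST = 500  # assumed monthly development/maintenance cost of a hook

for name, (compliance, miss_cost) in rules.items():
    loss = expected_monthly_loss(compliance, miss_cost)
    verdict = "migrate to hook" if loss > HOOK_COST else "keep as prompt"
    print(f"{name}: ${loss:,.0f}/month -> {verdict}")
```

Only rule D clears the threshold: a 13% miss rate on a $500 miss dwarfs the hook cost, while B and C sit an order of magnitude below it.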

The over-hooking anti-pattern

After several incidents, a team implemented hooks for EVERYTHING: source verification, word count, formatting, citation style, paragraph length, vocabulary complexity. Result: 12 hook denials per report, 8 of them for style preferences, and report production time tripled.

A 501-word section triggers a denial against a 500-word limit. Forced rewrites break natural flow; the system became too rigid.

Fix: keep hooks for compliance requirements (source verification, citation accuracy). Migrate style/quality preferences (word count, paragraph length) back to prompt guidance. The agent needs judgment flexibility for style, not rigid enforcement.

Context-aware enforcement

A coding standard (“all public API functions must have input validation”) is security-critical for customer-facing code but advisory for internal tools.

Context-aware hook: check the file path. Customer-facing directory (/src/api/public/) → block writes without validation (deterministic). Internal directory → allow with prompt reminder (advisory). The consequence level depends on context, so enforcement should too.
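A sketch of the path-based decision; the directory layout and the decision labels (`block`, `remind`, `allow`) are illustrative:

```python
from pathlib import PurePosixPath

# Assumed project layout: customer-facing code lives under src/api/public.
PUBLIC_API = PurePosixPath("src/api/public")

def enforcement_for(path: str, has_validation: bool) -> str:
    """Decide how to enforce the input-validation standard for a file write."""
    p = PurePosixPath(path.lstrip("/"))
    customer_facing = PUBLIC_API in p.parents
    if customer_facing and not has_validation:
        return "block"   # security-critical: deterministic denial
    if not has_validation:
        return "remind"  # internal tool: advisory prompt reminder only
    return "allow"

print(enforcement_for("/src/api/public/users.py", has_validation=False))  # block
print(enforcement_for("/src/tools/cleanup.py", has_validation=False))     # remind
```

The same rule text yields two enforcement levels; only the consequence context changes.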

Defense in depth: prompt + hook together

A hook blocks production deploys. A colleague adds a prompt: “Never deploy to production without human approval.” Another colleague calls it redundant.

Both are valuable. The hook is the safety net (deterministic blocking). The prompt is behavioral guidance (reduces ATTEMPTS to deploy). Fewer blocked attempts → smoother UX, model proactively seeks approval instead of hitting the wall. Token overhead of one instruction is negligible.

When to diagnose before migrating

Test-before-commit compliance drops from 97% to 82%. Before implementing a hook, diagnose WHY:

  • If the model makes contextual judgments (“this small change doesn’t need tests”) → hook needed (fundamental prompt limitation)
  • If the instruction is lost in a crowded context window → prompt restructuring may suffice (simpler fix)

The correct first step is always understanding the failure mode, not jumping to the strongest enforcement.

Deterministic conversion vs contextual judgment

Dates and currencies → PostToolUse hooks (deterministic conversion rules exist). Measurements → prompt guidance (whether to convert “30-foot yacht market” to metric depends on context — a US market report may intentionally use imperial).

The dividing line: if conversion rules are objective and universal → hook. If conversion requires contextual judgment → prompt.
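A deterministic conversion like date normalization is the kind of post-processing a PostToolUse-style hook can own outright. A minimal sketch (US MM/DD/YYYY to ISO 8601; the pattern is illustrative):

```python
import re

# Matches US-style dates like 3/7/2025 or 03/12/2025.
US_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

def normalize_dates(text: str) -> str:
    """Rewrite MM/DD/YYYY dates as ISO 8601 YYYY-MM-DD."""
    return US_DATE.sub(
        lambda m: f"{m.group(3)}-{int(m.group(1)):02d}-{int(m.group(2)):02d}",
        text,
    )

print(normalize_dates("Shipped 3/7/2025, delivered 03/12/2025."))
# Shipped 2025-03-07, delivered 2025-03-12.
```

The rule is objective and universal, so no judgment is lost by enforcing it in code; a metric/imperial conversion has no such universal rule, which is why it stays in the prompt.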


One-liner: Select enforcement by consequence (legal/financial → hook) + verifiability (code can check → hook feasible) + subjectivity (needs judgment → prompt only) — not by compliance rate. Watch for over-hooking (3x production time from style hooks) and use cost-benefit analysis for migration decisions.