S1.5.2 Task 1.5

Select by Consequence AND Verifiability, Not by Compliance Rate

The selection principle for hooks versus prompts has three axes: consequence of failure, programmatic verifiability, and subjectivity. A 99% compliance rate on a data privacy rule still needs a hook (the remaining 1% means data leaks). An 85% compliance rate on greeting by name is fine as a prompt (a miss is a minor UX hiccup).

The three-axis classification

  1. Consequence: does failure have legal, financial, or security impact? If yes → candidate for hook.
  2. Verifiability: can code deterministically check compliance? Regex, list matching, threshold comparison? If yes → hook is feasible.
  3. Subjectivity: does the requirement need the model’s judgment? “Be empathetic,” “present balanced viewpoints,” “use professional language”? If yes → prompt only, regardless of consequence.
| Requirement | Consequence | Verifiable | Subjective | Mechanism |
| --- | --- | --- | --- | --- |
| Block refunds >$500 | Financial | Yes (amount check) | No | Hook |
| Sanctioned entity list | Legal | Yes (string match) | No | Hook |
| Phone E.164 format | Integration | Yes (regex) | No | Hook |
| Balanced viewpoints | Quality | No | Yes | Prompt |
| Professional language | Quality | No | Yes | Prompt |
| Greeting by first name | UX | Partially | No | Prompt |
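The three axes reduce to a small decision helper. A minimal sketch in Python; the `Requirement` fields are illustrative, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class Requirement:
    name: str
    high_consequence: bool   # legal, financial, or security impact on failure
    verifiable: bool         # code can deterministically check compliance
    subjective: bool         # needs the model's judgment

def select_mechanism(req: Requirement) -> str:
    """Apply the three-axis selection principle from the table above."""
    if req.subjective:
        return "prompt"      # judgment calls can't be hard-coded
    if req.high_consequence and req.verifiable:
        return "hook"        # deterministic enforcement is both needed and feasible
    return "prompt"          # low stakes or unverifiable: guidance suffices

print(select_mechanism(Requirement("Block refunds >$500", True, True, False)))   # hook
print(select_mechanism(Requirement("Balanced viewpoints", False, False, True)))  # prompt
```

Note that subjectivity is checked first: it vetoes a hook regardless of consequence, matching the rule in axis 3 above.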

The “deliberate override” proof

Consider 11% non-compliance on discount tier limits. The breakdown: 5% is incorrect tier identification (a fixable bug), but 6% is DELIBERATE overrides: the model correctly identifies the tier, then reasons "this frustrated customer deserves extra consideration" and exceeds the limit.

This proves the fundamental limitation: prompt compliance is probabilistic because the model applies contextual judgment to override its instructions. Stronger wording (“MANDATORY”, “NO EXCEPTIONS”) provides marginal improvement because the model isn’t failing to understand — it’s choosing to deviate.

Select by consequence, not by rate

Five rules, all enforced by prompts:

| Rule | Compliance | Consequence | Should be |
| --- | --- | --- | --- |
| 1. Greet by name | 96% | Cosmetic | Prompt |
| 2. Verify identity | 91% | Security (unauthorized access) | Hook |
| 3. Don't share others' data | 99% | Legal (data privacy) | Hook |
| 4. Log modifications | 88% | Compliance (audit trail) | Hook |
| 5. Offer survey | 93% | UX preference | Prompt |

Rule 3 at 99% still needs a hook — 1% data privacy failure is legally unacceptable. Rule 1 at 96% is fine as a prompt — 4% missed greetings have zero legal/financial impact.
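The hook-worthy rules here are deterministically checkable. A hedged sketch of the check function; the event shape (`tool_name`, `tool_input`, `session`) is an assumed framework payload, not a documented API:

```python
def check(event: dict) -> tuple[bool, str]:
    """Return (allow, reason) for a proposed tool call."""
    tool = event.get("tool_name")
    args = event.get("tool_input", {})
    session = event.get("session", {})

    # Rule 2, "Verify identity" (security): block lookups pre-verification.
    if tool == "lookup_account" and not session.get("identity_verified"):
        return False, "Verify customer identity before account access."

    # Rule 3, "Don't share others' data" (legal): enforce customer isolation.
    if tool == "lookup_account" and args.get("customer_id") != session.get("customer_id"):
        return False, "Cross-customer data access is not permitted."

    return True, ""

allowed, reason = check({
    "tool_name": "lookup_account",
    "tool_input": {"customer_id": "c-42"},
    "session": {"identity_verified": False, "customer_id": "c-42"},
})
print(allowed, reason)  # False Verify customer identity before account access.
```

The denial reason is returned so the hook runner can feed it back to the model, which is what turns a hard block into a recoverable step.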

Cost-benefit analysis for migration

When deciding whether to migrate a rule from prompt to hook, compare expected loss vs hook cost:

| Rule | Compliance | Miss cost | Expected loss (100 interactions/month) | Verdict |
| --- | --- | --- | --- | --- |
| A. Suggest upsell | 89% | $0 | $0 | Not worth it |
| B. Verify shipping | 92% | $15 | $120 | Marginal |
| C. Apply loyalty discount | 85% | $8 | $120 | Marginal |
| D. Confirm email | 87% | $500 | $6,500 | Worth it |

Rule D’s $6,500/month expected loss far exceeds the ~$500/month hook development cost. Rules A-C don’t justify the overhead.
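The migration decision reduces to one formula: expected loss = miss rate × cost per miss × volume. A sketch, assuming the table's dollar figures are per 100 interactions per month and a ~$500/month hook cost:

```python
def expected_monthly_loss(compliance: float, miss_cost: float, interactions: int = 100) -> float:
    """Expected loss = miss rate x cost per miss x interaction volume."""
    return (1 - compliance) * miss_cost * interactions

# Rows from the table above.
rules = {
    "A (suggest upsell)":   (0.89, 0),
    "B (verify shipping)":  (0.92, 15),
    "C (loyalty discount)": (0.85, 8),
    "D (confirm email)":    (0.87, 500),
}
HOOK_COST = 500  # assumed monthly development/maintenance cost of a hook

for name, (compliance, miss_cost) in rules.items():
    loss = expected_monthly_loss(compliance, miss_cost)
    verdict = "migrate to hook" if loss > HOOK_COST else "keep as prompt"
    print(f"{name}: ${loss:,.0f}/month -> {verdict}")
```

Only rule D clears the threshold: a 13% miss rate on a $500 miss dwarfs the hook cost, while B and C sit an order of magnitude below it.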

The over-hooking anti-pattern

After several incidents, a team implemented hooks for EVERYTHING: source verification, word count, formatting, citation style, paragraph length, vocabulary complexity. Result: 12 hook denials per report, 8 of them for style preferences, and report production time tripled.

A 501-word section triggers a denial against a 500-word limit. Forced rewrites break natural flow; the system became too rigid.

Fix: keep hooks for compliance requirements (source verification, citation accuracy). Migrate style/quality preferences (word count, paragraph length) back to prompt guidance. The agent needs judgment flexibility for style, not rigid enforcement.

Context-aware enforcement

A coding standard (“all public API functions must have input validation”) is security-critical for customer-facing code but advisory for internal tools.

Context-aware hook: check the file path. Customer-facing directory (/src/api/public/) → block writes without validation (deterministic). Internal directory → allow with prompt reminder (advisory). The consequence level depends on context, so enforcement should too.
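A sketch of the path-based decision; the directory layout and the decision labels (`block`, `remind`, `allow`) are illustrative:

```python
from pathlib import PurePosixPath

# Assumed project layout: customer-facing code lives under src/api/public.
PUBLIC_API = PurePosixPath("src/api/public")

def enforcement_for(path: str, has_validation: bool) -> str:
    """Decide how to enforce the input-validation standard for a file write."""
    p = PurePosixPath(path.lstrip("/"))
    customer_facing = PUBLIC_API in p.parents
    if customer_facing and not has_validation:
        return "block"   # security-critical: deterministic denial
    if not has_validation:
        return "remind"  # internal tool: advisory prompt reminder only
    return "allow"

print(enforcement_for("/src/api/public/users.py", has_validation=False))  # block
print(enforcement_for("/src/tools/cleanup.py", has_validation=False))     # remind
```

The same rule text yields two enforcement levels; only the consequence context changes.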

Defense in depth: prompt + hook together

A hook blocks production deploys. A colleague adds a prompt: “Never deploy to production without human approval.” Another colleague calls it redundant.

Both are valuable. The hook is the safety net (deterministic blocking). The prompt is behavioral guidance (reduces ATTEMPTS to deploy). Fewer blocked attempts → smoother UX, model proactively seeks approval instead of hitting the wall. Token overhead of one instruction is negligible.

When to diagnose before migrating

Test-before-commit compliance drops from 97% to 82%. Before implementing a hook, diagnose WHY:

  • If the model makes contextual judgments (“this small change doesn’t need tests”) → hook needed (fundamental prompt limitation)
  • If the instruction is lost in a crowded context window → prompt restructuring may suffice (simpler fix)

The correct first step is always understanding the failure mode, not jumping to the strongest enforcement.

Deterministic conversion vs contextual judgment

Dates and currencies → PostToolUse hooks (deterministic conversion rules exist). Measurements → prompt guidance (whether to convert “30-foot yacht market” to metric depends on context — a US market report may intentionally use imperial).

The dividing line: if conversion rules are objective and universal → hook. If conversion requires contextual judgment → prompt.
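A deterministic conversion like date normalization is the kind of post-processing a PostToolUse-style hook can own outright. A minimal sketch (US MM/DD/YYYY to ISO 8601; the pattern is illustrative):

```python
import re

# Matches US-style dates like 3/7/2025 or 03/12/2025.
US_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

def normalize_dates(text: str) -> str:
    """Rewrite MM/DD/YYYY dates as ISO 8601 YYYY-MM-DD."""
    return US_DATE.sub(
        lambda m: f"{m.group(3)}-{int(m.group(1)):02d}-{int(m.group(2)):02d}",
        text,
    )

print(normalize_dates("Shipped 3/7/2025, delivered 03/12/2025."))
# Shipped 2025-03-07, delivered 2025-03-12.
```

The rule is objective and universal, so no judgment is lost by enforcing it in code; a metric/imperial conversion has no such universal rule, which is why it stays in the prompt.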


One-liner: Select enforcement by consequence (legal/financial → hook) + verifiability (code can check → hook feasible) + subjectivity (needs judgment → prompt only) — not by compliance rate. Watch for over-hooking (3x production time from style hooks) and use cost-benefit analysis for migration decisions.