Exam Weight: 20% · 25 Articles · 6 Tasks

Prompt Engineering & Optimization

Task 4.1 Specific Criteria & Rules (3)

K4.1.1

"Be Conservative" Means Nothing — 47% Agreement. "Lacks Sample Size" Means Something — 94%.

Specific criteria > vague instructions

K4.1.2

One 60% False Positive Category Made Developers Ignore the 95% Accurate One

False positives poison trust

K4.1.3

Labels Alone: 41%. Text Conditions: 72%. Text + Code Examples: 94%.

Severity definitions with examples

Task 4.2 Few-shot Calibration (4)

K4.2.1

Text Instructions Failed at 64%. Two Examples Reached 91%.

Few-shot as ultimate calibration

K4.2.2

Three Diverse Examples Beat Eight Homogeneous Ones

Few-shot design principles

K4.2.3

Every Example Shows All Fields Populated — So the Model Fabricates Missing Ones

Few-shot reducing hallucination

S4.2.1

Paired REPORT + SKIP Examples: 67% → 94% Boundary Accuracy

Few-shot for format consistency

Task 4.3 Structured Output via tool_use (4)

K4.3.1

tool_use Eliminates Structural Errors. Semantic Errors Remain.

tool_use for guaranteed structure

K4.3.2

Forced, Any, Auto — Three Modes, Three Guarantees, Three Failure Patterns

input_schema enforcement

K4.3.3

Required Fields on Optional Data: The #1 Structural Cause of Hallucination

Nullable fields for optional data

K4.3.4

Without Format Rules, the Same Date Comes Out Three Different Ways

Closed enum with "other" escape / Format standardization

Task 4.4 Semantic Validation & Retry (4)

K4.4.1

Blind Retry: 12% Fixed After 3 Attempts. Error Feedback: 87% Fixed After 1.

Semantic validation beyond schema (retry with corrective feedback)

K4.4.2

Retrying Absent Data Causes Hallucination — Format Errors Converge, Missing Data Diverges

Retry with corrective feedback (error classification)

K4.4.3

Track Which Patterns Developers Dismiss — Then Add SKIP Examples

Confidence scoring for routing (feedback loop)

K4.4.4

Schema Says Valid. Line Items Don't Sum to Total. Both Are True.

Cross-field consistency validation

Task 4.5 Batch API (6)

K4.5.1

50% Cost Savings — But Up to 24 Hours Wait

Batch API for scale

K4.5.2

Results Return in Arbitrary Order — custom_id Is Your Only Correlation

custom_id for result correlation

K4.5.3

940 Succeeded, 45 Errored, 15 Expired — Resubmit Only the 60

Batch error handling

K4.5.4

Poll, Download, Store Locally — Results Expire in 29 Days

Batch size and timeout

K4.5.5

Different Errors Need Different Fixes — Don't Resubmit the Whole Batch

Cost optimization with batch

S4.5.1

Always Test on a Sample Before Full Batch — 18% Failure vs 3%

Batch workflow design

Task 4.6 Multi-pass Review (4)

K4.6.1

Same-Session Review: 0.3 Findings. Independent Instance: 3.7. Human Baseline: 4.1.

Multi-pass review architecture

K4.6.2

Even Design Goals from the Generation Prompt Suppress Review Findings

Per-file + cross-file passes

K4.6.3

Single-Pass Review at 13+ Files: 43% Detection. Multi-Pass: 86%.

Independent instances eliminate bias

S4.6.1

Calibrate Confidence Per Category — "HIGH" Means 92% for Style but 70% for Security

Multi-instance review pipeline
