Prompt Engineering Mastery · Lesson 16 of 24
Interview: Design a Structured Output Prompt
Q: How do you reliably get JSON from an LLM?
Layered approach:
- Prompt-level: specify schema in TypeScript-style types, say "respond ONLY with valid JSON," prefill the assistant turn with { to force JSON generation.
- API-level: use JSON mode (OpenAI) for syntactic validity, or Structured Outputs / tool use for schema enforcement: the schema is applied at sampling time, not just described.
- Parsing: wrap json.loads() in error handling; strip markdown code blocks before parsing; try regex extraction as fallback.
- Validation: validate parsed JSON against a Pydantic model or JSON Schema.
- Retry: on failure, append the specific error to the conversation and re-prompt; cap at 2-3 retries.
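The parsing layer can be sketched roughly as follows. This is a minimal standard-library sketch, not a definitive implementation: parse_model_json is a hypothetical helper name, and the fallback simply takes the outermost brace-delimited span, an assumption that will not suit every output.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built indirectly so the snippet embeds cleanly
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_model_json(text: str) -> dict:
    """Parse model output as a JSON object, tolerating fences and stray prose."""
    candidates = [text.strip()]  # 1. the raw output
    fenced = FENCE_RE.search(text)
    if fenced:
        candidates.append(fenced.group(1))  # 2. contents of a markdown fence
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])  # 3. outermost {...} span

    last_error = None
    for candidate in candidates:
        try:
            value = json.loads(candidate)
        except json.JSONDecodeError as exc:
            last_error = exc
            continue
        if isinstance(value, dict):
            return value
    raise ValueError(f"no JSON object found in model output: {last_error}")
```

Raising a single ValueError with the last decode error gives the retry step a specific message to feed back to the model.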
Q: What is the difference between JSON mode and Structured Outputs?
JSON mode (OpenAI): guarantees the output is syntactically valid JSON. Does not guarantee schema compliance — the model may return different field names or types. The schema is only described to the model; not enforced at sampling.
Structured Outputs (OpenAI, via response_format with a JSON Schema or Pydantic model): enforces the schema through constrained decoding; tokens that would violate the schema are masked during sampling, so a completed response always conforms. A response can still be unusable if it is cut off by max_tokens or the model refuses, so check for both. The SDK's parse helper returns a Pydantic instance, so no manual parsing step is needed.
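Tool use / function calling (OpenAI + Anthropic): the model generates arguments for a tool whose input schema you declare. With strict mode enabled (strict: true on OpenAI function definitions) the schema is enforced at sampling, as with Structured Outputs. Without it, conformance is best-effort, so validate the arguments after the call.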
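To make the tool-use option concrete, here is a sketch that wraps one JSON Schema as a tool definition for each provider. The record_summary tool and its fields are hypothetical. The dictionary shapes (input_schema for Anthropic; parameters and strict for OpenAI Chat Completions) are assumptions to check against current provider documentation.

```python
import copy

# Hypothetical schema. OpenAI strict mode expects every property to be listed
# in "required" and "additionalProperties" to be False.
SUMMARY_SCHEMA = {
    "type": "object",
    "properties": {
        "diagnosis": {"type": "string"},
        "urgency": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["diagnosis", "urgency"],
    "additionalProperties": False,
}


def anthropic_tool(name: str, description: str, schema: dict) -> dict:
    """Tool definition in the shape used by the Anthropic Messages API."""
    return {
        "name": name,
        "description": description,
        "input_schema": copy.deepcopy(schema),
    }


def openai_tool(name: str, description: str, schema: dict, strict: bool = True) -> dict:
    """Function tool in the shape used by OpenAI Chat Completions."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": copy.deepcopy(schema),
            "strict": strict,
        },
    }


DESCRIPTION = "Record a structured clinical summary."
ANTHROPIC_TOOLS = [anthropic_tool("record_summary", DESCRIPTION, SUMMARY_SCHEMA)]
OPENAI_TOOLS = [openai_tool("record_summary", DESCRIPTION, SUMMARY_SCHEMA)]
```

Keeping a single schema dictionary and deriving both definitions from it avoids the two providers' versions drifting apart.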
Q: How do you handle truncated JSON output?
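Prevention is better than cure: estimate the output size and set max_tokens with generous headroom above it. For streaming responses, monitor token count against budget.
For recovery: detect truncation from the API's stop reason first (stop_reason == "max_tokens" on Anthropic, finish_reason == "length" on OpenAI). A json.JSONDecodeError is a weaker signal: a cut-off string reports "Unterminated string", but a cut-off object or array reports "Expecting ..." instead. Recovery options: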
- Prefill the assistant turn with the partial JSON and prompt continuation
- Re-prompt from scratch with max_tokens increased
- Use streaming and cut off cleanly at the last complete top-level value
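The last option can be sketched as below, assuming the output is a JSON array of items (or an object whose first array holds them). The names was_truncated and salvage_complete_items are hypothetical. The stop-reason strings are the values used by Anthropic (stop_reason) and OpenAI (finish_reason).

```python
import json

TRUNCATION_REASONS = {"max_tokens", "length"}  # Anthropic / OpenAI respectively


def was_truncated(stop_reason: str) -> bool:
    """True if the API reported that generation hit the token limit."""
    return stop_reason in TRUNCATION_REASONS


def salvage_complete_items(text: str) -> list:
    """Return the array elements that were fully emitted before the cut-off."""
    decoder = json.JSONDecoder()
    pos = text.find("[")
    if pos == -1:
        return []
    pos += 1
    items = []
    while True:
        # Skip separators between elements.
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text) or text[pos] == "]":
            break
        try:
            value, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            break  # this element was cut off; keep what we have
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if end == len(text) and is_number:
            break  # a trailing number may itself be cut mid-digit
        items.append(value)
        pos = end
    return items
```

For example, salvaging '[{"drug": "warfarin"}, {"drug": "metfor' keeps only the first object. Whether a partial result is acceptable is a domain decision; for clinical data it is usually safer to flag the record as incomplete than to store it silently.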
Q: How would you design a clinical data extraction pipeline?
# High-level design:
# 1. Parse the clinical note (handle HL7, FHIR, plain text)
# 2. Chunk if necessary (long notes exceed context)
# 3. Call the LLM with structured extraction prompt
# 4. Parse and validate JSON output (Pydantic schema)
# 5. Apply domain rules (Warfarin → INR required)
# 6. Retry on validation failures (max 2 retries)
# 7. Log all failures for review
# 8. Write to structured store (SQL, FHIR resource)
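import json
import re
from pydantic import BaseModel, ValidationError

# EXTRACTION_SYSTEM_PROMPT is assumed to be defined elsewhere.

def strip_markdown(text: str) -> str:
    """Drop a surrounding markdown code fence, if present."""
    fence = "`" * 3
    match = re.search(fence + r"(?:json)?\s*(.*?)" + fence, text, re.DOTALL)
    return (match.group(1) if match else text).strip()

class ClinicalExtractor:
    def __init__(self, client, schema: type[BaseModel]):
        self.client = client
        self.schema = schema

    def extract(self, note: str) -> BaseModel:
        messages = [{"role": "user", "content": f"<note>{note}</note>"}]
        for attempt in range(3):
            response = self.client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=2048,
                system=EXTRACTION_SYSTEM_PROMPT,
                messages=messages,
            )
            text = response.content[0].text
            try:
                raw = json.loads(strip_markdown(text))
                return self.schema(**raw)
            except (json.JSONDecodeError, ValidationError, TypeError) as e:
                if attempt == 2:
                    raise
                # Feed the specific error back so the next attempt can fix it.
                messages.append({"role": "assistant", "content": text})
                messages.append({"role": "user", "content": f"Invalid output ({e}). Return corrected JSON only."})

Q: How do you handle enums reliably?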
Enumerate all valid values explicitly in the prompt. Show a concrete example of each value. Specify the default if the value is ambiguous. Validate the enum value after parsing — reject and retry if the model invented a new value.
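One way to keep the prompt wording and the validator from drifting apart is to derive both from a single definition. The sketch below uses only the standard library; Urgency, URGENCY_HINTS, urgency_prompt_lines and check_urgency are hypothetical names, and the hint texts are the ones used in this lesson.

```python
from typing import Literal, get_args

Urgency = Literal["low", "medium", "high"]

URGENCY_HINTS = {
    "low": "can wait for routine follow-up",
    "medium": "needs attention within 24 hours",
    "high": "needs immediate action",
}


def urgency_prompt_lines(default: str = "medium") -> str:
    """Render the enum section of the prompt from the Literal definition."""
    lines = ["'urgency' MUST be exactly one of:"]
    for value in get_args(Urgency):
        lines.append(f'- "{value}" ({URGENCY_HINTS[value]})')
    lines.append(f'Default to "{default}" if unclear.')
    return "\n".join(lines)


def check_urgency(value: str) -> str:
    """Reject any value the model invented; the caller decides whether to retry."""
    allowed = get_args(Urgency)
    if value not in allowed:
        raise ValueError(f"urgency must be one of {allowed}, got {value!r}")
    return value
```

The check is deliberately exact: it does not lowercase or trim the value, so "High" is rejected rather than quietly accepted.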
# In the prompt:
# 'urgency' MUST be exactly one of:
# - "low" (can wait for routine follow-up)
# - "medium" (needs attention within 24 hours)
# - "high" (needs immediate action)
# Default to "medium" if unclear.
# In validation:
from typing import Literal
from pydantic import BaseModel

class ClinicalSummary(BaseModel):
    urgency: Literal["low", "medium", "high"]  # Pydantic rejects any other value

Q: When would you use tool use instead of a JSON prompt?
Tool use / function calling is preferred when:
- Schema compliance is critical (medical, legal) — sampling-level enforcement
- You have multiple possible output structures and the model should "decide" which tool to call
- The output may be very long and a fixed schema keeps the model from drifting off-structure (still cap max_tokens, since arrays and strings can be unbounded)
- You need typed arguments without a parsing step
JSON prompt is acceptable when:
- The model is small and doesn't support tool use
- Schema is simple and rarely changes
- You need the output embedded in a larger text response rather than isolated
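The "multiple possible output structures" case can be sketched as a small router over the tool call the model chose. The tool names and required fields below are hypothetical, and the block is assumed to be a plain dictionary shaped like an Anthropic tool_use content block (type, name, input).

```python
# Hypothetical tools and the fields each one must supply.
REQUIRED_FIELDS = {
    "record_medication": {"name", "dose"},
    "record_allergy": {"substance", "reaction"},
}


def route_tool_call(block: dict) -> tuple[str, dict]:
    """Check which tool the model picked and that its arguments are complete."""
    if block.get("type") != "tool_use":
        raise ValueError("expected a tool_use block")
    name = block.get("name")
    args = block.get("input") or {}
    if name not in REQUIRED_FIELDS:
        raise ValueError(f"model called unknown tool: {name!r}")
    missing = REQUIRED_FIELDS[name] - set(args)
    if missing:
        raise ValueError(f"{name} is missing fields: {sorted(missing)}")
    return name, args
```

Even where the provider enforces the schema at sampling, a check like this is cheap and guards against non-strict configurations.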
Q: How do you handle partial extractions?
When a field is missing from the source (e.g., no dose is mentioned for a medication):
- In the prompt: define the null convention explicitly (null vs "" vs "Not documented")
- In the schema: use Optional[str] or str | None so Pydantic accepts null; add a default of None if the key itself may be omitted
- In business logic: handle null explicitly; don't assume all fields are present
- Don't retry just because a field is null — that's correct behaviour for missing data
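The points above can be sketched with a standard-library dataclass standing in for the Pydantic model. Medication, normalise_null and describe are hypothetical names, and the sentinel set is an assumption covering the conventions mentioned in this lesson.

```python
from dataclasses import dataclass
from typing import Optional

# Strings treated as "missing" if the model ignores the null convention.
NULL_SENTINELS = {"", "not documented"}


@dataclass
class Medication:
    name: str
    dose: Optional[str] = None  # None means the note did not state a dose


def normalise_null(value):
    """Map None and sentinel strings to None; pass real values through."""
    if value is None:
        return None
    if isinstance(value, str) and value.strip().lower() in NULL_SENTINELS:
        return None
    return value


def medication_from_raw(raw: dict) -> Medication:
    # A missing key and an explicit null are both treated as "not documented".
    return Medication(name=raw["name"], dose=normalise_null(raw.get("dose")))


def describe(med: Medication) -> str:
    """Business logic branches on None explicitly instead of assuming a dose."""
    if med.dose is None:
        return f"{med.name} (dose not documented)"
    return f"{med.name} {med.dose}"
```

Normalising once at the boundary means downstream code only has to handle a single representation of missing data.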
Interview Answer Template
"Reliable structured extraction from LLMs uses a layered approach: TypeScript-style schema in the prompt, API-level enforcement via Structured Outputs or tool use, Pydantic validation after parsing, and a 2-3 attempt retry loop that feeds the specific error back to the model. JSON mode guarantees syntax but not schema compliance — prefer Structured Outputs for strict schema requirements. Define null conventions explicitly in the prompt, use Literal types for enums, and set max_tokens generously to prevent truncation. Log all validation failures: they reveal either prompt gaps or genuinely ambiguous inputs that need special handling."