GenAI & LLM Interviews · Lesson 10 of 30
# Structured Output & JSON Mode
## The Problem with Free-Text Output
LLMs generate text one token at a time. Without constraints they drift toward prose, and the structure of that prose varies from call to call, which breaks downstream parsing. A prompt asking for JSON might return:

````text
Here's the drug interaction information:
```json
{"drug": "warfarin", ...}
```
Let me know if you need more details!
````

The JSON is wrapped in markdown fences and surrounded by extra prose, and the formatting can change from one call to the next. Passing the raw response to a JSON parser fails, so production pipelines break on this.
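To make the failure concrete, here is a minimal sketch using only the standard library. The reply text is a hypothetical stand-in for the chatty output above (with the elided fields dropped), and `try_parse` is an illustrative helper, not part of any SDK.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Hypothetical model reply in the chatty style shown above.
raw = (
    "Here's the drug interaction information:\n"
    f"{FENCE}json\n"
    '{"drug": "warfarin"}\n'
    f"{FENCE}\n"
    "Let me know if you need more details!"
)


def try_parse(text: str):
    """Return the parsed object, or a short error string if parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return f"JSONDecodeError: {exc.msg}"


print(try_parse(raw))                     # the chatty reply fails to parse
print(try_parse('{"drug": "warfarin"}'))  # the bare object parses
```

The same payload parses once the wrapper text is gone, which is why the methods below focus on controlling what the model emits rather than cleaning it up afterwards.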
---
## Method 1: OpenAI JSON Mode
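Constrains the model to emit a syntactically valid JSON object with no surrounding prose. The output is parseable as long as generation is not cut off by the token limit (`finish_reason == "length"`), and the prompt itself must mention JSON: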
```python
from openai import OpenAI
import json

client = OpenAI()


def extract_to_json(text: str, schema_description: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # Guarantees JSON
        messages=[
            {
                "role": "system",
                "content": f"""Extract information from clinical text and return as JSON.
Schema: {schema_description}
Return ONLY JSON matching the schema.""",
            },
            {"role": "user", "content": text},
        ],
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)


# JSON mode guarantees valid JSON but not schema compliance; validate separately
result = extract_to_json(
    text="Patient takes warfarin 5mg daily for AFib. INR today 3.8 (target 2-3).",
    schema_description='{"drug": str, "dose_mg": float, "indication": str, "current_inr": float, "target_inr_range": str}',
)
print(json.dumps(result, indent=2))
```

## Method 2: Structured Output with Pydantic (OpenAI Beta)
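JSON mode checks syntax only, so a reply can parse cleanly and still miss fields or carry the wrong types. The sketch below makes that gap concrete with a hand-rolled, stdlib-only shape check. `EXPECTED`, `shape_problems`, and the sample reply are hypothetical, chosen for illustration; the field names come from the `schema_description` passed to JSON mode above.

```python
import json

# Field names and types from the schema_description used with JSON mode.
EXPECTED = {
    "drug": str,
    "dose_mg": float,
    "indication": str,
    "current_inr": float,
    "target_inr_range": str,
}


def shape_problems(data: dict, expected: dict) -> list[str]:
    """List every way data deviates from the expected field names and types."""
    problems = []
    for key, typ in expected.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        elif not isinstance(data[key], typ):
            problems.append(
                f"{key}: expected {typ.__name__}, got {type(data[key]).__name__}"
            )
    for key in sorted(data.keys() - expected.keys()):
        problems.append(f"unexpected field: {key}")
    return problems


# Hypothetical reply: valid JSON, but a string dose, a renamed key, and a missing field.
reply = '{"drug": "warfarin", "dose_mg": "5mg", "indication": "AFib", "inr": 3.8}'
data = json.loads(reply)  # parses without error
problems = shape_problems(data, EXPECTED)
for p in problems:
    print(p)
```

Writing such checks by hand does not scale, which is the job a schema library takes over.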
Schema-enforced output — the model is constrained to produce exactly the schema structure:
```python
from typing import Literal, Optional

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()


class DrugInteraction(BaseModel):
    drug_a: str = Field(description="First drug in the interaction")
    drug_b: str = Field(description="Second drug in the interaction")
    severity: Literal["major", "moderate", "minor"]
    mechanism: str = Field(description="Pharmacological mechanism of the interaction")
    clinical_effect: str = Field(description="What happens to the patient")
    management: str = Field(description="Recommended clinical management")
    monitoring: list[str] = Field(description="Parameters to monitor")
    time_to_effect: Optional[str] = Field(None, description="When effects are typically seen")


class InteractionReport(BaseModel):
    interactions: list[DrugInteraction]
    overall_risk: Literal["high", "moderate", "low", "none"]
    requires_pharmacist_review: bool
    summary: str = Field(description="One sentence clinical summary")


def analyze_interactions(medications: list[str]) -> InteractionReport:
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You are a clinical pharmacologist. Analyze drug interactions thoroughly.",
            },
            {
                "role": "user",
                "content": f"Analyze all interactions between: {', '.join(medications)}",
            },
        ],
        response_format=InteractionReport,
        temperature=0,
    )
    return response.choices[0].message.parsed


# Fully typed return value: no JSON parsing needed
report = analyze_interactions(["warfarin 5mg daily", "clarithromycin 500mg BID", "aspirin 81mg daily"])
print(f"Overall risk: {report.overall_risk}")
print(f"Requires review: {report.requires_pharmacist_review}")
for interaction in report.interactions:
    print(f"\n{interaction.drug_a} + {interaction.drug_b}: {interaction.severity}")
    print(f"  Management: {interaction.management}")
```

## Method 3: outlines Library (Constrained Decoding)
For local models (LLaMA, Mistral), use outlines to constrain generation at the token level. At each step, tokens that would break the JSON schema are excluded before sampling, so a generation that runs to completion always parses and validates:
```python
import outlines
from pydantic import BaseModel

# Load local model. These entry points (outlines.models.transformers,
# outlines.generate.json) are from outlines 0.x; newer releases may differ.
model = outlines.models.transformers("meta-llama/Meta-Llama-3-8B-Instruct")


class DrugFact(BaseModel):
    drug_name: str
    drug_class: str
    mechanism: str
    primary_indication: str
    monitoring_required: bool


# Constrained generation: only schema-compliant tokens can be sampled
generator = outlines.generate.json(model, DrugFact)
result = generator("Provide facts about warfarin:")

# result is a DrugFact instance, already validated against the schema
print(f"Drug: {result.drug_name}")
print(f"Monitoring required: {result.monitoring_required}")
```

Advantage: no parsing failures on completed generations. Token sampling is restricted at each position to tokens that keep the output schema-compliant, so malformed JSON cannot be produced. The remaining failure mode is truncation, when the token limit is reached before the object closes.
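The idea of restricting sampling at each position can be shown with a toy decoder. This is a sketch of the general technique under simplifying assumptions, not how outlines is implemented: the vocabulary, the scores standing in for model logits, and the brute-force prefix check are all hypothetical, and real libraries derive the allowed continuations from the schema rather than enumerating accepted outputs.

```python
import json

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of tokens.
VOCAB = ["Sure!", " ", "{", "}", '"monitoring_required"', ":", "true", "false"]

# Every string the toy "schema" accepts.
ALLOWED = ['{"monitoring_required":true}', '{"monitoring_required":false}']

# Stand-in for model logits: this "model" would rather chat than emit JSON.
SCORES = {"Sure!": 9.0, " ": 3.0, "{": 2.0, '"monitoring_required"': 2.0,
          ":": 2.0, "true": 1.5, "false": 1.0, "}": 0.5}


def allowed_next(prefix: str) -> list[str]:
    """Tokens that keep prefix + token a prefix of some accepted output."""
    return [t for t in VOCAB if any(s.startswith(prefix + t) for s in ALLOWED)]


def generate(constrained: bool, max_steps: int = 8) -> str:
    out = ""
    for _ in range(max_steps):
        candidates = allowed_next(out) if constrained else VOCAB
        if not candidates:  # nothing valid left to add: the object is complete
            break
        out += max(candidates, key=SCORES.__getitem__)  # greedy decoding
    return out


free = generate(constrained=False)                   # the model chats
masked = generate(constrained=True)                  # same scores, masked vocabulary
truncated = generate(constrained=True, max_steps=3)  # token budget too small

print(free)
print(masked, json.loads(masked))
print(truncated)
```

With the mask in place the same scores yield a parseable object, while the `max_steps=3` run stops mid-object, which is the truncation case to guard against.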
## Method 4: Instructor Library
Wraps any LLM client with Pydantic validation and automatic retry:
```python
from typing import Literal

import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

client = instructor.from_openai(OpenAI())


class MedicationExtraction(BaseModel):
    drug_name: str
    dose_mg: float
    frequency: Literal["once daily", "twice daily", "three times daily", "four times daily", "as needed", "other"]
    route: Literal["oral", "IV", "subcutaneous", "topical", "inhaled", "other"]
    indication: str

    # Pydantic v2 validator (replaces the deprecated v1 @validator)
    @field_validator("dose_mg")
    @classmethod
    def dose_must_be_positive(cls, v: float) -> float:
        if v <= 0:
            raise ValueError("Dose must be positive")
        return v


class PatientMedications(BaseModel):
    medications: list[MedicationExtraction]
    total_count: int
    high_risk_medications: list[str]


# Instructor automatically retries on validation failure
patient_meds = client.chat.completions.create(
    model="gpt-4o",
    response_model=PatientMedications,
    messages=[
        {
            "role": "user",
            "content": """Extract medications from: Patient is on warfarin 5mg orally daily for AFib,
metformin 1000mg PO twice daily for T2DM, and aspirin 81mg oral daily for CAD.""",
        }
    ],
    max_retries=3,  # Auto-retry on validation failure
)

for med in patient_meds.medications:
    print(f"{med.drug_name}: {med.dose_mg}mg {med.frequency} ({med.route})")
print(f"High risk: {patient_meds.high_risk_medications}")
```

## Validation and Retry Logic
When not using constrained generation, add validation with retry:
```python
import json

from openai import OpenAI
from pydantic import BaseModel, ValidationError

# Plain OpenAI client: this function parses and validates by hand,
# so it does not need the Instructor-wrapped client from Method 4.
client = OpenAI()


def parse_with_retry(
    prompt: str,
    schema: type[BaseModel],
    max_retries: int = 3,
) -> BaseModel:
    """Extract structured data with validation and retry on failure."""
    last_error = None
    for attempt in range(max_retries):
        if attempt == 0:
            user_content = prompt
        else:
            # Feed the previous error back so the model can correct itself
            user_content = f"""{prompt}

Previous attempt failed with error: {last_error}
Please fix the error and return valid JSON matching the schema exactly.
Schema fields: {list(schema.model_fields.keys())}"""
        response = client.chat.completions.create(
            model="gpt-4o",
            response_format={"type": "json_object"},
            messages=[
                {
                    "role": "system",
                    # json.dumps gives real JSON rather than a Python dict repr
                    "content": f"Return JSON matching this schema: {json.dumps(schema.model_json_schema())}",
                },
                {"role": "user", "content": user_content},
            ],
            temperature=0 if attempt > 0 else 0.1,  # Lower temp on retry
        )
        raw = response.choices[0].message.content
        try:
            data = json.loads(raw)
            return schema.model_validate(data)
        except (json.JSONDecodeError, ValidationError) as e:
            last_error = str(e)
            print(f"Attempt {attempt + 1} failed: {e}")
    raise ValueError(f"Failed to extract valid data after {max_retries} attempts. Last error: {last_error}")
```

## Schema Design for LLMs
The schema design affects extraction quality as much as the prompt:
```python
from typing import Literal

from pydantic import BaseModel, Field


# BAD: too many optional fields with vague names
class BadSchema(BaseModel):
    thing1: str | None = None
    thing2: str | None = None
    details: dict | None = None


# GOOD: explicit fields with descriptions and literals
class GoodSchema(BaseModel):
    drug_name: str = Field(description="The generic name of the drug (not brand name)")
    severity: Literal["major", "moderate", "minor"] = Field(
        description="major=life-threatening, moderate=significant, minor=minimal"
    )
    requires_immediate_action: bool = Field(
        description="True if patient needs dose change or monitoring within 24 hours"
    )
    action: str = Field(description="Specific action to take in one imperative sentence")
```

Schema design principles:
- Use `Literal` for fields with fixed valid values: this prevents the model from inventing values
- Write `Field(description=...)` for every field: this becomes part of the prompt
- Prefer flat schemas over nested: the model handles flat schemas more reliably
- Use `bool` instead of `str` for yes/no fields: this prevents "yes"/"true"/"1" ambiguity
- Include at least one required field: models handle all-optional schemas poorly
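The `bool` principle can be seen without calling any model. The sketch below uses only the standard library; the replies are hypothetical, and the `monitoring_required` field name is borrowed from the outlines example.

```python
import json

# Hypothetical replies when the field is typed as a string: every spelling is
# a legal string, so all of them pass an "is it a str?" check.
string_replies = [
    '{"monitoring_required": "yes"}',
    '{"monitoring_required": "true"}',
    '{"monitoring_required": "1"}',
    '{"monitoring_required": "false"}',
]
values = [json.loads(r)["monitoring_required"] for r in string_replies]

# Naive truthiness treats any non-empty string as True, including "false".
naive = [bool(v) for v in values]

# A JSON boolean has one spelling per value and maps straight to a Python bool.
typed = json.loads('{"monitoring_required": false}')["monitoring_required"]

print(naive)  # every entry is True, even for the "false" reply
print(typed)
```

With a string field, the caller has to normalize every spelling the model might choose; with a boolean field there is nothing to normalize.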
## Parsing Fallback Strategy
```python
import json
import re

from pydantic import BaseModel, ValidationError


def robust_json_extract(raw_output: str, schema: type[BaseModel]) -> BaseModel | None:
    """Try multiple extraction strategies before giving up."""
    # Strategy 1: Direct JSON parse
    # (model_validate also rejects non-object JSON with a ValidationError)
    try:
        return schema.model_validate(json.loads(raw_output))
    except (json.JSONDecodeError, ValidationError):
        pass

    # Strategy 2: Strip markdown code fences
    clean = raw_output.strip()
    for fence in ["```json", "```JSON", "```"]:
        if clean.startswith(fence):
            clean = clean[len(fence):]
            break
    if clean.endswith("```"):
        clean = clean[:-3]
    try:
        return schema.model_validate(json.loads(clean.strip()))
    except (json.JSONDecodeError, ValidationError):
        pass

    # Strategy 3: Find JSON object anywhere in the output
    # (this pattern handles at most one level of nested braces)
    json_match = re.search(r'\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}', raw_output, re.DOTALL)
    if json_match:
        try:
            return schema.model_validate(json.loads(json_match.group()))
        except (json.JSONDecodeError, ValidationError):
            pass

    return None  # All strategies failed
```

Structured output is one of the most important reliability techniques for production LLM systems. Always validate, always have a fallback, and prefer constrained generation (outlines, OpenAI structured output) over parsing when available.