GenAI & LLM Interviews · Lesson 10 of 30

# Structured Output & JSON Mode

## The Problem with Free-Text Output

LLMs generate tokens one at a time. Without constraints, they default to prose, and the structure of that prose varies from call to call, which breaks downstream parsing. A prompt asking for JSON might return:

````text
Here's the drug interaction information:
```json
{"drug": "warfarin", ...}
```

Let me know if you need more details!
````

The JSON is wrapped in markdown fences and surrounded by prose, so passing the raw response straight to a JSON parser fails. Production pipelines break on this.
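
A minimal sketch of that failure, using only the standard library. The `raw_response` string is a made-up stand-in for model output (with a single illustrative key), and `try_parse` is a hypothetical helper name:

```python
import json
from typing import Optional

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable

# Hypothetical model reply: valid JSON buried in prose and markdown fences.
raw_response = (
    "Here's the drug interaction information:\n"
    f"{FENCE}json\n"
    '{"drug": "warfarin"}\n'
    f"{FENCE}\n"
    "Let me know if you need more details!"
)

def try_parse(text: str) -> Optional[dict]:
    """Return the parsed object, or None if the text is not pure JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        print(f"Parse failed: {exc.msg} at position {exc.pos}")
        return None

parsed_raw = try_parse(raw_response)              # fails: the reply starts with prose
parsed_clean = try_parse('{"drug": "warfarin"}')  # succeeds: nothing but JSON
```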

---

## Method 1: OpenAI JSON Mode

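With `response_format={"type": "json_object"}`, the model returns a single JSON object with no surrounding prose or markdown fences. The prompt must mention JSON explicitly, and the output can still be cut off if generation hits the token limit: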

```python
from openai import OpenAI
import json

client = OpenAI()

def extract_to_json(text: str, schema_description: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # Guarantees JSON
        messages=[
            {
                "role": "system",
                "content": f"""Extract information from clinical text and return as JSON.
Schema: {schema_description}
Return ONLY JSON matching the schema.""",
            },
            {"role": "user", "content": text},
        ],
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

# JSON mode guarantees valid JSON but not schema compliance — validate separately
result = extract_to_json(
    text="Patient takes warfarin 5mg daily for AFib. INR today 3.8 (target 2-3).",
    schema_description='{"drug": str, "dose_mg": float, "indication": str, "current_inr": float, "target_inr_range": str}',
)
print(json.dumps(result, indent=2))
```

## Method 2: Structured Output with Pydantic (OpenAI Beta)

Schema-enforced output — the model is constrained to produce exactly the schema structure:

```python
from pydantic import BaseModel, Field
from typing import Literal, Optional

class DrugInteraction(BaseModel):
    drug_a: str = Field(description="First drug in the interaction")
    drug_b: str = Field(description="Second drug in the interaction")
    severity: Literal["major", "moderate", "minor"]
    mechanism: str = Field(description="Pharmacological mechanism of the interaction")
    clinical_effect: str = Field(description="What happens to the patient")
    management: str = Field(description="Recommended clinical management")
    monitoring: list[str] = Field(description="Parameters to monitor")
    time_to_effect: Optional[str] = Field(None, description="When effects are typically seen")

class InteractionReport(BaseModel):
    interactions: list[DrugInteraction]
    overall_risk: Literal["high", "moderate", "low", "none"]
    requires_pharmacist_review: bool
    summary: str = Field(description="One sentence clinical summary")

def analyze_interactions(medications: list[str]) -> InteractionReport:
    # Reuses the `client` created in Method 1
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You are a clinical pharmacologist. Analyze drug interactions thoroughly.",
            },
            {
                "role": "user",
                "content": f"Analyze all interactions between: {', '.join(medications)}",
            },
        ],
        response_format=InteractionReport,
        temperature=0,
    )
    message = response.choices[0].message
    if message.parsed is None:
        # `parsed` is None when the model refuses; surface that instead of returning None
        raise ValueError(f"No structured output returned: {message.refusal}")
    return message.parsed

# Fully typed return value, no JSON parsing needed
report = analyze_interactions(["warfarin 5mg daily", "clarithromycin 500mg BID", "aspirin 81mg daily"])

print(f"Overall risk: {report.overall_risk}")
print(f"Requires review: {report.requires_pharmacist_review}")
for interaction in report.interactions:
    print(f"\n{interaction.drug_a} + {interaction.drug_b}: {interaction.severity}")
    print(f"  Management: {interaction.management}")
```

## Method 3: outlines Library (Constrained Decoding)

For local models (LLaMA, Mistral), use outlines to constrain generation at the token level: at each step, tokens that would violate the JSON schema are masked out before sampling, so a completed generation always conforms to the schema:

```python
import outlines
from pydantic import BaseModel

# This uses the outlines 0.x entry points (outlines.models.transformers,
# outlines.generate.json). Newer releases changed them, so pin the version
# or check the current docs.

# Load local model (the Hugging Face repo id includes the "Meta-" prefix)
model = outlines.models.transformers("meta-llama/Meta-Llama-3-8B-Instruct")

class DrugFact(BaseModel):
    drug_name: str
    drug_class: str
    mechanism: str
    primary_indication: str
    monitoring_required: bool

# Constrained generation: only schema-conforming tokens can be sampled
generator = outlines.generate.json(model, DrugFact)

result = generator("Provide facts about warfarin:")
# result is a DrugFact instance, already validated against the schema

print(f"Drug: {result.drug_name}")
print(f"Monitoring required: {result.monitoring_required}")
```

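Advantage: no malformed output. At every position the sampler can only choose tokens that keep the output consistent with the schema, so a completed generation always parses and validates structurally. The constraint covers structure only: the values can still be factually wrong, and a generation cut off by a token limit can still be incomplete.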
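
The masking idea can be illustrated without any model at all. The sketch below is a toy, not how outlines is implemented: `TOY_VOCAB`, `ALLOWED_NEXT`, and `fake_logits` are invented for illustration, and the "grammar" is a hand-written table for a single-field object. It shows only the core step, which is removing disallowed tokens before one is picked:

```python
import json
import random

# Toy vocabulary: each "token" is a whole string fragment (hypothetical).
TOY_VOCAB = ["{", "}", '"monitoring_required"', ":", "true", "false", "Sure!", "maybe"]

# Hand-written grammar for {"monitoring_required": <bool>}:
# maps the number of tokens emitted so far to the tokens allowed next.
ALLOWED_NEXT = {
    0: {"{"},
    1: {'"monitoring_required"'},
    2: {":"},
    3: {"true", "false"},
    4: {"}"},
}

def fake_logits(rng: random.Random) -> dict:
    """Stand-in for a language model: a random score for every token."""
    return {token: rng.uniform(-1.0, 1.0) for token in TOY_VOCAB}

def constrained_generate(seed: int) -> str:
    rng = random.Random(seed)
    output = []
    while len(output) in ALLOWED_NEXT:
        logits = fake_logits(rng)
        allowed = ALLOWED_NEXT[len(output)]
        # Mask step: drop every token the grammar forbids at this position,
        # then pick the highest-scoring survivor (greedy decoding).
        masked = {tok: score for tok, score in logits.items() if tok in allowed}
        output.append(max(masked, key=masked.get))
    return "".join(output)

# Whatever the random scores are, every result parses and has the expected key.
samples = [constrained_generate(seed) for seed in range(20)]
parsed = [json.loads(sample) for sample in samples]
```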

## Method 4: Instructor Library

Wraps a supported LLM client (OpenAI here) with Pydantic validation and automatic retry:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator
from typing import Literal

# Separate name so the plain OpenAI `client` from Method 1 is not shadowed
instructor_client = instructor.from_openai(OpenAI())

class MedicationExtraction(BaseModel):
    drug_name: str
    dose_mg: float
    frequency: Literal["once daily", "twice daily", "three times daily", "four times daily", "as needed", "other"]
    route: Literal["oral", "IV", "subcutaneous", "topical", "inhaled", "other"]
    indication: str

    # Pydantic v2 style, matching the model_fields / model_json_schema calls used elsewhere
    @field_validator("dose_mg")
    @classmethod
    def dose_must_be_positive(cls, v: float) -> float:
        if v <= 0:
            raise ValueError("Dose must be positive")
        return v

class PatientMedications(BaseModel):
    medications: list[MedicationExtraction]
    total_count: int
    high_risk_medications: list[str]

# Instructor automatically retries on validation failure
patient_meds = instructor_client.chat.completions.create(
    model="gpt-4o",
    response_model=PatientMedications,
    messages=[
        {
            "role": "user",
            "content": """Extract medications from: Patient is on warfarin 5mg orally daily for AFib,
metformin 1000mg PO twice daily for T2DM, and aspirin 81mg oral daily for CAD.""",
        }
    ],
    max_retries=3,  # Auto-retry on validation failure
)

for med in patient_meds.medications:
    print(f"{med.drug_name}: {med.dose_mg}mg {med.frequency} ({med.route})")
print(f"High risk: {patient_meds.high_risk_medications}")
```

## Validation and Retry Logic

When not using constrained generation, add validation with retry:

```python
import json
from pydantic import BaseModel, ValidationError

def parse_with_retry(
    prompt: str,
    schema: type[BaseModel],
    max_retries: int = 3,
) -> BaseModel:
    """Extract structured data with validation and retry on failure."""
    last_error = None

    for attempt in range(max_retries):
        if attempt == 0:
            user_content = prompt
        else:
            user_content = f"""{prompt}

Previous attempt failed with error: {last_error}
Please fix the error and return valid JSON matching the schema exactly.
Schema fields: {list(schema.model_fields.keys())}"""

        # Reuses the plain OpenAI `client` from Method 1
        response = client.chat.completions.create(
            model="gpt-4o",
            response_format={"type": "json_object"},
            messages=[
                {
                    "role": "system",
                    # json.dumps gives the model real JSON rather than a Python dict repr
                    "content": f"Return JSON matching this schema: {json.dumps(schema.model_json_schema())}",
                },
                {"role": "user", "content": user_content},
            ],
            temperature=0 if attempt > 0 else 0.1,  # Lower temp on retry
        )

        raw = response.choices[0].message.content or ""

        try:
            # Parses and validates in one step; malformed JSON is also
            # reported as a ValidationError
            return schema.model_validate_json(raw)
        except ValidationError as e:
            last_error = str(e)
            print(f"Attempt {attempt + 1} failed: {e}")

    raise ValueError(f"Failed to extract valid data after {max_retries} attempts. Last error: {last_error}")
```
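
One failure that retrying with the same settings is unlikely to fix is truncation: if generation stops at the token limit, the JSON is cut off mid-object. A small guard can run before parsing. This is a sketch under assumptions: `check_choice` and `IncompleteOutputError` are hypothetical names, `SimpleNamespace` objects stand in for the SDK's response objects so the example runs offline, and it relies on the Chat Completions fields `finish_reason` and `message.refusal`:

```python
from types import SimpleNamespace

class IncompleteOutputError(RuntimeError):
    """Raised when a completion cannot contain a full JSON document."""

def check_choice(choice) -> str:
    """Return the message content, or raise if it was truncated or refused."""
    if choice.finish_reason == "length":
        raise IncompleteOutputError("Output hit the token limit; JSON is likely cut off")
    if getattr(choice.message, "refusal", None):
        raise IncompleteOutputError(f"Model refused: {choice.message.refusal}")
    return choice.message.content

# Stand-ins for response.choices[0] (hypothetical values, no API call is made).
complete = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(content='{"drug": "warfarin"}', refusal=None),
)
truncated = SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(content='{"drug": "warf', refusal=None),
)

content = check_choice(complete)
try:
    check_choice(truncated)
    truncation_detected = False
except IncompleteOutputError:
    truncation_detected = True
```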

## Schema Design for LLMs

The schema design affects extraction quality as much as the prompt:

```python
from typing import Literal

from pydantic import BaseModel, Field

# BAD: too many optional fields with vague names
class BadSchema(BaseModel):
    thing1: str | None = None
    thing2: str | None = None
    details: dict | None = None

# GOOD: explicit fields with descriptions and literals
class GoodSchema(BaseModel):
    drug_name: str = Field(description="The generic name of the drug (not brand name)")
    severity: Literal["major", "moderate", "minor"] = Field(
        description="major=life-threatening, moderate=significant, minor=minimal"
    )
    requires_immediate_action: bool = Field(
        description="True if patient needs dose change or monitoring within 24 hours"
    )
    action: str = Field(description="Specific action to take in one imperative sentence")
```

Schema design principles:

- Use `Literal` for fields with fixed valid values: it prevents the model from inventing values
- Write `Field(description=...)` for every field: this becomes part of the prompt
- Prefer flat schemas over nested: the model handles flat schemas more reliably
- Use `bool` instead of `str` for yes/no fields: it prevents "yes"/"true"/"1" ambiguity
- Include at least one required field: models handle all-optional schemas poorly
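
To see what these choices look like from the model's side, one can inspect the JSON Schema that Pydantic generates. The sketch below assumes Pydantic v2 (`model_json_schema`) and uses a hypothetical `SeverityNote` model defined only for this illustration:

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field

class SeverityNote(BaseModel):  # hypothetical model, for illustration only
    severity: Literal["major", "moderate", "minor"] = Field(
        description="major=life-threatening, moderate=significant, minor=minimal"
    )
    requires_immediate_action: bool = Field(
        description="True if the patient needs a dose change or monitoring within 24 hours"
    )
    comment: Optional[str] = None  # has a default, so it is not listed as required

schema = SeverityNote.model_json_schema()
props = schema["properties"]

# Literal becomes an enum, so the allowed values are spelled out for the model.
print(props["severity"]["enum"])
# Field descriptions travel with the schema and act as per-field instructions.
print(props["severity"]["description"])
# bool maps to the JSON "boolean" type rather than a free-form string.
print(props["requires_immediate_action"]["type"])
# Only fields without defaults are listed as required.
print(schema["required"])
```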

## Parsing Fallback Strategy

```python
import json
import re

from pydantic import BaseModel, ValidationError

def robust_json_extract(raw_output: str, schema: type[BaseModel]) -> BaseModel | None:
    """Try multiple extraction strategies before giving up."""

    # Strategy 1: Direct JSON parse
    # (model_validate also rejects valid JSON that is not an object, e.g. a list)
    try:
        return schema.model_validate(json.loads(raw_output))
    except (json.JSONDecodeError, ValidationError):
        pass

    # Strategy 2: Strip markdown code fences
    clean = raw_output.strip()
    for fence in ["```json", "```JSON", "```"]:
        if clean.startswith(fence):
            clean = clean[len(fence):]
    if clean.endswith("```"):
        clean = clean[:-3]
    try:
        return schema.model_validate(json.loads(clean.strip()))
    except (json.JSONDecodeError, ValidationError):
        pass

    # Strategy 3: Find JSON object anywhere in the output
    # (this pattern handles at most one level of nested braces)
    json_match = re.search(r'\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}', raw_output, re.DOTALL)
    if json_match:
        try:
            return schema.model_validate(json.loads(json_match.group()))
        except (json.JSONDecodeError, ValidationError):
            pass

    return None  # All strategies failed
```
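
The regex in Strategy 3 matches at most one level of nested braces and can be confused by braces inside string values. A possible alternative, sketched here with the standard library's `json.JSONDecoder.raw_decode`, is to attempt a decode at each `{` and accept the first position that parses. `first_json_object` is a hypothetical helper name, and the sample strings are made up:

```python
import json
from typing import Any, Optional

def first_json_object(text: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(text, start)
            if isinstance(value, dict):
                return value
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return None

chatty = (
    'Sure! {"drug": "warfarin", "inr": {"current": 3.8, '
    '"target": {"low": 2, "high": 3}}} Hope that helps.'
)
result = first_json_object(chatty)
no_json = first_json_object("No structured data here.")
```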

Structured output is one of the most important reliability techniques for production LLM systems. Always validate, always have a fallback, and prefer constrained generation (outlines, OpenAI structured output) over parsing when available.
