Embedding the Output Schema in the Prompt — Prompt Engineering Mastery | Learnixo

Why Schema Clarity Matters

The more precisely the model understands the expected structure, the more reliably it produces it:

Vague: "Return structured data about the medication."
  Model might return:
  - Plain text
  - A table
  - JSON with different field names each time
  - JSON with different field types (string vs array)

Precise: [specific schema below]
  Model reliably returns: {medication: {name: ..., dose: ..., unit: ...}}

Every ambiguity in the schema specification is a failure mode.

TypeScript-Style Schema (Recommended by Anthropic)

TypeScript-style type definitions are familiar to LLMs from code training data and are very readable:

"Respond with a JSON object matching this TypeScript type:

type MedicationReview = {
  medications: Array<{
    name: string;           // drug name, title case
    dose_mg: number;        // numeric dose in mg, null if unclear
    frequency: 'once daily' | 'twice daily' | 'three times daily' | 'as needed' | 'other';
    interactions: string[]; // list of flagged interactions, empty array if none
    monitoring_required: string | null; // e.g. 'INR weekly', null if none
  }>;
  overall_risk: 'low' | 'medium' | 'high';
  summary: string;  // 1-2 sentence summary for the physician
}

Do not include any text outside the JSON object."

TypeScript type syntax is particularly effective because the model has seen millions of TypeScript type definitions during training.

JSON Schema Format

For systems that dynamically generate prompts from schema objects:

Python

import json
from typing import Any

def schema_to_prompt(schema: dict) -> str:
    """Convert JSON Schema to a natural language + schema description."""
    return f"""Respond with valid JSON conforming to this JSON Schema:

{json.dumps(schema, indent=2)}

Rules:
- All required fields must be present.
- Field types must exactly match the schema.
- Enum fields must be one of the listed values only.
- Do not include additional fields not in the schema.
- Do not include markdown or explanatory text."""

CLINICAL_SCHEMA = {
    "type": "object",
    "required": ["primary_diagnosis", "medications", "urgency"],
    "properties": {
        "primary_diagnosis": {"type": "string", "description": "Primary ICD-10 diagnosis"},
        "medications": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["name", "dose"],
                "properties": {
                    "name": {"type": "string"},
                    "dose": {"type": "string"},
                    "frequency": {"type": "string"}
                }
            }
        },
        "urgency": {"type": "string", "enum": ["low", "medium", "high"]}
    }
}

prompt = schema_to_prompt(CLINICAL_SCHEMA)

Enums and Constrained Fields

For fields with a fixed set of valid values, enumerate them explicitly:

"urgency" must be EXACTLY one of:
  - "low"     (routine follow-up, no immediate action)
  - "medium"  (action required within 24 hours)
  - "high"    (action required immediately)

Do not use any other values. Do not add qualifiers like "medium-high".

If urgency is unclear from the note, return "medium" as default.

Explicit enum values eliminate the model inventing its own values ("urgent", "moderate", "critical").

Nullable Fields

Tell the model explicitly what to use for missing data:

"For missing or unavailable information:
  - Use null (not 'N/A', not empty string, not 0)
  - Exception: arrays should be [] when empty, not null

dose_mg: number | null    — null if dose is not numeric or not stated
monitoring: string | null — null if no monitoring is required
interactions: string[]    — [] if no interactions found (not null)

Inconsistent null handling is one of the most common schema failure modes — specify it explicitly.

Nested Schemas

For nested structures, use indentation and comments:

"Respond with this JSON structure:

{
  'patient': {
    'age': number,         // integer age in years
    'sex': 'M' | 'F' | 'unknown'
  },
  'assessment': {
    'diagnoses': [         // array of all active diagnoses
      {
        'icd10': string,   // ICD-10 code, e.g. 'I48.0'
        'name': string,    // full name of diagnosis
        'is_primary': boolean
      }
    ],
    'risk_score': number   // composite risk score 1-10
  }
}
"

Schema in Tool Definitions

When using function calling (OpenAI) or tool use (Anthropic), the schema is enforced at the API level:

Python

from anthropic import Anthropic

tools = [
    {
        "name": "extract_clinical_data",
        "description": "Extract structured clinical data from a medical note.",
        "input_schema": {
            "type": "object",
            "required": ["diagnosis", "medications"],
            "properties": {
                "diagnosis": {"type": "string"},
                "medications": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "dose": {"type": "string"}
                        }
                    }
                }
            }
        }
    }
]

client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    tools=tools,
    messages=[{"role": "user", "content": "Extract from: Patient on Warfarin 5mg for AF."}]
)

Interview Answer

"Schema definition in prompts uses TypeScript-type syntax (familiar to LLMs from code training data), JSON Schema (machine-generated, useful for dynamic prompts), or concrete filled-in examples. Key patterns: enumerate all valid enum values explicitly and specify what to do on mismatch; define the null/missing convention (null vs empty string vs empty array varies by field type); explain nested structures with comments. For production systems, API-level schema enforcement (OpenAI Structured Outputs, Anthropic tool use) is more reliable than prompt-only approaches — the schema is baked into the sampling process, not just described to the model."