Schema Definition in Prompts

Why Schema Clarity Matters

The more precisely the model understands the expected structure, the more reliably it produces it:

Vague: "Return structured data about the medication."
  Model might return:
  - Plain text
  - A table
  - JSON with different field names each time
  - JSON with different field types (string vs array)

Precise: [specific schema below]
  Model reliably returns: {medication: {name: ..., dose: ..., unit: ...}}

Every ambiguity in the schema specification is a failure mode.

TypeScript-Style Schema (Recommended by Anthropic)

TypeScript-style type definitions are familiar to LLMs from code training data and are very readable:

"Respond with a JSON object matching this TypeScript type:

type MedicationReview = {
  medications: Array<{
    name: string;           // drug name, title case
    dose_mg: number | null; // numeric dose in mg, null if unclear
    frequency: 'once daily' | 'twice daily' | 'three times daily' | 'as needed' | 'other';
    interactions: string[]; // list of flagged interactions, empty array if none
    monitoring_required: string | null; // e.g. 'INR weekly', null if none
  }>;
  overall_risk: 'low' | 'medium' | 'high';
  summary: string;  // 1-2 sentence summary for the physician
}

Do not include any text outside the JSON object."

TypeScript type syntax tends to work well because type definitions of this kind are common in the code models are trained on, and the inline comments put each field's rules right next to the field they describe.
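A prompt like the one above still benefits from a check on the reply before it is used. The following is a minimal sketch, not a definitive implementation: `check_medication_review` is a hypothetical helper name, it uses only the standard library, and it treats a missing key the same as null, which is a simplifying assumption.

```python
import json
from typing import Literal, get_args

# Allowed values copied from the MedicationReview type in the prompt.
Frequency = Literal["once daily", "twice daily", "three times daily", "as needed", "other"]
Risk = Literal["low", "medium", "high"]


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_medication_review(raw: str) -> list[str]:
    """Return a list of problems in a model reply; an empty list means it matched."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    problems: list[str] = []
    if data.get("overall_risk") not in get_args(Risk):
        problems.append("overall_risk: not an allowed value")
    if not isinstance(data.get("summary"), str):
        problems.append("summary: must be a string")

    meds = data.get("medications")
    if not isinstance(meds, list):
        return problems + ["medications: must be an array"]
    for i, med in enumerate(meds):
        where = f"medications[{i}]"
        if not isinstance(med, dict):
            problems.append(f"{where}: must be an object")
            continue
        if not isinstance(med.get("name"), str):
            problems.append(f"{where}.name: must be a string")
        dose = med.get("dose_mg")
        if dose is not None and not _is_number(dose):
            problems.append(f"{where}.dose_mg: must be a number or null")
        if med.get("frequency") not in get_args(Frequency):
            problems.append(f"{where}.frequency: not an allowed value")
        interactions = med.get("interactions")
        if not (isinstance(interactions, list)
                and all(isinstance(x, str) for x in interactions)):
            problems.append(f"{where}.interactions: must be an array of strings")
        monitoring = med.get("monitoring_required")
        if monitoring is not None and not isinstance(monitoring, str):
            problems.append(f"{where}.monitoring_required: must be a string or null")
    return problems
```

Keeping the allowed values in `Literal` types means the checker and any type annotations elsewhere in the codebase read from the same definition.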

JSON Schema Format

For systems that dynamically generate prompts from schema objects:

Python

import json
from typing import Any

def schema_to_prompt(schema: dict[str, Any]) -> str:
    """Wrap a JSON Schema in natural-language instructions for the model."""
    return f"""Respond with valid JSON conforming to this JSON Schema:

{json.dumps(schema, indent=2)}

Rules:
- All required fields must be present.
- Field types must exactly match the schema.
- Enum fields must be one of the listed values only.
- Do not include additional fields not in the schema.
- Do not include markdown or explanatory text."""

CLINICAL_SCHEMA = {
    "type": "object",
    "required": ["primary_diagnosis", "medications", "urgency"],
    "properties": {
        "primary_diagnosis": {"type": "string", "description": "Primary ICD-10 diagnosis"},
        "medications": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["name", "dose"],
                "properties": {
                    "name": {"type": "string"},
                    "dose": {"type": "string"},
                    "frequency": {"type": "string"}
                }
            }
        },
        "urgency": {"type": "string", "enum": ["low", "medium", "high"]}
    }
}

prompt = schema_to_prompt(CLINICAL_SCHEMA)
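The same schema object that generates the prompt can also be used to check the reply. The sketch below is a deliberately small, standard-library-only validator that understands just the keywords used in the schema above (`type`, `required`, `properties`, `items`, `enum`). The function name `validate` and the choice to reject unknown fields are assumptions made for illustration. A complete validator, such as the third-party `jsonschema` package, would be the more robust choice in production.

```python
from typing import Any

# Maps JSON Schema type names to Python types (only the subset this sketch supports).
_TYPES = {"object": dict, "array": list, "string": str, "boolean": bool}


def validate(instance: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return a list of error strings; an empty list means the instance passed."""
    expected = schema.get("type")
    if expected == "number":
        # bool is a subclass of int, so exclude it explicitly.
        ok = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    elif expected in _TYPES:
        ok = isinstance(instance, _TYPES[expected])
    else:
        ok = True  # no type constraint, or a type this sketch does not handle
    if not ok:
        return [f"{path}: expected {expected}"]

    errors: list[str] = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} is not one of {schema['enum']}")

    if expected == "object":
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}.{key}: required field is missing")
        for key, value in instance.items():
            if key in props:
                errors.extend(validate(value, props[key], f"{path}.{key}"))
            else:
                # Mirrors the prompt rule "no additional fields".
                # JSON Schema itself allows extra fields by default.
                errors.append(f"{path}.{key}: field not in schema")

    if expected == "array" and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors
```

Calling `validate(json.loads(reply), CLINICAL_SCHEMA)` on a parsed reply would then return path-qualified messages that can be logged or fed back to the model in a retry prompt.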

Enums and Constrained Fields

For fields with a fixed set of valid values, enumerate them explicitly:

"urgency" must be EXACTLY one of:
  - "low"     (routine follow-up, no immediate action)
  - "medium"  (action required within 24 hours)
  - "high"    (action required immediately)

Do not use any other values. Do not add qualifiers like "medium-high".

If urgency is unclear from the note, return "medium" as default.

Explicit enum values eliminate the model inventing its own values ("urgent", "moderate", "critical").
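One way to keep the prompt text and the downstream check from drifting apart is to generate both from a single mapping. The following sketch assumes hypothetical helper names (`enum_instruction`, `is_allowed`) and reuses the urgency levels from the example above.

```python
# Single source of truth: allowed value -> meaning shown to the model.
URGENCY_LEVELS = {
    "low": "routine follow-up, no immediate action",
    "medium": "action required within 24 hours",
    "high": "action required immediately",
}


def enum_instruction(field: str, levels: dict[str, str], default: str) -> str:
    """Render the enum rules for one field as prompt text."""
    if default not in levels:
        raise ValueError(f"default {default!r} is not an allowed value")
    width = max(len(value) for value in levels) + 2  # +2 for the quotes
    lines = [f'"{field}" must be EXACTLY one of:']
    for value, meaning in levels.items():
        quoted = f'"{value}"'
        lines.append(f"  - {quoted:<{width}}  ({meaning})")
    lines.append("")
    lines.append("Do not use any other values.")
    lines.append(f'If {field} is unclear from the note, return "{default}" as default.')
    return "\n".join(lines)


def is_allowed(value: object, levels: dict[str, str]) -> bool:
    """Exact, case-sensitive membership check for a returned enum value."""
    return isinstance(value, str) and value in levels
```

The check is intentionally strict: a value such as "Low" or "medium-high" is rejected rather than silently normalized, so prompt regressions show up as failures.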

Nullable Fields

Tell the model explicitly what to use for missing data:

"For missing or unavailable information:
  - Use null (not 'N/A', not empty string, not 0)
  - Exception: arrays should be [] when empty, not null

dose_mg: number | null              // null if dose is not numeric or not stated
monitoring_required: string | null  // null if no monitoring is required
interactions: string[]              // [] if no interactions found (not null)"

Inconsistent null handling is one of the most common schema failure modes, so specify it explicitly.
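The null rules can also be checked mechanically after parsing. The sketch below walks a parsed reply and reports placeholder strings used in place of null, and null used in place of an empty array. The `PLACEHOLDERS` set and the function name `find_null_violations` are assumptions for illustration. The "not 0" rule is not checked here, since 0 can be a legitimate value for some fields.

```python
from typing import Any

# Strings a model might emit instead of null. Hypothetical list; extend as needed.
PLACEHOLDERS = {"", "n/a", "na", "none", "null"}


def find_null_violations(value: Any, array_fields: set[str], path: str = "$") -> list[str]:
    """Report placeholder strings and null arrays anywhere in a parsed reply."""
    problems: list[str] = []
    if isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
        problems.append(f"{path}: placeholder string {value!r}, expected null")
    elif isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if child is None and key in array_fields:
                # Fields named in array_fields must be [] when empty, never null.
                problems.append(f"{child_path}: null, expected []")
            else:
                problems.extend(find_null_violations(child, array_fields, child_path))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            problems.extend(find_null_violations(child, array_fields, f"{path}[{i}]"))
    return problems
```

For example, passing `{"interactions"}` as `array_fields` flags a reply where `interactions` is null while still accepting `dose_mg: null`.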

Nested Schemas

For nested structures, use indentation and comments:

"Respond with this JSON structure:

{
  'patient': {
    'age': number,         // integer age in years
    'sex': 'M' | 'F' | 'unknown'
  },
  'assessment': {
    'diagnoses': [         // array of all active diagnoses
      {
        'icd10': string,   // ICD-10 code, e.g. 'I48.0'
        'name': string,    // full name of diagnosis
        'is_primary': boolean
      }
    ],
    'risk_score': number   // composite risk score 1-10
  }
}
"
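On the receiving side, a nested reply is easier to work with once it is parsed into typed objects that mirror the structure. This is a minimal sketch using standard-library dataclasses. The class names and `parse_report` are hypothetical, and dataclasses do not check value types at runtime, so this catches missing or unexpected keys but not, say, a string in `age`.

```python
import json
from dataclasses import dataclass


@dataclass
class Patient:
    age: int
    sex: str  # 'M', 'F', or 'unknown'


@dataclass
class Diagnosis:
    icd10: str
    name: str
    is_primary: bool


@dataclass
class Assessment:
    diagnoses: list[Diagnosis]
    risk_score: float


@dataclass
class Report:
    patient: Patient
    assessment: Assessment


def parse_report(raw: str) -> Report:
    """Parse a model reply into nested dataclasses.

    Raises json.JSONDecodeError on invalid JSON, KeyError on a missing
    top-level section, and TypeError on missing or unexpected nested keys.
    """
    data = json.loads(raw)
    assessment = data["assessment"]
    return Report(
        patient=Patient(**data["patient"]),
        assessment=Assessment(
            diagnoses=[Diagnosis(**d) for d in assessment["diagnoses"]],
            risk_score=assessment["risk_score"],
        ),
    )
```

Because each level has its own class, an error points at the part of the structure that deviated instead of surfacing later as an attribute lookup failure.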

Schema in Tool Definitions

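When using function calling (OpenAI) or tool use (Anthropic), the schema is supplied in the tool definition and handled at the API level, although how strictly it is enforced depends on the provider and on options such as strict mode: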