Validation and Retry Loops
How to validate LLM outputs against schemas and business rules, and how to build retry loops that correct the model when it fails, covering truncation, schema errors, and factual checks.
Why Retry Loops Are Necessary
LLMs occasionally produce invalid output even with good prompts:
Common failures:
JSON syntax error (missing comma, trailing comma, unclosed bracket)
Schema mismatch (wrong field name, wrong type, missing required field)
Truncation (model ran out of max_tokens mid-JSON)
Rule violation (added information not in the source document)
Format deviation (added explanation text around the JSON)
Business rule violation (urgency is 'critical', not in enum)
A retry loop detects these failures and re-prompts the model with specific error information.
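For the format-deviation case in particular, where the model wraps the JSON in explanatory text, it is often cheaper to recover the object locally than to spend a retry. A minimal sketch, assuming the payload is a single JSON object; `extract_first_json_object` is a hypothetical helper name, and it relies on the standard library's `json.JSONDecoder.raw_decode`, which parses one value and ignores whatever follows it:

```python
import json
from typing import Optional


def extract_first_json_object(text: str) -> Optional[dict]:
    """Return the first parseable JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and stops there,
            # so trailing commentary after the object is ignored.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not open a valid object; try the next one.
            start = text.find("{", start + 1)
    return None
```

If this returns None, nothing was recoverable and the response can go through the retry loop as usual.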
Basic Retry Pattern
import json
import re
from typing import Callable, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)


def call_with_retry(
    call_fn: Callable[[list], str],
    schema: type[T],
    initial_messages: list,
    max_retries: int = 2,
) -> T:
    """Call the model, re-prompting with the specific error until the output validates."""
    messages = list(initial_messages)  # copy so the caller's list is not mutated
    last_error = None
    for attempt in range(max_retries + 1):
        raw_text = call_fn(messages)
        # Step 1: parse JSON
        try:
            raw_dict = json.loads(strip_markdown(raw_text))
        except json.JSONDecodeError as e:
            last_error = f"Invalid JSON: {e}"
            messages += [
                {"role": "assistant", "content": raw_text},
                {"role": "user", "content": f"That was not valid JSON. Error: {e}. "
                                            f"Return ONLY a valid JSON object matching the schema. Nothing else."},
            ]
            continue
        # Step 2: validate against the schema
        try:
            return schema(**raw_dict)
        except (ValidationError, TypeError) as e:
            # TypeError covers valid JSON that is not an object (e.g. a list),
            # which cannot be unpacked with **.
            last_error = str(e)
            messages += [
                {"role": "assistant", "content": raw_text},
                {"role": "user", "content": f"Schema validation failed:\n{e}\n"
                                            f"Fix the errors and return a corrected JSON object."},
            ]
    raise ValueError(f"Failed after {max_retries} retries. Last error: {last_error}")


def strip_markdown(text: str) -> str:
    """Remove Markdown code-fence markers (``` or ```json) around the payload."""
    text = re.sub(r"```(?:json)?\n?", "", text)
    text = re.sub(r"\n?```", "", text)
    return text.strip()
Truncation Detection and Recovery
If the model runs out of tokens mid-JSON, the response is incomplete:
def is_truncated_json(text: str) -> bool:
    """Heuristic: the parse fails because the text ends early, not because of an error mid-document."""
    try:
        json.loads(text)
        return False  # parses successfully, so not truncated
    except json.JSONDecodeError as e:
        # "Expecting ..." alone matches most syntax errors, so also require that
        # the parser reached the end of the input before failing.
        return "Unterminated" in str(e) or e.pos >= len(text.rstrip())


def handle_truncation(messages: list, partial_json: str, call_fn: Callable) -> str:
    """Continue generating from where the model left off."""
    messages += [
        {"role": "assistant", "content": partial_json},  # continue from partial
        {"role": "user", "content": "Your response was cut off. Continue from where you stopped."},
    ]
    continuation = call_fn(messages)
    # The model may restart or add a preamble, so parse and validate the
    # joined text before trusting it.
    return partial_json + continuation
The better solution: ensure max_tokens is large enough for the expected output, or use streaming with a token budget monitor.
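When the API reports why generation stopped, that field is a more direct truncation signal than inspecting the parse error. Anthropic's Messages API, for example, sets `stop_reason` to `"max_tokens"` when the limit is hit. A minimal sketch under that assumption; `hit_token_limit` and `next_max_tokens` are hypothetical helpers, the 8192 ceiling is an arbitrary placeholder, and `SimpleNamespace` stands in for a real response object:

```python
from types import SimpleNamespace


def hit_token_limit(response) -> bool:
    """True if the response reports that generation stopped at the token limit."""
    return getattr(response, "stop_reason", None) == "max_tokens"


def next_max_tokens(current: int, ceiling: int = 8192) -> int:
    """Double the output budget for the next attempt, capped at an assumed ceiling."""
    return min(current * 2, ceiling)


# Stand-in for an API response that was cut off.
truncated = SimpleNamespace(stop_reason="max_tokens")
budget = 1024
if hit_token_limit(truncated):
    budget = next_max_tokens(budget)  # retry the same request with this budget
```

Re-issuing the original request with a larger budget avoids stitching two partial outputs together, at the cost of paying for the full generation again.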
Business Rule Validation
Schema validation ensures correct types; business rules encode domain logic:
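The rule checks in this section reference a `ClinicalSummary` model that is not shown. A minimal stand-in, assuming only the fields those checks touch (`medications`, `monitoring`, `urgency`, `urgent_actions`); the field types and the `Medication` class are guesses, and it uses dataclasses so it runs on the standard library alone, where the real model would presumably be a pydantic `BaseModel` declaring the same fields:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Medication:
    name: str


@dataclass
class ClinicalSummary:
    medications: list[Medication] = field(default_factory=list)
    monitoring: Optional[str] = None   # free-text monitoring plan, e.g. "INR weekly"
    urgency: str = "low"               # assumed values: "low", "medium", "high"
    urgent_actions: list[str] = field(default_factory=list)


# A summary that would trip the warfarin rule: no monitoring documented.
example = ClinicalSummary(medications=[Medication(name="Warfarin 5 mg")])
```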
def validate_clinical_rules(summary: ClinicalSummary) -> list[str]:
    """Returns a list of rule violations (empty if valid)."""
    violations = []
    # Rule: Warfarin requires INR monitoring
    warfarin_present = any("warfarin" in m.name.lower() for m in summary.medications)
    inr_monitored = summary.monitoring is not None and "inr" in summary.monitoring.lower()
    if warfarin_present and not inr_monitored:
        violations.append("Warfarin prescribed without INR monitoring documented.")
    # Rule: High urgency requires at least one urgent action
    if summary.urgency == "high" and not summary.urgent_actions:
        violations.append("Urgency is 'high' but no urgent actions listed.")
    return violations


def retry_with_violations(messages: list, violations: list[str], call_fn, schema) -> ClinicalSummary:
    """Re-prompt once with the violations listed.

    Expects messages to already end with the assistant response being corrected.
    """
    violation_text = "\n".join(f"- {v}" for v in violations)
    messages += [
        {"role": "user", "content": f"The previous response had clinical rule violations:\n{violation_text}\n\n"
                                    "Please correct these and return the updated JSON."},
    ]
    # One more attempt. The result is schema-checked here but not re-checked
    # against the business rules, so run validate_clinical_rules on it again.
    raw = call_fn(messages)
    return schema(**json.loads(strip_markdown(raw)))
Exponential Backoff for Rate Limits
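The core of the pattern is the delay schedule: the wait doubles after each rate-limit error so that retries spread out instead of arriving in a burst. A sketch of that schedule on its own; `backoff_delay` is a hypothetical helper, and the cap and jitter parameters are common refinements rather than part of the function in this section:

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, jitter: float = 0.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The delay is base * 2**attempt, limited to `cap`. If `jitter` is set,
    up to that fraction of the delay is added at random so that many
    clients do not all retry at the same instant.
    """
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        delay += random.uniform(0, jitter * delay)
    return delay


# Without jitter the schedule is deterministic.
schedule = [backoff_delay(a) for a in range(4)]
```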
import time


def call_with_backoff(client, messages, max_retries=3):
    """Call the API, sleeping with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=messages,
            )
        except Exception as e:
            # Only rate-limit errors are retried; the final attempt re-raises.
            if "rate_limit" in str(e).lower() and attempt < max_retries - 1:
                wait = 2 ** attempt  # 1s, then 2s, doubling on each retry
                time.sleep(wait)
            else:
                raise
When NOT to Retry
Retry:
JSON parse error
Schema validation error
Truncation
Missing required fields
Do NOT retry:
Content policy violation (the model correctly refused — don't push it)
Factual uncertainty (if the model says "I don't know", that may be correct)
Timeout / infra failure (retry at the infrastructure level, not prompt level)
Budget exceeded (infinite retry loops are expensive; cap at 2-3)
Interview Answer
"Validation and retry loops catch the failure modes that prompts alone can't prevent: JSON syntax errors, schema type mismatches, truncation, and business rule violations. The pattern: call the model, try to parse and validate the output, and if it fails, append the specific error to the conversation and try again — giving the model the exact error message so it can self-correct. Cap retries at 2-3 to control cost. Separate JSON/schema validation (structural) from business rule validation (domain logic — e.g., Warfarin requires INR monitoring). Never retry content policy refusals — that's the model working as intended."