Validation and Retry Loops

How to validate LLM outputs against schemas and business rules, and how to build retry loops that correct the model when it fails. Covers truncation, schema errors, and factual checks.

Asma Hafeez Khan | May 16, 2026 | 4 min read
Tags: Prompt Engineering, Validation, Retry, Production AI, Interview

Why Retry Loops Are Necessary

LLMs occasionally produce invalid output even with good prompts:

Common failures:
  JSON syntax error (missing comma, trailing comma, unclosed bracket)
  Schema mismatch (wrong field name, wrong type, missing required field)
  Truncation (model ran out of max_tokens mid-JSON)
  Rule violation (added information not in the source document)
  Format deviation (added explanation text around the JSON)
  Business rule violation (urgency is 'critical', not in enum)

A retry loop detects these failures and re-prompts the model with specific error information.
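
As a rough illustration of how these failures surface in code, the sketch below runs a few hypothetical model outputs through `json.loads`. The sample strings and the `parse_error` helper are made up for this example. Note that the misspelled field name parses cleanly, which is why a schema check is needed on top of JSON parsing.

```python
import json
from typing import Optional

def parse_error(raw: str) -> Optional[str]:
    """Return the JSON parse error message for `raw`, or None if it parses."""
    try:
        json.loads(raw)
        return None
    except json.JSONDecodeError as e:
        return e.msg

# Hypothetical model outputs, one per failure mode.
samples = {
    "valid": '{"urgency": "high"}',
    "trailing comma": '{"urgency": "high",}',
    "truncated": '{"urgency": "high", "actions": ["call',
    "wrapped in prose": 'Here is the JSON: {"urgency": "high"}',
    "wrong field name": '{"urgancy": "high"}',  # valid JSON, invalid schema
}

for label, raw in samples.items():
    print(f"{label}: {parse_error(raw) or 'parses'}")
```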


Basic Retry Pattern

Python
import json
import re
from typing import Callable, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)

def strip_markdown(text: str) -> str:
    """Remove Markdown code fences that models often wrap around JSON."""
    text = re.sub(r"```(?:json)?\n?", "", text)
    text = re.sub(r"\n?```", "", text)
    return text.strip()

def call_with_retry(
    call_fn: Callable[[list], str],
    schema: type[T],
    initial_messages: list,
    max_retries: int = 2,
) -> T:
    """Call the model, validate its output, and re-prompt with the error on failure."""
    messages = list(initial_messages)  # copy so the caller's list is not mutated
    last_error = None

    for attempt in range(max_retries + 1):  # one initial call plus max_retries retries
        raw_text = call_fn(messages)

        # Step 1: parse JSON and require a top-level object
        try:
            raw_dict = json.loads(strip_markdown(raw_text))
            if not isinstance(raw_dict, dict):
                raise ValueError("top-level JSON value must be an object")
        except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
            last_error = f"Invalid JSON: {e}"
            messages += [
                {"role": "assistant", "content": raw_text},
                {"role": "user", "content": f"That was not a valid JSON object. Error: {e}. "
                 f"Return ONLY a valid JSON object matching the schema. Nothing else."}
            ]
            continue

        # Step 2: validate against the schema
        try:
            return schema(**raw_dict)
        except ValidationError as e:
            last_error = str(e)
            messages += [
                {"role": "assistant", "content": raw_text},
                {"role": "user", "content": f"Schema validation failed:\n{e}\n"
                 f"Fix the errors and return a corrected JSON object."}
            ]

    raise ValueError(f"Failed after {max_retries} retries. Last error: {last_error}")
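
`strip_markdown` removes code fences but not explanation text around the JSON (the "format deviation" failure listed earlier). One possible way to recover from that case without spending a retry is to scan for the first decodable object with the standard library's `json.JSONDecoder.raw_decode`. This is a minimal sketch; `extract_json_object` is a hypothetical helper, not part of the pattern above.

```python
import json
from typing import Optional

def extract_json_object(text: str) -> Optional[dict]:
    """Return the first decodable JSON object in `text`, or None if none is found."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not open a valid object; try the next one.
            start = text.find("{", start + 1)
    return None

reply = 'Sure! Here is the result:\n{"urgency": "high", "urgent_actions": ["call"]}\nLet me know.'
print(extract_json_object(reply))
```

One caveat: if the outer object is malformed but contains a valid nested object, this returns the nested one, so the result still needs schema validation.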

Truncation Detection and Recovery

If the model runs out of tokens mid-JSON, the response is incomplete:

Python
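def is_truncated_json(text: str) -> bool:
    """Heuristic: the text fails to parse and the parser gave up at the end of the input."""
    try:
        json.loads(text)
        return False  # parses successfully, so not truncated
    except json.JSONDecodeError as e:
        # An error in the middle of the text (e.g. a missing comma) is a syntax
        # error, not truncation, so only flag errors at the end of the input
        # or a string that was never closed.
        return "Unterminated string" in e.msg or e.pos >= len(text.rstrip())

def handle_truncation(messages: list, partial_json: str, call_fn: Callable) -> str:
    """Ask the model to continue from where it left off and join the two parts."""
    continuation_messages = messages + [  # new list: the caller's history is not mutated
        {"role": "assistant", "content": partial_json},  # continue from partial
        {"role": "user", "content": "Your response was cut off. Continue from exactly where you stopped. "
         "Output only the remaining text, without repeating anything."}
    ]
    continuation = call_fn(continuation_messages)
    # The joined text is not guaranteed to be valid JSON; parse and validate it again.
    return partial_json + continuation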

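The better solution: ensure max_tokens is large enough for the expected output, or use streaming with a token budget monitor. If the API reports why generation stopped (the Anthropic Messages API, for example, sets stop_reason to "max_tokens"), check that field rather than relying on a parse heuristic.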
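
A token budget monitor could look roughly like the sketch below. It assumes the stream arrives as an iterable of text chunks and estimates tokens at about four characters each, which is a crude assumption; a real tokenizer or the usage counts returned by the API would be more accurate. `collect_with_budget` and its parameters are hypothetical names.

```python
from typing import Iterable, Tuple

def collect_with_budget(
    chunks: Iterable[str],
    max_tokens: int,
    warn_ratio: float = 0.9,
    chars_per_token: float = 4.0,  # rough assumption, not a real tokenizer
) -> Tuple[str, bool]:
    """Join streamed chunks and report whether the estimated token count
    reached warn_ratio * max_tokens (a hint that the output may be truncated)."""
    parts = []
    total_chars = 0
    for chunk in chunks:
        parts.append(chunk)
        total_chars += len(chunk)
    estimated_tokens = total_chars / chars_per_token
    near_limit = estimated_tokens >= warn_ratio * max_tokens
    return "".join(parts), near_limit

text, near_limit = collect_with_budget(['{"a": ', '"bcdefgh"'], max_tokens=4)
print(text, near_limit)
```

When `near_limit` is true, the caller can raise max_tokens for the next request instead of trying to repair a cut-off response.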


Business Rule Validation

Schema validation ensures correct types; business rules encode domain logic:

Python
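# ClinicalSummary is assumed to be a Pydantic model defined elsewhere, with
# medications (each having a name), monitoring, urgency, and urgent_actions fields.

def validate_clinical_rules(summary: ClinicalSummary) -> list[str]:
    """Returns a list of rule violations (empty if valid)."""
    violations = []

    # Rule: Warfarin requires INR monitoring
    warfarin_present = any("warfarin" in m.name.lower() for m in summary.medications)
    inr_monitored = summary.monitoring is not None and "inr" in summary.monitoring.lower()
    if warfarin_present and not inr_monitored:
        violations.append("Warfarin prescribed without INR monitoring documented.")

    # Rule: High urgency requires at least one urgent action
    if summary.urgency == "high" and not summary.urgent_actions:
        violations.append("Urgency is 'high' but no urgent actions listed.")

    return violations

def retry_with_violations(messages: list, violations: list[str], call_fn, schema) -> ClinicalSummary:
    """Make one corrective attempt.

    `messages` must already end with the assistant turn that produced the
    violations; otherwise "the previous response" refers to nothing the model can see.
    """
    violation_text = "\n".join(f"- {v}" for v in violations)
    retry_messages = messages + [  # new list: the caller's history is not mutated
        {"role": "user", "content": f"The previous response had clinical rule violations:\n{violation_text}\n\n"
         "Please correct these and return the updated JSON."}
    ]
    # One more attempt. Raises json.JSONDecodeError or ValidationError if the retry
    # is malformed; re-run validate_clinical_rules on the result before trusting it.
    raw = call_fn(retry_messages)
    return schema(**json.loads(strip_markdown(raw)))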
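
To show how rule checking and re-prompting can sit in one bounded loop, here is a self-contained sketch that uses plain dicts instead of a Pydantic model so it runs with the standard library alone. `correct_until_valid`, `urgency_rule`, and the canned replies are hypothetical stand-ins; structural validation is assumed to have happened already.

```python
import json
from typing import Callable, List

def correct_until_valid(
    call_fn: Callable[[list], str],
    rule_fn: Callable[[dict], List[str]],
    initial_messages: list,
    max_retries: int = 2,
) -> dict:
    """Call the model, check business rules, and re-prompt with the
    violations until none remain or the retries run out."""
    messages = list(initial_messages)
    violations: List[str] = []
    for _ in range(max_retries + 1):
        raw = call_fn(messages)
        data = json.loads(raw)  # assumes the reply is already structurally valid
        violations = rule_fn(data)
        if not violations:
            return data
        bullets = "\n".join(f"- {v}" for v in violations)
        messages += [
            {"role": "assistant", "content": raw},
            {"role": "user", "content": f"The previous response had rule violations:\n{bullets}\n"
             "Correct these and return the updated JSON."},
        ]
    raise ValueError(f"Rule violations remain after {max_retries} retries: {violations}")

def urgency_rule(data: dict) -> List[str]:
    if data.get("urgency") == "high" and not data.get("urgent_actions"):
        return ["Urgency is 'high' but no urgent actions listed."]
    return []

# Stand-in for a model: the first reply violates the rule, the second fixes it.
replies = iter([
    '{"urgency": "high", "urgent_actions": []}',
    '{"urgency": "high", "urgent_actions": ["Repeat INR today"]}',
])
result = correct_until_valid(
    lambda msgs: next(replies),
    urgency_rule,
    [{"role": "user", "content": "Summarize the note."}],
)
print(result["urgent_actions"])
```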

Exponential Backoff for Rate Limits

Python
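import time

import anthropic  # assumes `client` is an anthropic.Anthropic instance

def call_with_backoff(client, messages, max_retries=3):
    """Call the Messages API, doubling the wait after each rate-limited attempt."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=messages
            )
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the rate-limit error
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
        # Any other exception propagates immediately; it is not a rate limit.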
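
A common variation adds random jitter so that many clients hitting the same limit do not all retry at the same moment. The sketch below computes the waits only and leaves the sleeping to the caller; `backoff_delays`, `base`, and `cap` are hypothetical names, and the "uniform between zero and the exponential ceiling" scheme is one of several reasonable choices.

```python
import random
from typing import List, Optional

def backoff_delays(
    max_retries: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one randomized wait in seconds per retry, each drawn uniformly
    from [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    return [
        rng.uniform(0, min(cap, base * 2 ** attempt))
        for attempt in range(max_retries)
    ]

# Seeded here only so the example is repeatable.
delays = backoff_delays(5, rng=random.Random(0))
print([round(d, 2) for d in delays])
```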

When NOT to Retry

Retry:
  JSON parse error
  Schema validation error
  Truncation
  Missing required fields

Do NOT retry:
  Content policy violation (the model correctly refused — don't push it)
  Factual uncertainty (if the model says "I don't know", that may be correct)
  Timeout / infra failure (retry at the infrastructure level, not prompt level)
  Budget exceeded (infinite retry loops are expensive — cap at 2-3)
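
The two lists above can be encoded as a small decision function so the retry policy lives in one place. This is a sketch under assumptions: the `Failure` categories and the `should_retry` signature are hypothetical, and how a failure gets classified is left to the caller.

```python
from enum import Enum

class Failure(Enum):
    JSON_PARSE = "json_parse"
    SCHEMA = "schema"          # includes missing required fields
    TRUNCATION = "truncation"
    REFUSAL = "refusal"        # content policy: do not push the model
    UNCERTAIN = "uncertain"    # "I don't know" may be the correct answer
    INFRA = "infra"            # handled by infrastructure-level retries

RETRYABLE = {Failure.JSON_PARSE, Failure.SCHEMA, Failure.TRUNCATION}

def should_retry(failure: Failure, retries_so_far: int, max_retries: int = 2) -> bool:
    """Re-prompt only for structural failures, and only while under the cap."""
    return failure in RETRYABLE and retries_so_far < max_retries

print(should_retry(Failure.SCHEMA, retries_so_far=0))
print(should_retry(Failure.REFUSAL, retries_so_far=0))
```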

Interview Answer

"Validation and retry loops catch the failure modes that prompts alone can't prevent: JSON syntax errors, schema type mismatches, truncation, and business rule violations. The pattern: call the model, try to parse and validate the output, and if it fails, append the specific error to the conversation and try again — giving the model the exact error message so it can self-correct. Cap retries at 2-3 to control cost. Separate JSON/schema validation (structural) from business rule validation (domain logic — e.g., Warfarin requires INR monitoring). Never retry content policy refusals — that's the model working as intended."
