Validating Output and Retrying on Failure — Prompt Engineering Mastery | Learnixo

Why Retry Loops Are Necessary

LLMs occasionally produce invalid output even with good prompts:

Common failures:
  JSON syntax error (missing comma, trailing comma, unclosed bracket)
  Schema mismatch (wrong field name, wrong type, missing required field, value outside an enum such as urgency 'critical')
  Truncation (model ran out of max_tokens mid-JSON)
  Rule violation (added information not in the source document)
  Format deviation (added explanation text around the JSON)
  Business rule violation (well-typed output that breaks domain logic, e.g. urgency 'high' with no urgent actions listed)

A retry loop detects these failures and re-prompts the model with specific error information.

Basic Retry Pattern

Python

from typing import TypeVar, Callable
import json
import re
from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)

def call_with_retry(
    call_fn: Callable[[list], str],
    schema: type[T],
    initial_messages: list,
    max_retries: int = 2,
) -> T:
    """Call the model, validate the output, and re-prompt with the specific error on failure.

    Makes at most 1 + max_retries calls to call_fn.
    """
    messages = list(initial_messages)  # copy so the caller's list is not mutated
    last_error = None

    for attempt in range(max_retries + 1):
        raw_text = call_fn(messages)

        # Step 1: parse JSON
        try:
            raw_dict = json.loads(strip_markdown(raw_text))
        except json.JSONDecodeError as e:
            last_error = f"Invalid JSON: {e}"
            # Show the model its own bad output plus the exact parser error
            messages += [
                {"role": "assistant", "content": raw_text},
                {"role": "user", "content": f"That was not valid JSON. Error: {e}. "
                 "Return ONLY a valid JSON object matching the schema. Nothing else."}
            ]
            continue

        # Step 2: validate against the schema (Pydantic v2 API; unlike schema(**raw_dict),
        # this also rejects JSON that is a list or scalar instead of an object)
        try:
            return schema.model_validate(raw_dict)
        except ValidationError as e:
            last_error = str(e)
            messages += [
                {"role": "assistant", "content": raw_text},
                {"role": "user", "content": f"Schema validation failed:\n{e}\n"
                 "Fix the errors and return a corrected JSON object."}
            ]

    raise ValueError(f"Failed after {max_retries + 1} attempts ({max_retries} retries). "
                     f"Last error: {last_error}")

def strip_markdown(text: str) -> str:
    """Remove Markdown code fences that models often wrap around JSON."""
    text = re.sub(r"```(?:json)?\n?", "", text)
    text = re.sub(r"\n?```", "", text)
    return text.strip()
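
strip_markdown only removes code fences; it does not help with the "format deviation" failure, where the model wraps the JSON in explanatory prose. One tolerant option, sketched here under the assumption that the response contains one top-level JSON object, is to try each "{" in turn with the standard library's json.JSONDecoder.raw_decode, which parses one value and ignores whatever follows it. extract_json_object is a hypothetical helper name, not part of any library.

```python
import json


def extract_json_object(text: str) -> dict:
    """Return the first JSON object embedded in text, ignoring surrounding prose.

    Hypothetical helper: raw_decode parses one JSON value starting at the given
    index and reports where it ended, so trailing prose is simply ignored.
    """
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # this "{" does not begin a valid object; try the next one
        return obj
    raise ValueError("No JSON object found in the model response")


reply = 'Sure! Here is the summary:\n{"urgency": "high", "actions": ["call GP"]}\nLet me know if you need more.'
print(extract_json_object(reply))  # {'urgency': 'high', 'actions': ['call GP']}
```

If it replaces json.loads(strip_markdown(raw_text)) in the parsing step of call_with_retry, catch ValueError there instead of json.JSONDecodeError (JSONDecodeError is a subclass of ValueError), so a reply with no object still triggers a corrective retry.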

Truncation Detection and Recovery

If the model runs out of tokens mid-JSON, the response is incomplete:

Python

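import json
from typing import Callable

def is_truncated_json(text: str) -> bool:
    """Heuristic: the text fails to parse because it ends early, not because of a bad character.

    If you have the API response, its stop reason is more reliable
    (the Anthropic Messages API reports stop_reason == "max_tokens").
    """
    text = text.rstrip()
    try:
        json.loads(text)
        return False  # parses successfully, so not truncated
    except json.JSONDecodeError as e:
        # Truncated if the parser ran off the end of the input or a string was never closed.
        # Errors in the middle (e.g. a trailing comma) are syntax errors, not truncation;
        # matching on "Expecting" alone would misclassify them.
        return e.pos >= len(text) or e.msg.startswith("Unterminated string")

def handle_truncation(messages: list, partial_json: str, call_fn: Callable[[list], str]) -> str:
    """Continue generating from where the model left off.

    Ends the conversation on the partial assistant turn (an assistant "prefill") so the
    model continues that text instead of starting a new answer. This assumes the
    provider and model support prefill; the Anthropic API rejects a final assistant
    turn that ends in whitespace, hence rstrip().
    """
    partial = partial_json.rstrip()
    continuation = call_fn(list(messages) + [{"role": "assistant", "content": partial}])
    return partial + continuation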

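The better fix is prevention: set max_tokens comfortably above the largest expected output, or stream the response and monitor the token budget as it is consumed. Stitching a continuation onto partial JSON works, but it breaks if the model restarts or repeats text.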
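
One way to apply that advice without guessing a single budget up front is to treat stop_reason == "max_tokens" as the truncation signal and regenerate the whole answer with a larger budget, rather than continuing partial JSON. The sketch assumes create is a callable taking the same keyword arguments as the Anthropic SDK's client.messages.create (for example functools.partial(client.messages.create, model=...)); the function name and the budget ladder are illustrative choices.

```python
from typing import Any, Callable, Sequence


def create_with_token_escalation(
    create: Callable[..., Any],
    messages: list,
    budgets: Sequence[int] = (1024, 2048, 4096),  # illustrative budget ladder
) -> Any:
    """Regenerate with a larger max_tokens whenever the response was cut off.

    Hypothetical helper. Relies on the response's stop_reason, which the
    Anthropic Messages API sets to "max_tokens" when generation hit the limit.
    """
    for max_tokens in budgets:
        response = create(messages=messages, max_tokens=max_tokens)
        if response.stop_reason != "max_tokens":
            return response  # finished for another reason, e.g. "end_turn"
    raise RuntimeError(f"Output still truncated at max_tokens={budgets[-1]}")


# Usage with the SDK (not executed here):
#   create = functools.partial(client.messages.create, model="claude-sonnet-4-6")
#   response = create_with_token_escalation(create, messages)
```

Regenerating costs more tokens than continuing, but every returned answer is a single complete generation, so it can go straight to the normal JSON and schema checks.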

Business Rule Validation

Schema validation checks structure and types; business-rule validation checks domain logic that a schema cannot express:

Python

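import json
from typing import Callable, Literal, Optional
from pydantic import BaseModel

# Hypothetical minimal models; the course's real ClinicalSummary likely has more fields
class Medication(BaseModel):
    name: str

class ClinicalSummary(BaseModel):
    medications: list[Medication]
    monitoring: Optional[str] = None
    urgency: Literal["low", "medium", "high"]
    urgent_actions: list[str] = []

def validate_clinical_rules(summary: ClinicalSummary) -> list[str]:
    """Returns a list of rule violations (empty if valid)."""
    violations = []

    # Rule: Warfarin requires INR monitoring
    warfarin_present = any("warfarin" in m.name.lower() for m in summary.medications)
    inr_monitored = summary.monitoring is not None and "inr" in summary.monitoring.lower()
    if warfarin_present and not inr_monitored:
        violations.append("Warfarin prescribed without INR monitoring documented.")

    # Rule: High urgency requires at least one urgent action
    if summary.urgency == "high" and not summary.urgent_actions:
        violations.append("Urgency is 'high' but no urgent actions listed.")

    return violations

def retry_with_violations(messages: list, previous_response: str, violations: list[str],
                          call_fn: Callable[[list], str], schema=ClinicalSummary) -> ClinicalSummary:
    violation_text = "\n".join(f"- {v}" for v in violations)
    retry_messages = list(messages) + [
        # Show the model its previous answer so it corrects it rather than starting over
        {"role": "assistant", "content": previous_response},
        {"role": "user", "content": f"The previous response had clinical rule violations:\n{violation_text}\n\n"
         "Please correct these and return the updated JSON."}
    ]
    raw = call_fn(retry_messages)  # one more attempt
    corrected = schema.model_validate(json.loads(strip_markdown(raw)))  # strip_markdown: defined above
    remaining = validate_clinical_rules(corrected)  # re-check the rules before trusting the result
    if remaining:
        raise ValueError(f"Rule violations remain after retry: {remaining}")
    return corrected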
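
The two layers can be combined into one check that reports which stage failed, so the retry prompt carries the right kind of feedback. A sketch, assuming Pydantic v2 (model_validate_json) and representing each business rule as a function that returns a violation message or None; validate_two_stage, Triage and high_urgency_needs_action are invented for illustration and are not the course's ClinicalSummary code.

```python
from typing import Callable, Literal, Optional

from pydantic import BaseModel, ValidationError

# A rule inspects a validated model and returns a violation message, or None if it passes
Rule = Callable[[BaseModel], Optional[str]]


def validate_two_stage(
    raw_text: str, schema: type[BaseModel], rules: list[Rule]
) -> tuple[Optional[BaseModel], Optional[str]]:
    """Return (model, None) on success, or (None, feedback) to send back to the model."""
    # Stage 1: structure. model_validate_json also reports invalid JSON as a ValidationError.
    try:
        model = schema.model_validate_json(raw_text)
    except ValidationError as e:
        return None, f"Schema validation failed:\n{e}\nReturn a corrected JSON object."

    # Stage 2: domain logic, only meaningful once the structure is known to be valid
    violations = [msg for rule in rules if (msg := rule(model)) is not None]
    if violations:
        bullets = "\n".join(f"- {v}" for v in violations)
        return None, f"The response broke these rules:\n{bullets}\nCorrect them and return the updated JSON."
    return model, None


# Hypothetical toy model and rule for illustration
class Triage(BaseModel):
    urgency: Literal["low", "medium", "high"]
    urgent_actions: list[str] = []


def high_urgency_needs_action(t: Triage) -> Optional[str]:
    if t.urgency == "high" and not t.urgent_actions:
        return "Urgency is 'high' but no urgent actions listed."
    return None
```

Because the feedback string is ready to append as a user turn after the model's previous answer, one retry loop can drive both structural and business-rule corrections under the same retry cap.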

Exponential Backoff for Rate Limits

Python

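import random
import time

import anthropic

def call_with_backoff(client: anthropic.Anthropic, messages: list, max_retries: int = 3):
    """Retry rate-limited requests with exponential backoff.

    This is an infrastructure-level retry: the same request is resent unchanged.
    (The official SDK client also retries rate limits itself; see Anthropic(max_retries=...).)
    """
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=messages
            )
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the error
            # Waits 1s, 2s, 4s, ... between attempts (only 1s and 2s when max_retries=3),
            # plus random jitter so parallel workers do not retry in lockstep
            time.sleep(2 ** attempt + random.uniform(0, 0.5))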
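
The same policy works for any client if it is pulled out of the API call. A provider-agnostic sketch: retry_with_backoff and its parameters are illustrative, the delay uses "full jitter" (a random wait between 0 and the capped exponential delay), and sleep is injectable so the policy can be tested without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

R = TypeVar("R")


def retry_with_backoff(
    fn: Callable[[], R],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Call fn(); on a retryable error, wait with capped exponential backoff and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            if not is_retryable(e) or attempt == max_attempts - 1:
                raise  # non-retryable, or out of attempts
            # Full jitter: wait a random time up to base_delay * 2**attempt, capped at max_delay
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))


# With the Anthropic SDK this might look like (not executed here):
#   retry_with_backoff(lambda: client.messages.create(...),
#                      lambda e: isinstance(e, anthropic.RateLimitError))
```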

When NOT to Retry

Retry:
  JSON parse error
  Schema validation error
  Truncation
  Missing required fields

Do NOT retry:
  Content policy violation (the model correctly refused — don't push it)
  Factual uncertainty (if the model says "I don't know", that may be correct)
  Timeout / infra failure (retry at the infrastructure level, not prompt level)
  Retry budget exhausted (unbounded retry loops get expensive; cap at 2-3 retries)
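
The split above can be encoded as a small policy table that the retry loop consults before re-prompting. A sketch: the Failure categories mirror the two lists, and how a given response gets classified into one of them (for example, detecting a refusal) is assumed to happen elsewhere and is not shown.

```python
from enum import Enum, auto


class Failure(Enum):
    JSON_PARSE = auto()
    SCHEMA = auto()
    TRUNCATION = auto()
    MISSING_FIELDS = auto()
    POLICY_REFUSAL = auto()    # the model declined on content-policy grounds
    UNCERTAIN_ANSWER = auto()  # "I don't know" may be the correct answer
    INFRA = auto()             # timeouts, rate limits: retried by the transport layer instead


# Failures that a corrective re-prompt can plausibly fix
REPROMPTABLE = {Failure.JSON_PARSE, Failure.SCHEMA, Failure.TRUNCATION, Failure.MISSING_FIELDS}


def should_reprompt(failure: Failure, retries_used: int, max_retries: int = 2) -> bool:
    """Re-prompt only for output-format failures, and only while retry budget remains."""
    if retries_used >= max_retries:
        return False  # budget exhausted: stop instead of looping
    return failure in REPROMPTABLE
```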

Interview Answer

"Validation and retry loops catch the failure modes that prompts alone can't prevent: JSON syntax errors, schema type mismatches, truncation, and business rule violations. The pattern: call the model, try to parse and validate the output, and if it fails, append the specific error to the conversation and try again — giving the model the exact error message so it can self-correct. Cap retries at 2-3 to control cost. Separate JSON/schema validation (structural) from business rule validation (domain logic — e.g., Warfarin requires INR monitoring). Never retry content policy refusals — that's the model working as intended."