Handling Tool Errors Gracefully

Why Error Handling Matters in Agent Loops

In a standard API call, an exception propagates up the stack and the caller decides what to do. In an agent loop, an unhandled exception in a tool function breaks the entire conversation. The LLM never sees what went wrong, cannot adapt, and the user gets a crash rather than an explanation.

The correct pattern: tools never raise exceptions to the caller. They catch all failures internally and return a structured error dict that the LLM can read and respond to intelligently.

The Core Pattern: Try/Except in Every Tool

Python

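import logging

logger = logging.getLogger(__name__)

# get_db_connection() is assumed to be provided by your application's
# database layer; it is not defined in this snippet.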

def get_patient_record(patient_id: str) -> dict:
    """
    Always returns a dict — never raises.
    On success: {"success": True, "data": {...}}
    On failure: {"success": False, "error": "...", "hint": "..."}
    """
    try:
        conn = get_db_connection()
        record = conn.execute(
            "SELECT * FROM patients WHERE patient_id = %s",
            (patient_id,)
        ).fetchone()

        if record is None:
            return {
                "success": False,
                "error": "Patient not found",
                "patient_id": patient_id,
                "hint": "Verify the patient ID format (P-NNNNN) and try again."
            }

        return {
            "success": True,
            "data": dict(record)
        }

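    except ConnectionError as e:
        # Many DB drivers raise their own exception class for connection
        # failures (psycopg2, for example, raises psycopg2.OperationalError).
        # Catch that class here as well if your driver uses one.
        logger.error("DB connection failed in get_patient_record: %s", e)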
        return {
            "success": False,
            "error": "Database temporarily unavailable",
            "patient_id": patient_id,
            "hint": "Try again in a few seconds. If the problem persists, contact IT."
        }

    except Exception as e:
        logger.exception("Unexpected error in get_patient_record for %s", patient_id)
        return {
            "success": False,
            "error": "Unexpected error",
            "detail": str(e),
            "patient_id": patient_id
        }

The LLM reads the error dict, understands what went wrong, and responds accordingly — e.g., "I wasn't able to find patient P-99999. Could you double-check the ID?"
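
Writing the same try/except in every tool is easy to forget. One way to enforce the contract is a small decorator that converts any uncaught exception into the error dict shown above. This is a minimal sketch rather than a replacement for specific handlers: the names never_raises and divide are hypothetical, and a tool should still catch the failures it knows about so it can return a useful hint.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def never_raises(tool_fn):
    """Wrap a tool so any uncaught exception becomes a structured error dict."""
    @functools.wraps(tool_fn)
    def wrapper(*args, **kwargs):
        try:
            return tool_fn(*args, **kwargs)
        except Exception as exc:
            # Log the full traceback for operators; give the LLM a short summary.
            logger.exception("Unhandled error in tool %s", tool_fn.__name__)
            return {
                "success": False,
                "error": "Unexpected error",
                "detail": str(exc),
                "tool": tool_fn.__name__,
            }
    return wrapper


@never_raises
def divide(a: float, b: float) -> dict:
    # Hypothetical toy tool, used only to demonstrate the wrapper.
    return {"success": True, "data": a / b}
```

Because the call happens inside the wrapper's try block, the decorator also covers the case where the tool is invoked with arguments it does not accept.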

Error Response Format

Use a consistent error format across all tools so the LLM can recognize and interpret errors reliably:

Python

from typing import Optional

def make_error(
    error: str,
    hint: Optional[str] = None,
    retry_suggested: bool = False,
    **extra
) -> dict:
    """Standard error response builder."""
    result = {
        "success": False,
        "error": error,
    }
    if hint:
        result["hint"] = hint
    if retry_suggested:
        result["retry_suggested"] = True
    result.update(extra)
    return result

# Usage (each return statement belongs inside a tool function)
return make_error(
    "Drug not found in formulary",
    hint="Try the generic name instead of the brand name.",
    drug_name=drug_name
)

return make_error(
    "External API timeout",
    hint="The service is slow. Retry once.",
    retry_suggested=True
)

When retry_suggested is True, an LLM whose system prompt tells it to honor the flag will usually retry the tool call. This is prompted behavior, not a guarantee, so do not rely on it for correctness.
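
Because the retry decision is left to the model, it can help to cap retries in code as well. The sketch below is one possible guard, not part of the original design: RetryBudget is a hypothetical helper that the agent loop would call on each tool result, and it removes the retry_suggested flag once the same call has already been flagged the allowed number of times.

```python
import json
from collections import Counter


class RetryBudget:
    """Caps how often the same tool call may be flagged as retryable."""

    def __init__(self, max_retries_per_call: int = 1):
        self.max_retries_per_call = max_retries_per_call
        self._seen = Counter()

    def apply(self, fn_name: str, fn_args: dict, result: dict) -> dict:
        """Return the result, with the retry flag removed once the budget is spent."""
        if not result.get("retry_suggested"):
            return result
        # Identify a call by tool name plus its arguments in a stable order.
        key = (fn_name, json.dumps(fn_args, sort_keys=True, default=str))
        self._seen[key] += 1
        if self._seen[key] <= self.max_retries_per_call:
            return result
        # Budget spent: copy the result without the flag and adjust the hint.
        exhausted = {k: v for k, v in result.items() if k != "retry_suggested"}
        exhausted["hint"] = "Retry budget exhausted. Report the failure to the user."
        return exhausted
```

With max_retries_per_call=1, the first failure keeps the flag so the model may retry once, and a second identical failure comes back without it.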

Retry Logic Inside the Tool

Some failures are transient and should be retried immediately — network timeouts, temporary database unavailability, rate limit responses from external APIs.

Python

import time
import httpx
import logging

# make_error is the standard error builder from the previous section.
logger = logging.getLogger(__name__)

def fetch_drug_from_external_api(
    drug_name: str,
    max_retries: int = 3,
    base_delay: float = 0.5
) -> dict:
    """
    Fetches drug information from an external API.
    Retries on timeout and 5xx errors with exponential backoff.
    """
    url = "https://api.rxnorm.nlm.nih.gov/REST/drugs.json"

    for attempt in range(1, max_retries + 1):
        try:
            with httpx.Client(timeout=5.0) as client:
                response = client.get(url, params={"name": drug_name})

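            if response.status_code == 200:
                try:
                    data = response.json()
                except ValueError:
                    # A 200 with a non-JSON body would otherwise escape as an exception
                    return make_error(
                        "API returned an unreadable response",
                        drug_name=drug_name
                    )
                return {"success": True, "data": data, "attempts": attempt}

            if response.status_code == 429:
                # Rate limited: back off one step further than for other errors
                wait = base_delay * (2 ** attempt)
                logger.warning("Rate limited on attempt %d. Waiting %.1fs", attempt, wait)
                if attempt < max_retries:
                    # No point sleeping after the final attempt
                    time.sleep(wait)
                continue

            if response.status_code >= 500:
                # Server error: retry
                wait = base_delay * (2 ** (attempt - 1))
                logger.warning("Server error %d on attempt %d. Waiting %.1fs",
                               response.status_code, attempt, wait)
                if attempt < max_retries:
                    time.sleep(wait)
                continue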

            # Client error (400-range) — don't retry
            return make_error(
                f"API returned {response.status_code}",
                hint="Check the drug name spelling.",
                drug_name=drug_name
            )

        except httpx.TimeoutException:
            wait = base_delay * (2 ** (attempt - 1))
            logger.warning("Timeout on attempt %d/%d. Waiting %.1fs", attempt, max_retries, wait)
            if attempt < max_retries:
                time.sleep(wait)
            continue

        except httpx.RequestError as e:
            logger.error("Request error: %s", e)
            return make_error("Network error", detail=str(e), drug_name=drug_name)

    # All retries exhausted
    logger.error("All %d attempts failed for drug: %s", max_retries, drug_name)
    return make_error(
        "External service unavailable after multiple retries",
        hint="Try again later or check the service status.",
        drug_name=drug_name,
        attempts_made=max_retries
    )
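
The delay calculation above is repeated in three branches and always produces the same schedule for every client. A common refinement is to compute the delay in one place, cap it, and add random jitter so that many callers retrying at once do not hit the service in lockstep. The following is a sketch under those assumptions; backoff_delay and its max_delay and jitter parameters are illustrative names, not part of the tool above.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    jitter: float = 0.25,
    rng: Optional[random.Random] = None,
) -> float:
    """Delay before retrying after attempt number `attempt` (1-based)."""
    source = rng or random
    # Exponential growth (base, 2x base, 4x base, ...) capped at max_delay.
    delay = min(base_delay * (2 ** (attempt - 1)), max_delay)
    # Shift the delay by up to +/- `jitter`, expressed as a fraction of the delay.
    spread = delay * jitter
    return max(0.0, delay + source.uniform(-spread, spread))
```

Passing a seeded random.Random as rng makes the schedule reproducible in tests, and jitter=0.0 gives the plain exponential schedule.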

Max Retries + Fallback Behavior

Some tools have a fallback — if the primary source fails, try a secondary one.

Python

def get_drug_info_with_fallback(drug_name: str) -> dict:
    """
    Try primary database first, fall back to external API,
    fall back to cached data if both fail.
    """
    # Attempt 1: Internal database (fastest, most reliable)
    primary = query_internal_formulary(drug_name)
    if primary.get("success"):
        return {**primary, "source": "internal_formulary"}

    logger.warning("Internal formulary failed for %s: %s", drug_name, primary.get("error"))

    # Attempt 2: External API
    secondary = fetch_drug_from_external_api(drug_name)
    if secondary.get("success"):
        return {**secondary, "source": "external_api"}

    logger.warning("External API also failed for %s: %s", drug_name, secondary.get("error"))

    # Attempt 3: Cached data (may be stale)
    cache = get_from_cache(f"drug:{drug_name.lower()}")
    if cache:
        return {
            "success": True,
            "data": cache,
            "source": "cache",
            "warning": "Data may be up to 24 hours old"
        }

    # All sources exhausted
    return make_error(
        "Drug information unavailable from all sources",
        hint="Try the generic name or contact the pharmacy team directly.",
        drug_name=drug_name,
        sources_tried=["internal_formulary", "external_api", "cache"]
    )
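
If several tools need the same kind of chain, the pattern can be written once. The sketch below assumes every source is a function that takes the query and returns a dict in the standard success/error format; first_successful is a hypothetical helper, and it records why each source failed so the final error can report it.

```python
from typing import Callable, Dict, Sequence, Tuple

Source = Tuple[str, Callable[[str], dict]]


def first_successful(query: str, sources: Sequence[Source]) -> dict:
    """Try each (name, fetch) pair in order and return the first success."""
    failures: Dict[str, str] = {}
    for name, fetch in sources:
        try:
            result = fetch(query)
        except Exception as exc:
            # A misbehaving source must not break the rest of the chain.
            result = {"success": False, "error": str(exc)}
        if result.get("success"):
            return {**result, "source": name}
        failures[name] = result.get("error", "unknown error")
    return {
        "success": False,
        "error": "Information unavailable from all sources",
        "sources_tried": list(failures),
        "failures": failures,
    }
```

The order of the sources list is the fallback order, so the fastest and most reliable source should come first.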

Handling Errors in the Agent Loop

The agent loop must append error results just like success results — the LLM needs to see the error to respond appropriately.

Python

import json
import openai

client = openai.OpenAI()

def run_resilient_agent(user_message: str, tools: list, tool_map: dict) -> str:
    messages = [
        {
            "role": "system",
            "content": (
                "You are a clinical assistant. When a tool returns an error with "
                "'retry_suggested: true', retry the tool call once. "
                "When a tool returns an error, explain the situation to the user "
                "clearly and suggest next steps based on the 'hint' field."
            )
        },
        {"role": "user", "content": user_message}
    ]

    for iteration in range(8):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )

        msg = response.choices[0].message

        if not msg.tool_calls:
            return msg.content or ""

        messages.append(msg)

        for tc in msg.tool_calls:
            fn_name = tc.function.name

            try:
                fn_args = json.loads(tc.function.arguments)
            except json.JSONDecodeError as e:
                # The LLM returned malformed JSON arguments (uncommon, but it happens)
                result = {
                    "success": False,
                    "error": "Malformed tool arguments",
                    "detail": str(e),
                    "raw_arguments": tc.function.arguments
                }
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": json.dumps(result)
                })
                continue

            if fn_name not in tool_map:
                result = {
                    "success": False,
                    "error": f"Tool '{fn_name}' not available",
                    "available_tools": list(tool_map.keys())
                }
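            else:
                # Tools are written to return error dicts, but the call itself can
                # still fail (e.g. TypeError when the LLM passes unexpected
                # arguments), so guard it here as well.
                try:
                    result = tool_map[fn_name](**fn_args)
                except Exception as e:
                    result = {
                        "success": False,
                        "error": f"Tool '{fn_name}' could not be called",
                        "detail": str(e)
                    }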

            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": json.dumps(result, default=str)
            })

    return "Unable to complete request — too many iterations."
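
The per-call handling inside the loop can also be factored into a standalone function, which makes every error path testable without calling the model. This is a sketch of that refactoring, assuming the same tool_map convention as above; execute_tool_call is a hypothetical name, and it returns the JSON string that goes into the tool message's content field.

```python
import json


def execute_tool_call(fn_name: str, raw_arguments: str, tool_map: dict) -> str:
    """Run one tool call and return the JSON content for the tool message."""
    try:
        fn_args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return json.dumps({
            "success": False,
            "error": "Malformed tool arguments",
            "detail": str(exc),
        })

    if not isinstance(fn_args, dict):
        # Valid JSON, but not an object that can be unpacked as keyword arguments.
        return json.dumps({
            "success": False,
            "error": "Tool arguments must be a JSON object",
        })

    if fn_name not in tool_map:
        result = {
            "success": False,
            "error": f"Tool '{fn_name}' not available",
            "available_tools": list(tool_map.keys()),
        }
    else:
        try:
            result = tool_map[fn_name](**fn_args)
        except Exception as exc:
            # Covers wrong or missing arguments as well as a tool that raises.
            result = {
                "success": False,
                "error": f"Tool '{fn_name}' could not be called",
                "detail": str(exc),
            }

    return json.dumps(result, default=str)
```

Inside the loop, each tool call then reduces to appending a tool message whose content is execute_tool_call(tc.function.name, tc.function.arguments, tool_map).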

Testing Error Paths

Error handling only works if you test it. Use mocking to simulate failures:

Python

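import httpx
import pytest
from unittest.mock import patch, MagicMock

# Module paths match the patch targets used in the tests below.
from tools.patient import get_patient_record
from tools.drug import fetch_drug_from_external_api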

def test_get_patient_record_db_failure():
    """Tool should return error dict, not raise, when DB is down."""
    with patch("tools.patient.get_db_connection") as mock_conn:
        mock_conn.side_effect = ConnectionError("DB connection refused")

        result = get_patient_record("P-00123")

        assert result["success"] is False
        assert "Database temporarily unavailable" in result["error"]
        assert "hint" in result

def test_get_patient_record_not_found():
    """Tool should return structured not-found error."""
    with patch("tools.patient.get_db_connection") as mock_conn:
        mock_cursor = MagicMock()
        mock_cursor.fetchone.return_value = None
        mock_conn.return_value.execute.return_value = mock_cursor

        result = get_patient_record("P-99999")

        assert result["success"] is False
        assert "not found" in result["error"].lower()
        assert result["patient_id"] == "P-99999"

def test_fetch_drug_retries_on_timeout():
    """Tool should retry up to max_retries times on timeout."""
    call_count = 0

    with patch("tools.drug.httpx.Client") as mock_client_cls:
        def side_effect(*args, **kwargs):
            nonlocal call_count
            call_count += 1
            if call_count < 3:
                raise httpx.TimeoutException("Timeout")
            mock_response = MagicMock()
            mock_response.status_code = 200
            mock_response.json.return_value = {"drugGroup": {"conceptGroup": []}}
            return mock_response

        mock_client = MagicMock()
        mock_client.__enter__ = MagicMock(return_value=mock_client)
        mock_client.__exit__ = MagicMock(return_value=False)
        mock_client.get.side_effect = side_effect
        mock_client_cls.return_value = mock_client

        result = fetch_drug_from_external_api("Metformin", max_retries=3, base_delay=0)
        assert result["success"] is True
        assert result["attempts"] == 3
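
Beyond per-tool tests, a shared assertion helper can check that every error result follows the same format. The helper below is a sketch; assert_error_contract is a hypothetical name, and the fields it checks are the ones used in this article (success, error, and the optional hint and retry_suggested).

```python
import json


def assert_error_contract(result: dict) -> None:
    """Check that a tool result follows the error format used in this article."""
    assert isinstance(result, dict), "tools must return a dict"
    assert result.get("success") is False, "error results set success to False"
    error = result.get("error")
    assert isinstance(error, str) and error, "error must be a non-empty string"
    if "hint" in result:
        assert isinstance(result["hint"], str), "hint must be a string"
    if "retry_suggested" in result:
        assert isinstance(result["retry_suggested"], bool), "retry flag must be a bool"
    # The agent loop serializes results this way, so this must not raise.
    json.dumps(result, default=str)
```

Calling it at the end of each failure-path test keeps the format consistent as new tools are added.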

Common Error Categories and How to Handle Each

| Error Type | Strategy | LLM Hint |
|---|---|---|
| Not found | Return structured not-found error immediately | "Verify the ID/name and try again" |
| Validation failure | Return field-level errors before any I/O | "Correct these fields: ..." |
| Network timeout | Retry with backoff, then return error | "Service slow, try again later" |
| Rate limit (429) | Exponential backoff retry | "Try again in a few seconds" |
| Server error (5xx) | Retry up to 3 times | "Service error, retrying" |
| Client error (4xx) | Return immediately, no retry | "Check input parameters" |
| DB connection | Return error with IT contact hint | "Contact IT support" |
| Permission denied | Return clear access error | "You don't have access to this data" |
| Data quality | Return with warning, include partial data | "Data may be incomplete" |
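
The validation row is the only category without an example earlier in the article. A minimal sketch follows, assuming that the P-NNNNN format mentioned in the get_patient_record hint means "P-" followed by exactly five digits; validate_patient_id and the field_errors key are hypothetical names.

```python
import re
from typing import Optional

# Assumption: P-NNNNN means the letter P, a hyphen, then exactly five digits.
PATIENT_ID_PATTERN = re.compile(r"P-\d{5}")


def validate_patient_id(patient_id: object) -> Optional[dict]:
    """Return an error dict if the ID is malformed, or None if it looks valid."""
    if not isinstance(patient_id, str) or not PATIENT_ID_PATTERN.fullmatch(patient_id):
        return {
            "success": False,
            "error": "Invalid patient ID format",
            "field_errors": {"patient_id": "Expected P- followed by five digits"},
            "hint": "Correct these fields: patient_id (format P-NNNNN).",
        }
    return None
```

A tool would call this first and return the error dict immediately if one comes back, so that malformed input never reaches the database or an external API.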

Summary

Tools must never raise exceptions — always return a dict
Use a consistent error format with success, error, and hint fields
Implement retry logic for transient failures (timeout, 5xx) with exponential backoff
Use fallback sources when the primary source is unavailable
The agent loop must append error results, not skip them
Add retry_suggested: True to error dicts when the LLM should retry automatically
Test all error paths with mocking — error handling you haven't tested doesn't work
