Returning Tool Results to the LLM

The Message Flow

Tool calling introduces a four-message sequence into your conversation:

1. User message          role: "user"
2. Assistant tool call   role: "assistant"  (contains tool_calls)
3. Tool result           role: "tool"       (contains tool result)
4. Assistant answer      role: "assistant"  (final text response)

The model generates message 4 with messages 1 through 3 (plus any system prompt) in its context. If message 3 is malformed, one of two things happens. The API may reject the request outright. Otherwise, the model may misread or ignore the result and fill the gap with a guess.
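The sketch below writes the four messages as plain Python dicts in the Chat Completions shape. The ID, arguments, and answer text are invented placeholder values. Two details matter here. The assistant's arguments field is a JSON-encoded string, not a dict. The tool message echoes the call's id.

```python
import json

# Illustrative values only: the id, arguments, and answer text are placeholders.
conversation = [
    {"role": "user", "content": "What medications is patient P-00123 taking?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "get_patient_record",
                    # Arguments arrive as a JSON string that you json.loads() yourself
                    "arguments": json.dumps({"patient_id": "P-00123"}),
                },
            }
        ],
    },
    {
        "role": "tool",
        "tool_call_id": "call_abc123",  # echoes the id of the call above
        "content": json.dumps({"current_medications": ["Metformin 500mg", "Lisinopril 10mg"]}),
    },
    {"role": "assistant", "content": "Patient P-00123 takes Metformin 500mg and Lisinopril 10mg."},
]

roles = [m["role"] for m in conversation]
print(roles)  # ['user', 'assistant', 'tool', 'assistant']
```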

The role: "tool" Message Format

Python

{
    "role": "tool",
    "tool_call_id": "call_abc123",  # Must match the id from the tool_call request
    "content": json.dumps(result)   # Always a string — serialize to JSON
}

Three things matter:

role must be exactly "tool" — not "function", not "system", not "user"
tool_call_id must match the id field on the tool_call object from the assistant message
content must be a string — serialize your result dict with json.dumps()

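The tool_call_id ties each result to the call that requested it. This matters most for parallel tool calls. In that case, one assistant message carries several tool_calls, and you append one tool message per call. Every tool_call in the assistant message needs a matching tool message; if one is missing, the API rejects the next request.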
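A missing or mismatched tool_call_id only surfaces as an API error on the next request. For that reason, it can help to check the history before sending it. The helper below, missing_tool_responses, is a hypothetical name, and the code is a minimal sketch under two assumptions. First, messages are plain dicts (SDK message objects would need converting first). Second, tool messages sit directly after the assistant message that requested them.

```python
def missing_tool_responses(messages: list[dict]) -> list[str]:
    """Return tool_call ids from assistant messages that have no matching tool message.

    Assumes plain-dict messages and only looks at the run of tool messages
    immediately following each assistant message that made tool calls.
    """
    missing = []
    for i, msg in enumerate(messages):
        if msg.get("role") != "assistant" or not msg.get("tool_calls"):
            continue
        expected = {tc["id"] for tc in msg["tool_calls"]}
        answered = set()
        # Collect tool messages until the first non-tool message
        for follow in messages[i + 1:]:
            if follow.get("role") != "tool":
                break
            answered.add(follow.get("tool_call_id"))
        missing.extend(sorted(expected - answered))
    return missing


# Two parallel calls, but only one result was appended
history = [
    {"role": "user", "content": "Compare patients P-00123 and P-00456."},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_patient_record", "arguments": '{"patient_id": "P-00123"}'}},
        {"id": "call_2", "type": "function",
         "function": {"name": "get_patient_record", "arguments": '{"patient_id": "P-00456"}'}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "{}"},
]
print(missing_tool_responses(history))  # ['call_2']
```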

Minimal Correct Implementation

Python

import json
import openai

client = openai.OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_patient_record",
            "description": "Retrieve a patient's medical record by patient ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "patient_id": {
                        "type": "string",
                        "description": "The patient's unique identifier, e.g. 'P-00123'."
                    }
                },
                "required": ["patient_id"]
            }
        }
    }
]

def get_patient_record(patient_id: str) -> dict:
    """Mock patient record lookup."""
    records = {
        "P-00123": {
            "name": "Jane Doe",
            "dob": "1975-03-14",
            "conditions": ["Type 2 Diabetes", "Hypertension"],
            "current_medications": ["Metformin 500mg", "Lisinopril 10mg"],
            "allergies": ["Penicillin"]
        }
    }
    record = records.get(patient_id)
    if not record:
        return {"error": f"Patient {patient_id} not found"}
    return record

def run_agent(user_message: str) -> str:
    messages = [
        {
            "role": "system",
            "content": "You are a clinical assistant. Use tools to look up patient records."
        },
        {"role": "user", "content": user_message}
    ]

    # First LLM call
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )

    assistant_message = response.choices[0].message

    # If no tool call, return the direct answer
    if not assistant_message.tool_calls:
        return assistant_message.content

    # Append the assistant's message (containing the tool_call request)
    messages.append(assistant_message)

    # Execute each tool call and append the result
    for tool_call in assistant_message.tool_calls:
        fn_name = tool_call.function.name
        fn_args = json.loads(tool_call.function.arguments)

        # Execute
        if fn_name == "get_patient_record":
            result = get_patient_record(**fn_args)
        else:
            result = {"error": f"Unknown tool: {fn_name}"}

        # Append the tool result message
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,   # Critical: must match
            "content": json.dumps(result)    # Must be a string
        })

    # Second LLM call — now the model has the real data
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )

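    # content can be None if the model requests another tool call here;
    # the multi-turn loop later in this article handles that case
    return final_response.choices[0].message.content or ""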

print(run_agent("What medications is patient P-00123 currently taking?"))

Formatting Tool Results: JSON vs Plain Text vs Structured Error

JSON (preferred for structured data)

Python

# Good — the LLM can parse and reason about structured fields
result = {
    "patient_id": "P-00123",
    "medications": ["Metformin 500mg", "Lisinopril 10mg"],
    "last_updated": "2026-05-01"
}
content = json.dumps(result)

Plain Text (acceptable for simple results)

Python

# Acceptable for simple scalar results
result = "Patient P-00123: Jane Doe, DOB 1975-03-14"
content = result  # Already a string

Structured Error

Python

# Errors should be structured too — the LLM reads them
result = {
    "error": "Patient not found",
    "patient_id": "P-99999",
    "suggestion": "Verify the patient ID and try again"
}
content = json.dumps(result)

The LLM will incorporate error information into its response — e.g., "I couldn't find patient P-99999 in the system. Could you double-check the ID?"
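The three formats can be funneled through one conversion step, so every tool message gets string content. The sketch below uses a hypothetical helper, to_tool_content, which handles each input type differently:

- Strings pass through unchanged as plain text.
- Other values are serialized with json.dumps, with default=str as a fallback for types such as dates.
- Exceptions become a structured error dict.

The error_type field is an illustrative choice, not something the API requires.

```python
import json
from datetime import date
from typing import Any


def to_tool_content(result: Any) -> str:
    """Convert a tool's return value (or a caught exception) into tool-message content."""
    if isinstance(result, BaseException):
        # Structured error the LLM can read and explain to the user
        result = {"error": str(result), "error_type": type(result).__name__}
    if isinstance(result, str):
        return result  # plain text passes through unchanged
    # default=str covers values json cannot encode natively, such as dates
    return json.dumps(result, default=str)


print(to_tool_content("Patient P-00123: Jane Doe, DOB 1975-03-14"))
print(to_tool_content({"patient_id": "P-00123", "last_updated": date(2026, 5, 1)}))
# {"patient_id": "P-00123", "last_updated": "2026-05-01"}
print(to_tool_content(ValueError("Patient not found")))
# {"error": "Patient not found", "error_type": "ValueError"}
```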

Handling Large Tool Results

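LLMs have finite context windows. A tool that returns a 50,000-token database dump can push the request past the model's limit, which causes an API error. Even when the request fits, the dump crowds out the instructions and conversation history the model needs. Three strategies: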

Strategy 1: Truncate at the tool level

Python

def search_medical_literature(query: str, max_chars: int = 4000) -> dict:
    """Search and return results, truncated to at most max_chars characters."""
    # database and format_results are placeholders for your own search backend
    results = database.search(query)
    full_text = format_results(results)

    if len(full_text) > max_chars:
        return {
            "results": full_text[:max_chars],
            "truncated": True,
            "total_results": len(results),
            "returned_chars": max_chars,
            # Addressed to the LLM: explains the gap and how to get the rest
            "note": "Results truncated. Retry with a more specific query for complete data."
        }

    return {"results": full_text, "truncated": False, "total_results": len(results)}
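Slicing full_text[:max_chars] can cut a result in half, leaving the model a fragment it may misread. A variant keeps only the whole entries that fit the budget. This sketch assumes results are available as a list of already-formatted strings. truncate_entries and its return fields are hypothetical names, chosen to mirror the example above.

```python
def truncate_entries(entries: list[str], max_chars: int = 4000, sep: str = "\n\n") -> dict:
    """Keep as many whole entries as fit in max_chars instead of cutting mid-entry."""
    kept: list[str] = []
    used = 0
    for entry in entries:
        # The separator only counts once at least one entry has been kept
        extra = len(entry) + (len(sep) if kept else 0)
        if used + extra > max_chars:
            break
        kept.append(entry)
        used += extra
    return {
        "results": sep.join(kept),
        "truncated": len(kept) < len(entries),
        "total_results": len(entries),
        "returned_results": len(kept),
    }


abstracts = ["A" * 1500, "B" * 1500, "C" * 1500]
out = truncate_entries(abstracts, max_chars=3500)
print(out["truncated"], out["returned_results"])  # True 2
```

If a single entry is longer than max_chars, this sketch returns nothing for it. A real tool would need a character-level fallback for that case.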

Strategy 2: Paginate

Python

def get_patient_history(
    patient_id: str,
    page: int = 1,
    page_size: int = 10
) -> dict:
    """Paginated patient history."""
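    # database is a placeholder for your own data-access layer
    all_events = database.get_events(patient_id)
    total = len(all_events)
    page = max(page, 1)  # guard against page <= 0, which would produce negative slice indices
    start = (page - 1) * page_size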
    end = start + page_size

    return {
        "patient_id": patient_id,
        "page": page,
        "page_size": page_size,
        "total_events": total,
        "total_pages": (total + page_size - 1) // page_size,
        "events": all_events[start:end],
        "has_more": end < total
    }

The tool schema must expose the page parameter. If it does, the LLM can call get_patient_history again with page=2 whenever has_more is true.
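The model can only request page=2 if the tool schema declares a page parameter. Below is a sketch of a schema entry in the same format as the tools list earlier. It is paired with a hypothetical generic paginate helper. The helper clamps out-of-range values the model might send, rather than returning a confusing empty page. The 50-item cap on page_size is an arbitrary illustrative limit.

```python
# Tool schema that exposes the paging parameters to the model
history_tool = {
    "type": "function",
    "function": {
        "name": "get_patient_history",
        "description": "Get a patient's history events, one page at a time.",
        "parameters": {
            "type": "object",
            "properties": {
                "patient_id": {"type": "string", "description": "Patient ID."},
                "page": {"type": "integer", "description": "1-based page number. Defaults to 1."},
                "page_size": {"type": "integer", "description": "Events per page, 1 to 50. Defaults to 10."},
            },
            "required": ["patient_id"],
        },
    },
}


def paginate(items: list, page: int = 1, page_size: int = 10) -> dict:
    """Return one page of items, clamping page and page_size to valid ranges."""
    page_size = max(1, min(page_size, 50))
    total = len(items)
    total_pages = max(1, (total + page_size - 1) // page_size)
    page = max(1, min(page, total_pages))  # no negative slices, no pages past the end
    start = (page - 1) * page_size
    end = start + page_size
    return {
        "page": page,
        "page_size": page_size,
        "total_pages": total_pages,
        "items": items[start:end],
        "has_more": end < total,
    }


events = [f"event-{i}" for i in range(25)]
first = paginate(events, page=1)
print(first["items"][:2], first["has_more"])  # ['event-0', 'event-1'] True
```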

Strategy 3: Summarize inside the tool

Python

import openai

summarizer = openai.OpenAI()

def get_research_summary(topic: str) -> dict:
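    """Fetch research papers and return an LLM-generated summary."""
    # fetch_papers_from_pubmed is a placeholder for your own retrieval code
    raw_papers = fetch_papers_from_pubmed(topic, limit=20)
    if not raw_papers:
        return {"topic": topic, "papers_found": 0, "summary": "No papers found."}
    full_text = "\n\n".join(p["abstract"] for p in raw_papers)

    # Use a separate, cheaper model to summarize before returning
    summary_response = summarizer.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": f"Summarize these research abstracts in under 500 words:\n\n{full_text}"
            }
        ],
        # 500 English words is roughly 650-700 tokens, so 600 could cut the summary off
        max_tokens=800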
    )

    return {
        "topic": topic,
        "papers_found": len(raw_papers),
        "summary": summary_response.choices[0].message.content
    }
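Joining 20 abstracts can itself overflow the summarizer's context window. One common workaround is map-reduce summarization. You split the input into chunks, summarize each chunk with a call like the one above, and then summarize the partial summaries. The sketch below covers only the splitting step. chunk_texts is a hypothetical helper, and it uses character counts as a rough proxy for tokens.

```python
def chunk_texts(texts: list[str], max_chars: int = 12000, sep: str = "\n\n") -> list[str]:
    """Group texts into chunks whose joined length stays within max_chars.

    Characters are only a rough stand-in for tokens; a tokenizer would be more precise.
    A single text longer than max_chars becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for text in texts:
        extra = len(text) + (len(sep) if current else 0)
        if current and size + extra > max_chars:
            # Flush the current chunk and start a new one with this text
            chunks.append(sep.join(current))
            current, size = [], 0
            extra = len(text)
        current.append(text)
        size += extra
    if current:
        chunks.append(sep.join(current))
    return chunks


print([len(c) for c in chunk_texts(["a" * 5000] * 3)])  # [10002, 5000]
```

Each chunk would then go through its own summarizer call, and a final call would merge the partial summaries.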

The Complete Multi-Turn Agent Loop

A robust agent handles multiple rounds of tool calls — the LLM may call a tool, receive the result, and then call another tool before giving a final answer.

Python

import json
import openai
from typing import Callable

client = openai.OpenAI()

def run_agent_loop(
    user_message: str,
    tools: list,
    tool_map: dict[str, Callable],
    system_prompt: str = "You are a helpful assistant.",
    max_iterations: int = 10
) -> str:
    """
    General-purpose agentic loop.

    Continues calling the LLM until:
    - It returns a text response (no tool calls)
    - max_iterations is reached
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message}
    ]

    for iteration in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )

        msg = response.choices[0].message

        # If no tool calls, the LLM is done — return the answer
        if not msg.tool_calls:
            return msg.content or ""

        # Append the assistant message with tool_calls
        messages.append(msg)

        # Execute all tool calls in this batch
        for tool_call in msg.tool_calls:
            fn_name = tool_call.function.name

            try:
                # Parse inside the try: the model can emit malformed JSON arguments
                fn_args = json.loads(tool_call.function.arguments)
                print(f"[Iteration {iteration + 1}] Calling {fn_name}({fn_args})")
                if fn_name in tool_map:
                    result = tool_map[fn_name](**fn_args)
                else:
                    result = {"error": f"Unknown tool: {fn_name}"}
            except Exception as e:
                # Report the failure to the LLM instead of crashing the loop
                result = {"error": str(e), "tool": fn_name}

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            })

    # If we hit max_iterations, return a fallback
    return "I was unable to complete the request within the allowed number of steps."


# Example usage with drug lookup agent
drug_tools = [
    {
        "type": "function",
        "function": {
            "name": "get_drug_info",
            "description": "Get drug information including dosage and interactions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "drug_name": {"type": "string", "description": "Drug name."},
                    "info_type": {
                        "type": "string",
                        "enum": ["dosage", "interactions", "all"],
                        "description": "Type of information needed."
                    }
                },
                "required": ["drug_name", "info_type"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_patient_allergies",
            "description": "Get a patient's known drug allergies.",
            "parameters": {
                "type": "object",
                "properties": {
                    "patient_id": {"type": "string", "description": "Patient ID."}
                },
                "required": ["patient_id"]
            }
        }
    }
]

# Mock implementations: fixed data stands in for real drug and patient databases
def get_drug_info(drug_name: str, info_type: str) -> dict:
    return {
        "drug": drug_name,
        "dosage": "10mg once daily",
        "interactions": ["Warfarin — increased bleeding risk"]
    }

def get_patient_allergies(patient_id: str) -> dict:
    return {
        "patient_id": patient_id,
        "allergies": ["Penicillin", "Sulfonamides"]
    }

tool_map = {
    "get_drug_info": get_drug_info,
    "get_patient_allergies": get_patient_allergies
}

answer = run_agent_loop(
    user_message="Is Atorvastatin safe for patient P-00123? Check their allergies first.",
    tools=drug_tools,
    tool_map=tool_map,
    system_prompt=(
        "You are a clinical safety assistant. "
        "Always check patient allergies before confirming drug safety."
    )
)
print(answer)
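The loop above runs tool calls one after another. The model may emit several independent calls in one assistant message, as it might here with get_drug_info and get_patient_allergies. Those calls can run concurrently. This sketch uses concurrent.futures.ThreadPoolExecutor. Because pool.map returns results in input order, each result stays paired with its tool_call_id. run_one and run_parallel are hypothetical names. The demo uses SimpleNamespace stand-ins shaped like the SDK's tool_call objects (.id, .function.name, .function.arguments), so it runs without an API key.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace
from typing import Callable


def run_one(tool_call, tool_map: dict[str, Callable]) -> dict:
    """Execute one tool call and wrap the outcome in a role: "tool" message."""
    name = tool_call.function.name
    try:
        args = json.loads(tool_call.function.arguments)  # may be malformed JSON
        if name in tool_map:
            result = tool_map[name](**args)
        else:
            result = {"error": f"Unknown tool: {name}"}
        content = json.dumps(result)
    except Exception as e:
        content = json.dumps({"error": str(e), "tool": name})
    return {"role": "tool", "tool_call_id": tool_call.id, "content": content}


def run_parallel(tool_calls, tool_map: dict[str, Callable], max_workers: int = 4) -> list[dict]:
    """Run independent tool calls concurrently; results keep the original call order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda tc: run_one(tc, tool_map), tool_calls))


def fake_call(call_id: str, name: str, arguments: str):
    """Stand-in object shaped like an SDK tool_call."""
    return SimpleNamespace(id=call_id, function=SimpleNamespace(name=name, arguments=arguments))


calls = [
    fake_call("call_1", "get_patient_allergies", '{"patient_id": "P-00123"}'),
    # Deliberately truncated JSON to show the error path
    fake_call("call_2", "get_drug_info", '{"drug_name": "Atorvastatin", "info_type": "all"'),
]
demo_tools = {
    "get_patient_allergies": lambda patient_id: {"patient_id": patient_id, "allergies": ["Penicillin"]},
    "get_drug_info": lambda drug_name, info_type: {"drug": drug_name},
}
results = run_parallel(calls, demo_tools)
for message in results:
    print(message["tool_call_id"], message["content"])
```

Inside run_agent_loop, the inner for loop could be replaced with messages.extend(run_parallel(msg.tool_calls, tool_map)). This only pays off when the tools are I/O-bound, independent of each other, and safe to call from multiple threads.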

Common Mistakes and How to Fix Them

Mistake: Not appending the assistant message before the tool result

Python

# Wrong — missing the assistant message
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result)
})

# Correct — assistant message comes first
messages.append(assistant_message)  # This must come before tool results
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result)
})

Mistake: Passing a dict instead of a string as content

Python

# Wrong — content must be a string
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": result  # This is a dict — will cause an API error
})

# Correct
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result)  # Serialize to string
})

Mistake: Using a hardcoded or wrong tool_call_id

Python

# Wrong — hardcoded ID
messages.append({
    "role": "tool",
    "tool_call_id": "my_fixed_id",  # Won't match the actual tool_call.id
    "content": json.dumps(result)
})

# Correct — always use the id from the tool_call object
for tool_call in msg.tool_calls:
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,  # From the response object
        # execute_tool is a placeholder for your own dispatch logic (e.g. a tool_map lookup)
        "content": json.dumps(execute_tool(tool_call))
    })

Summary

| Step | What To Do |
|---|---|
| Receive tool call | Read msg.tool_calls; each has .id, .function.name, .function.arguments |
| Append assistant msg | Add msg to messages before any tool results |
| Execute tool | Call your function with json.loads(tc.function.arguments) |
| Return result | Append {"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)} |
| Get final answer | Call the LLM again with the full updated message list |
| Large results | Truncate, paginate, or pre-summarize inside the tool function |
