What Is Tool Calling?

The Problem with Pure Text Generation

A raw language model has one job: predict the next token. That's it. Ask it for today's weather and it will confidently generate a plausible-sounding answer — one that may be completely wrong, because it has no way to actually check.

This matters enormously in production. An agent that fabricates drug dosages, stock prices, or appointment availability is not merely unhelpful; it is dangerous. Tool calling addresses this by giving the LLM a way to request real information and trigger real actions before it generates a final answer.

What Tool Calling Actually Is

Tool calling (also called function calling) is a protocol where:

1. You describe a set of functions: their names, what they do, and what arguments they accept.
2. The LLM reads those descriptions as part of its context.
3. Instead of generating prose, the LLM may choose to emit a structured tool call request: a JSON blob saying "call this function with these arguments".
4. Your code receives that request, executes the actual function, and sends the result back to the LLM.
5. The LLM uses the real result to produce its final answer.

The LLM never executes code. It only decides whether to call a tool and what arguments to pass. Your application executes the actual logic.

User query
    │
    ▼
┌─────────────────────────────┐
│   LLM receives:             │
│   - System prompt           │
│   - User message            │
│   - Tool schema definitions │
└────────────┬────────────────┘
             │
             ▼
    Does the query need a tool?
             │
      ┌──────┴──────┐
      │ Yes         │ No
      ▼             ▼
 Tool call       Final text
 request         response
      │
      ▼
Your code executes the function
      │
      ▼
Result sent back to LLM
      │
      ▼
LLM generates final answer
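
To make the "JSON blob" concrete, here is a minimal sketch of what a tool call request can look like once it reaches your code. The dictionary mirrors the shape of an assistant message in OpenAI's Chat Completions format; the ID `call_abc123` and the helper `extract_calls` are invented for illustration. Note that `arguments` arrives as a JSON string, not as a parsed object.

```python
import json

# Illustrative assistant message; the id value is invented for this sketch.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": "{\"location\": \"Oslo\"}",
            },
        }
    ],
}


def extract_calls(message: dict) -> list[tuple[str, str, dict]]:
    """Return (call id, function name, parsed arguments) for each requested call."""
    calls = []
    for call in message.get("tool_calls") or []:
        # The arguments field is a JSON-encoded string and must be parsed.
        args = json.loads(call["function"]["arguments"])
        calls.append((call["id"], call["function"]["name"], args))
    return calls


print(extract_calls(assistant_message))
```

A message with no `tool_calls` key yields an empty list, which corresponds to the "No" branch in the diagram.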

A Concrete Example: Weather Tool

Without tool calling, this conversation goes wrong:

User: What's the weather in Oslo right now?
LLM:  It's currently 12°C and partly cloudy in Oslo.  ← fabricated

With tool calling, the LLM instead requests real data:

User: What's the weather in Oslo right now?
LLM:  [tool_call: get_current_weather(location="Oslo")]
App:  [executes API call, returns {"temp": 8, "condition": "rainy"}]
LLM:  Right now in Oslo it's 8°C and raining.  ← grounded in real data

The Full Flow in Python (OpenAI)

Python

import json
import openai

client = openai.OpenAI()

# Step 1: Define the tool schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city. Use this whenever the user asks about current weather conditions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. 'Oslo' or 'New York'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit. Default to celsius."
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Step 2: The actual function your code executes
def get_current_weather(location: str, unit: str = "celsius") -> dict:
    # In production, this calls a real weather API
    # For demonstration, returning mock data
    mock_data = {
        "Oslo": {"temp": 8, "condition": "rainy", "humidity": 82},
        "London": {"temp": 14, "condition": "cloudy", "humidity": 75},
    }
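    weather = mock_data.get(location, {"temp": 20, "condition": "sunny", "humidity": 50})
    temp = weather["temp"]  # Mock values are stored in Celsius
    if unit == "fahrenheit":
        # Convert so the reported number matches the reported unit
        temp = round(temp * 9 / 5 + 32, 1)
    return {
        "location": location,
        "temperature": temp,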
        "unit": unit,
        "condition": weather["condition"],
        "humidity": weather["humidity"]
    }

# Step 3: Send user query with tools
messages = [
    {"role": "user", "content": "What's the weather in Oslo right now?"}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto"  # Let the LLM decide whether to call a tool
)

# Step 4: Check if the LLM requested a tool call
message = response.choices[0].message

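if message.tool_calls:
    # This walkthrough handles only the first tool call. If the model requests
    # several in one response, each one needs its own "tool" message in reply.
    tool_call = message.tool_calls[0]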
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)

    print(f"LLM requested tool: {function_name}")
    print(f"With arguments: {function_args}")

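    # Step 5: Execute the actual function
    if function_name == "get_current_weather":
        result = get_current_weather(**function_args)
    else:
        # Unknown tool name: report an error instead of leaving `result` undefined
        result = {"error": f"Unknown tool: {function_name}"}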

    # Step 6: Append both the LLM's tool call and our result to the conversation
    messages.append(message)  # The LLM's message containing the tool_call
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result)
    })

    # Step 7: Send back to LLM for the final answer
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    print(final_response.choices[0].message.content)
else:
    # LLM answered directly without needing a tool
    print(message.content)

Example output (the model's wording, and whether it passes the optional unit argument, may vary):

LLM requested tool: get_current_weather
With arguments: {'location': 'Oslo', 'unit': 'celsius'}
Right now in Oslo it's 8°C and rainy, with humidity at 82%.
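
The model's arguments are not guaranteed to match the schema, so it can be worth checking them before calling the function. The following is a minimal hand-rolled sketch that covers only the features used in the weather schema above (required keys, string types, and `enum`). `validate_args` is a hypothetical helper, not part of the OpenAI SDK, and a full JSON Schema validator would cover far more.

```python
# Same "parameters" object as in the weather tool schema, minus descriptions.
WEATHER_PARAMS = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}


def validate_args(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look usable."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


print(validate_args({"location": "Oslo", "unit": "kelvin"}, WEATHER_PARAMS))
print(validate_args({"unit": "celsius"}, WEATHER_PARAMS))
```

Returning the list of problems to the model as the tool result gives it a chance to retry with corrected arguments.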

Calculator Tool Example

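Another classic case: math. LLMs can handle simple arithmetic, but they make errors on multi-step or floating-point calculations. A calculator tool hands the arithmetic itself to deterministic code, so the remaining risk is the model writing the wrong expression rather than miscomputing the right one.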

Python

import math

tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform mathematical calculations. Use this for any arithmetic, algebra, or mathematical computation the user requests. More reliable than computing in your head.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "A valid Python math expression to evaluate, e.g. '2 ** 32' or 'math.sqrt(144)'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

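def calculate(expression: str) -> dict:
    """Evaluate a math expression with a restricted set of names.

    Restricting names reduces, but does not eliminate, the risks of eval().
    Do not treat this as a sandbox for untrusted input.
    """
    try:
        # Expose only the math module's public names, plus abs and round below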
        allowed_names = {
            name: getattr(math, name)
            for name in dir(math)
            if not name.startswith("_")
        }
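        allowed_names["abs"] = abs
        allowed_names["round"] = round
        # The tool description tells the model it may write 'math.sqrt(144)',
        # so the module itself must be reachable by name as well
        allowed_names["math"] = math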

        result = eval(expression, {"__builtins__": {}}, allowed_names)
        return {"result": result, "expression": expression}
    except Exception as e:
        return {"error": str(e), "expression": expression}

# Test: What is 2 to the power of 32?
result = calculate("2 ** 32")
print(result)  # {'result': 4294967296, 'expression': '2 ** 32'}
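
Emptying `__builtins__` narrows what `eval` can reach, but it is not a real sandbox: attribute access on ordinary objects can still reach interpreter internals. One alternative, sketched here under the assumption that only arithmetic and a handful of math functions are needed, is to parse the expression with `ast` and evaluate only node types you explicitly allow. `safe_calculate` and the whitelists are illustrative names, not part of the original example.

```python
import ast
import math
import operator

# Whitelisted operators, functions, and constants; anything else is rejected.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.Mod: operator.mod,
}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
FUNCTIONS = {"sqrt": math.sqrt, "log": math.log, "sin": math.sin,
             "cos": math.cos, "abs": abs, "round": round}
CONSTANTS = {"pi": math.pi, "e": math.e}


def _evaluate(node: ast.AST):
    """Recursively evaluate a node, raising on anything not whitelisted."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.Name) and node.id in CONSTANTS:
        return CONSTANTS[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        return BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](_evaluate(node.operand))
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in FUNCTIONS
        and not node.keywords
    ):
        return FUNCTIONS[node.func.id](*[_evaluate(arg) for arg in node.args])
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def safe_calculate(expression: str) -> dict:
    """Evaluate arithmetic without eval(); errors are returned, not raised."""
    try:
        tree = ast.parse(expression, mode="eval")
        return {"result": _evaluate(tree.body), "expression": expression}
    except Exception as e:
        return {"error": str(e), "expression": expression}


print(safe_calculate("2 ** 32"))
print(safe_calculate("__import__('os')"))
```

This sketch still does not bound the cost of an expression such as a very large power, so a production version would also want limits on operand size or execution time.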

The Calculator Tool in a Full Conversation

Python

import json
import openai

client = openai.OpenAI()

def run_conversation(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )

    msg = response.choices[0].message

    if not msg.tool_calls:
        return msg.content

    # Handle the tool call
    messages.append(msg)

    for tool_call in msg.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = calculate(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result)
        })

    final = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    return final.choices[0].message.content

print(run_conversation("What is the square root of 17161?"))
# Example output: The square root of 17161 is 131.

Why Tool Calling Matters for AI Agents

Without tool calling, an LLM agent can only:

- Summarize information it was trained on
- Reason through problems in text
- Generate content

With tool calling, an LLM agent can:

- Read live data: databases, APIs, file systems
- Write data: create records, send emails, update state
- Execute code: run computations, process files
- Chain actions: the result of one tool informs the next tool call
- Operate autonomously: loop through a task without human input at each step

This is the difference between a chatbot and an agent. A chatbot talks. An agent acts.
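
The "chain actions" and "operate autonomously" points amount to running the request, execute, return cycle in a loop until the model stops asking for tools. Below is a minimal sketch of such a loop. To keep it runnable without network access, the model is passed in as a plain callable `complete(messages)` that returns an assistant message dictionary in the Chat Completions shape. `run_agent`, `fake_model`, `add`, and the `max_steps` cap are all assumptions of this sketch, not part of any SDK.

```python
import json
from typing import Callable


def run_agent(
    complete: Callable[[list], dict],
    tools: dict[str, Callable[..., dict]],
    user_message: str,
    max_steps: int = 5,
) -> str:
    """Loop until the model returns plain text or the step budget runs out."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = complete(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply["content"]
        for call in calls:
            name = call["function"]["name"]
            try:
                args = json.loads(call["function"]["arguments"])
                result = tools[name](**args)
            except Exception as e:  # unknown tool, bad JSON, or tool failure
                result = {"error": f"{type(e).__name__}: {e}"}
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
    return "Stopped: step limit reached before a final answer."


def fake_model(messages: list) -> dict:
    """Stand-in for an LLM: asks for one calculation, then answers from the result."""
    if messages[-1]["role"] == "tool":
        value = json.loads(messages[-1]["content"])["result"]
        return {"role": "assistant", "content": f"The answer is {value}."}
    call = {
        "id": "call_1",
        "type": "function",
        "function": {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})},
    }
    return {"role": "assistant", "content": None, "tool_calls": [call]}


def add(a: float, b: float) -> dict:
    return {"result": a + b}


print(run_agent(fake_model, {"add": add}, "What is 2 + 3?"))  # The answer is 5.
```

The step cap is a deliberate design choice: without it, a model that keeps requesting tools would loop indefinitely.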

Key Concepts to Remember

| Concept | What It Means |
|---|---|
| Tool schema | JSON Schema description of a function: name, description, parameters |
| tool_choice: "auto" | LLM decides whether to call a tool |
| tool_choice: "required" | LLM must call at least one tool |
| tool_call_id | Unique ID linking a tool call request to its result |
| role: "tool" | The message role for returning a tool result to the LLM |
| Parallel tool calls | LLM requests multiple tools in one response |
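
As a rough illustration of the `tool_choice` rows, the sketch below builds the keyword arguments for a Chat Completions request under each setting. The helper `request_kwargs` is hypothetical. The `tool_choice` values themselves (`"auto"`, `"required"`, `"none"`, and the object form naming one function) reflect OpenAI's Chat Completions API at the time of writing, so verify them against the current API reference before relying on them.

```python
from typing import Optional


def request_kwargs(
    messages: list,
    tools: list,
    mode: str = "auto",
    force_tool: Optional[str] = None,
) -> dict:
    """Build keyword arguments intended for client.chat.completions.create()."""
    if force_tool is not None:
        # Object form: the model must call this specific function.
        tool_choice = {"type": "function", "function": {"name": force_tool}}
    else:
        # "auto": model decides; "required": must call some tool; "none": no tool calls.
        if mode not in {"auto", "required", "none"}:
            raise ValueError(f"unsupported tool_choice mode: {mode}")
        tool_choice = mode
    return {
        "model": "gpt-4o",
        "messages": messages,
        "tools": tools,
        "tool_choice": tool_choice,
    }


print(request_kwargs([], [], mode="required")["tool_choice"])
print(request_kwargs([], [], force_tool="calculate")["tool_choice"])
```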

Common Mistakes

Mistake 1: Executing the tool yourself before checking if the LLM requested it. Always check message.tool_calls before running any function. The LLM may have answered directly.

Mistake 2: Forgetting to append both the assistant message and the tool result. The LLM needs the full conversation history — its own tool_call message AND your tool result — to generate a coherent final answer.
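
A quick way to catch this mistake during development is to check the history before sending it. The sketch below assumes messages are plain dictionaries in the Chat Completions shape (SDK message objects would first need converting to dictionaries); `unanswered_tool_calls` is a hypothetical helper, and the call IDs are invented.

```python
def unanswered_tool_calls(messages: list[dict]) -> list[str]:
    """Return tool_call ids that were requested but have no matching tool message."""
    requested = []
    answered = set()
    for message in messages:
        if message.get("role") == "assistant":
            for call in message.get("tool_calls") or []:
                requested.append(call["id"])
        elif message.get("role") == "tool":
            answered.add(message.get("tool_call_id"))
    return [call_id for call_id in requested if call_id not in answered]


history = [
    {"role": "user", "content": "Weather in Oslo and London?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"id": "call_a", "type": "function",
             "function": {"name": "get_current_weather", "arguments": "{\"location\": \"Oslo\"}"}},
            {"id": "call_b", "type": "function",
             "function": {"name": "get_current_weather", "arguments": "{\"location\": \"London\"}"}},
        ],
    },
    {"role": "tool", "tool_call_id": "call_a", "content": "{\"temp\": 8}"},
]

print(unanswered_tool_calls(history))  # ['call_b']
```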

Mistake 3: Not handling the case where the LLM calls a tool you haven't implemented. Always map function names to actual callables and handle the case where the name doesn't match.

Python

TOOL_MAP = {
    "get_current_weather": get_current_weather,
    "calculate": calculate,
}

for tool_call in msg.tool_calls:
    fn_name = tool_call.function.name
    if fn_name not in TOOL_MAP:
        result = {"error": f"Unknown tool: {fn_name}"}
    else:
        try:
            # Parse inside the try: the model can emit malformed JSON arguments
            args = json.loads(tool_call.function.arguments)
            result = TOOL_MAP[fn_name](**args)
        except Exception as e:
            result = {"error": str(e)}

    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result)
    })

Summary

Tool calling gives LLMs the ability to act on the world rather than just talk about it. The flow is:

1. Define tools as JSON schemas with clear descriptions
2. Send those schemas along with the user's message
3. Check if the LLM responded with a tool call request
4. Execute the actual function in your code
5. Return the result to the LLM in a role: "tool" message
6. Get the final grounded answer

This six-step loop is the foundation of most tool-using AI agents in production.
