Learnixo
AI Systems · Intermediate

What Is Tool Calling?

Understand how LLMs decide to invoke functions instead of generating text, and why tool calling is the foundation of every useful AI agent.

Asma Hafeez Khan · May 15, 2026 · 8 min read
Tool Calling · Function Calling · AI Agents · OpenAI · Python

The Problem with Pure Text Generation

A raw language model has one job: predict the next token. That's it. Ask it for today's weather and it may confidently generate a plausible-sounding answer that is completely wrong, because it has no way to actually check.

This matters enormously in production. An agent that fabricates drug dosages, stock prices, or appointment availability is not just unhelpful — it's dangerous. Tool calling solves this by giving the LLM a way to request real information and take real actions before it generates a final answer.


What Tool Calling Actually Is

Tool calling (also called function calling) is a protocol where:

  1. You describe a set of functions — their names, what they do, what arguments they accept
  2. The LLM reads those descriptions as part of its context
  3. Instead of generating prose, the LLM may choose to emit a structured tool call request — a JSON blob saying "call this function with these arguments"
  4. Your code receives that request, executes the actual function, and sends the result back to the LLM
  5. The LLM uses the real result to produce its final answer

The LLM never executes code. It only decides whether to call a tool and what arguments to pass. Your application executes the actual logic.

User query
    │
    ▼
┌─────────────────────────────┐
│   LLM receives:             │
│   - System prompt           │
│   - User message            │
│   - Tool schema definitions │
└────────────┬────────────────┘
             │
             ▼
    Does the query need a tool?
             │
      ┌──────┴──────┐
      │ Yes         │ No
      ▼             ▼
 Tool call       Final text
 request         response
      │
      ▼
Your code executes the function
      │
      ▼
Result sent back to LLM
      │
      ▼
LLM generates final answer

A Concrete Example: Weather Tool

Without tool calling, this conversation goes wrong:

User: What's the weather in Oslo right now?
LLM:  It's currently 12°C and partly cloudy in Oslo.  ← fabricated

With tool calling, the LLM instead requests real data:

User: What's the weather in Oslo right now?
LLM:  [tool_call: get_current_weather(location="Oslo")]
App:  [executes API call, returns {"temp": 8, "condition": "rainy"}]
LLM:  Right now in Oslo it's 8°C and raining.  ← grounded in real data
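Under the hood, that exchange is just a list of messages. Here is a sketch of the same turn in the OpenAI Chat Completions message format, using plain dicts. The ID `call_abc123` is made up (the API generates real ones), and the tool result is the mock data above:

```python
import json

# Hypothetical ID; real tool call IDs are generated by the API.
tool_call_id = "call_abc123"

messages = [
    {"role": "user", "content": "What's the weather in Oslo right now?"},
    # The model's turn: no prose, just a structured request.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": tool_call_id,
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    # Arguments arrive as a JSON *string*, not a dict.
                    "arguments": json.dumps({"location": "Oslo"}),
                },
            }
        ],
    },
    # Your app's turn: the real result, linked back by tool_call_id.
    {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps({"temp": 8, "condition": "rainy"}),
    },
]

# Parse what the model asked for, exactly as your code would.
call = messages[1]["tool_calls"][0]
args = json.loads(call["function"]["arguments"])
print(call["function"]["name"], args)  # get_current_weather {'location': 'Oslo'}
```

The final API request sends this whole list, and the model's grounded answer comes back as one more assistant message.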

The Full Flow in Python (OpenAI)

Python
import json
import openai

client = openai.OpenAI()

# Step 1: Define the tool schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city. Use this whenever the user asks about current weather conditions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. 'Oslo' or 'New York'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit. Default to celsius."
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Step 2: The actual function your code executes
def get_current_weather(location: str, unit: str = "celsius") -> dict:
    # In production, this calls a real weather API
    # For demonstration, returning mock data
    mock_data = {
        "Oslo": {"temp": 8, "condition": "rainy", "humidity": 82},
        "London": {"temp": 14, "condition": "cloudy", "humidity": 75},
    }
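    weather = mock_data.get(location)
    if weather is None:
        # Report missing data rather than inventing it
        return {"location": location, "error": "No weather data for this location"}
    temp = weather["temp"]  # mock temperatures are stored in celsius
    if unit == "fahrenheit":
        temp = round(temp * 9 / 5 + 32, 1)
    return {
        "location": location,
        "temperature": temp,
        "unit": unit,
        "condition": weather["condition"],
        "humidity": weather["humidity"]
    }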

# Step 3: Send user query with tools
messages = [
    {"role": "user", "content": "What's the weather in Oslo right now?"}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto"  # Let the LLM decide whether to call a tool
)

# Step 4: Check if the LLM requested a tool call
message = response.choices[0].message

if message.tool_calls:
    # Step 5: Append the LLM's message containing the tool call(s)
    messages.append(message)

    # Step 6: Execute every requested function and append its result.
    # The model may request several tools at once (parallel tool calls),
    # and each tool_call_id needs its own "tool" message.
    for tool_call in message.tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        print(f"LLM requested tool: {function_name}")
        print(f"With arguments: {function_args}")

        if function_name == "get_current_weather":
            result = get_current_weather(**function_args)
        else:
            result = {"error": f"Unknown tool: {function_name}"}

        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result)
        })

    # Step 7: Send back to LLM for the final answer
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    print(final_response.choices[0].message.content)
else:
    # LLM answered directly without needing a tool
    print(message.content)

Output:

LLM requested tool: get_current_weather
With arguments: {'location': 'Oslo', 'unit': 'celsius'}
Right now in Oslo it's 8°C and rainy, with humidity at 82%.
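Because the model writes the arguments, it can omit a required field or invent an enum value, so it's worth checking them against the schema before calling your function. Below is a minimal hand-rolled sketch that covers only `required`, string types, and `enum` (a library such as `jsonschema` would cover the full specification). It also rejects unexpected keys, which is stricter than JSON Schema's default but matches the fact that `**args` would fail on them anyway. `check_args` is a hypothetical helper:

```python
import json

# The "parameters" schema from the weather tool above.
WEATHER_PARAMS = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}


def check_args(raw_arguments: str, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid.

    Only handles required keys, unexpected keys, string types, and enums.
    """
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as e:
        return [f"arguments are not valid JSON: {e}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key} must be one of {spec['enum']}")
    return problems


print(check_args('{"location": "Oslo"}', WEATHER_PARAMS))  # []
print(check_args('{"unit": "kelvin"}', WEATHER_PARAMS))    # missing location, invalid unit
```

One option when the list is non-empty is to skip the call and send the problems back as the tool result, so the model can retry with corrected arguments.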

Calculator Tool Example

Another classic case: math. LLMs handle simple arithmetic reasonably well, but they make errors on multi-step or floating-point calculations. A calculator tool removes that failure mode by handing the computation to real code; the model only has to write the expression correctly.

Python
import math

tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform mathematical calculations. Use this for any arithmetic, algebra, or mathematical computation the user requests. More reliable than computing in your head.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "A valid Python math expression to evaluate, e.g. '2 ** 32' or 'math.sqrt(144)'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

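def calculate(expression: str) -> dict:
    """Evaluate a math expression with only math-module names available.

    Note: eval() with empty builtins is a restriction, not a security sandbox.
    """
    try:
        # Expose math functions by bare name (sqrt, log, ...) plus the module
        # itself, so 'math.sqrt(144)' from the tool description also works
        allowed_names = {
            name: getattr(math, name)
            for name in dir(math)
            if not name.startswith("_")
        }
        allowed_names["math"] = math
        allowed_names["abs"] = abs
        allowed_names["round"] = round

        result = eval(expression, {"__builtins__": {}}, allowed_names)
        return {"result": result, "expression": expression}
    except Exception as e:
        # Return the error to the LLM instead of crashing the conversation
        return {"error": str(e), "expression": expression}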

# Test: What is 2 to the power of 32?
result = calculate("2 ** 32")
print(result)  # {'result': 4294967296, 'expression': '2 ** 32'}
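`eval` with emptied builtins is a convenience, not a sandbox: attribute-access tricks on ordinary objects can still reach dangerous internals, and the expression comes from model output. A stricter alternative, swapped in here as an illustration rather than the article's method, is to parse the expression with Python's standard `ast` module and evaluate only whitelisted node types. The `safe_eval` name and the chosen whitelist are assumptions:

```python
import ast
import math
import operator

# Whitelisted binary and unary operators.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

# Whitelisted functions and constants, callable by bare name only.
FUNCS = {"sqrt": math.sqrt, "log": math.log, "sin": math.sin,
         "cos": math.cos, "abs": abs, "round": round}
CONSTS = {"pi": math.pi, "e": math.e}


def safe_eval(expression: str):
    """Evaluate arithmetic by walking the AST; reject every other construct."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Name) and node.id in CONSTS:
            return CONSTS[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCS and not node.keywords):
            return FUNCS[node.func.id](*[walk(arg) for arg in node.args])
        raise ValueError(f"Unsupported expression: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval"))


print(safe_eval("2 ** 32"))      # 4294967296
print(safe_eval("sqrt(17161)"))  # 131.0
```

To use it, replace the `eval(...)` line in `calculate` with `result = safe_eval(expression)`. It accepts bare names like `sqrt(...)` but not `math.sqrt(...)`, so a tool description would need to match. It also does not cap exponent size, so something like `9 ** 9 ** 9` could still run for a very long time; production code would add limits.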

The Calculator Tool in a Full Conversation

Python
import json
import openai

client = openai.OpenAI()

def run_conversation(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )

    msg = response.choices[0].message

    if not msg.tool_calls:
        return msg.content

    # Handle the tool call
    messages.append(msg)

    for tool_call in msg.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = calculate(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result)
        })

    final = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    return final.choices[0].message.content

print(run_conversation("What is the square root of 17161?"))
# Output: The square root of 17161 is 131.
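Because every answer depends on a network call, this flow is awkward to test as written. One approach, sketched here as an assumption rather than anything the SDK provides, is to pass the client in and substitute a stub that replays scripted replies. `StubClient` mimics only the `client.chat.completions.create(...).choices[0].message` path the code uses, with `SimpleNamespace` standing in for the SDK's response objects:

```python
import json
from types import SimpleNamespace as NS


class StubClient:
    """Stands in for openai.OpenAI() by replaying scripted assistant messages."""

    def __init__(self, scripted_messages):
        self._script = list(scripted_messages)
        self.requests = []  # snapshot of `messages` on each create() call
        self.chat = NS(completions=NS(create=self._create))

    def _create(self, **kwargs):
        # Copy the list, because the caller keeps appending to the same object
        self.requests.append(list(kwargs["messages"]))
        return NS(choices=[NS(message=self._script.pop(0))])


def run_conversation(client, user_message, tools, tool_fn):
    """The same single round trip as above, with the client injected."""
    messages = [{"role": "user", "content": user_message}]
    msg = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools, tool_choice="auto"
    ).choices[0].message
    if not msg.tool_calls:
        return msg.content

    messages.append(msg)
    for tool_call in msg.tool_calls:
        args = json.loads(tool_call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(tool_fn(**args)),
        })

    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    return final.choices[0].message.content


def fake_calculate(expression: str) -> dict:
    """Test double: returns a fixed result instead of evaluating anything."""
    return {"result": 131.0}


# Scripted model: first a tool call, then a text answer
stub = StubClient([
    NS(content=None, tool_calls=[NS(id="call_1", function=NS(
        name="calculate", arguments='{"expression": "sqrt(17161)"}'))]),
    NS(content="The square root of 17161 is 131.", tool_calls=None),
])
answer = run_conversation(stub, "What is the square root of 17161?", [], fake_calculate)
print(answer)                # The square root of 17161 is 131.
print(stub.requests[1][-1])  # the tool message the model received
```

Inspecting `stub.requests` lets a test check what the model would have seen, such as whether the second request carries the tool result under the right `tool_call_id`.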

Why Tool Calling Matters for AI Agents

Without tool calling, an LLM agent can only:

  • Summarize information it was trained on
  • Reason through problems in text
  • Generate content

With tool calling, an LLM agent can:

  • Read live data — databases, APIs, file systems
  • Write data — create records, send emails, update state
  • Execute code — run computations, process files
  • Chain actions — the result of one tool informs the next tool call
  • Operate autonomously — loop through a task without human input at each step

This is the difference between a chatbot and an agent. A chatbot talks. An agent acts.
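The "chain actions" and "operate autonomously" capabilities come down to one change: instead of a single round trip, keep calling the model until it stops requesting tools. The sketch below abstracts the model as a plain function that takes the message list and returns the next assistant message as a dict (a shape assumed to mirror the Chat Completions format, so it runs offline). `run_agent`, `scripted_model`, and the two demo tools are hypothetical, and `max_turns` is a safety cap so a confused model can't loop forever:

```python
import json


def run_agent(call_model, tool_map, user_message, max_turns=5):
    """Call the model, execute any requested tools, repeat until a text answer."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = call_model(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):
            return reply["content"], messages  # no tool requested: done
        for call in reply["tool_calls"]:
            fn = tool_map.get(call["function"]["name"])
            try:
                args = json.loads(call["function"]["arguments"])
                result = fn(**args) if fn else {"error": "unknown tool"}
            except Exception as e:  # report failures back to the model
                result = {"error": str(e)}
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": json.dumps(result)})
    raise RuntimeError(f"No final answer after {max_turns} turns")


# A scripted stand-in for the LLM: look up a temperature, convert it, then
# answer. Each step uses the result of the previous tool call.
def scripted_model(messages):
    turn = sum(1 for m in messages if m["role"] == "assistant")
    if turn == 0:
        return {"role": "assistant", "content": None, "tool_calls": [
            {"id": "c1", "type": "function", "function": {
                "name": "get_temp_c", "arguments": '{"city": "Oslo"}'}}]}
    if turn == 1:
        temp_c = json.loads(messages[-1]["content"])["temp_c"]
        return {"role": "assistant", "content": None, "tool_calls": [
            {"id": "c2", "type": "function", "function": {
                "name": "c_to_f", "arguments": json.dumps({"c": temp_c})}}]}
    temp_f = json.loads(messages[-1]["content"])["f"]
    return {"role": "assistant", "content": f"It's {temp_f}°F in Oslo."}


demo_tools = {
    "get_temp_c": lambda city: {"temp_c": 8},  # mock data
    "c_to_f": lambda c: {"f": round(c * 9 / 5 + 32, 1)},
}
answer, history = run_agent(scripted_model, demo_tools, "How warm is Oslo in °F?")
print(answer)  # It's 46.4°F in Oslo.
```

With a real model, `call_model` would wrap `client.chat.completions.create(...)`; the loop itself stays the same.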


Key Concepts to Remember

| Concept | What It Means |
|---|---|
| Tool schema | JSON Schema description of a function: name, description, parameters |
| tool_choice: "auto" | LLM decides whether to call a tool |
| tool_choice: "required" | LLM must call at least one tool |
| tool_call_id | Unique ID linking a tool call request to its result |
| role: "tool" | The message role for returning a tool result to the LLM |
| Parallel tool calls | LLM requests multiple tools in one response |
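Parallel tool calls are where tool_call_id matters most: every ID in the assistant message needs a matching role: "tool" message before the next request, or the API rejects the conversation. A small pre-flight check, written as a hypothetical helper over plain-dict messages (SDK message objects would need attribute access instead):

```python
def unanswered_tool_calls(messages: list[dict]) -> set[str]:
    """Return tool_call_ids from the latest tool-calling assistant turn that lack a result."""
    # Walk backwards to the most recent assistant message that requested tools.
    for i in range(len(messages) - 1, -1, -1):
        msg = messages[i]
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            requested = {call["id"] for call in msg["tool_calls"]}
            answered = {m["tool_call_id"] for m in messages[i + 1:]
                        if m["role"] == "tool"}
            return requested - answered
    return set()


# The model asked for two cities in one response (parallel tool calls)...
messages = [
    {"role": "user", "content": "Weather in Oslo and London?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_oslo", "type": "function",
         "function": {"name": "get_current_weather", "arguments": '{"location": "Oslo"}'}},
        {"id": "call_london", "type": "function",
         "function": {"name": "get_current_weather", "arguments": '{"location": "London"}'}},
    ]},
    # ...but only one result was appended.
    {"role": "tool", "tool_call_id": "call_oslo", "content": '{"temp": 8}'},
]
print(unanswered_tool_calls(messages))  # {'call_london'}
```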


Common Mistakes

Mistake 1: Executing the tool yourself before checking if the LLM requested it. Always check message.tool_calls before running any function. The LLM may have answered directly.

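Mistake 2: Forgetting to append both the assistant message and the tool result. The LLM needs the full conversation history, its own tool_call message AND your tool result, to generate a coherent final answer. The OpenAI API enforces this too: it rejects a role: "tool" message that doesn't follow an assistant message with a matching tool call, and it rejects an assistant tool_calls message whose IDs aren't all answered.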

Mistake 3: Not handling the case where the LLM calls a tool you haven't implemented. Always map function names to actual callables and handle the case where the name doesn't match.

Python
TOOL_MAP = {
    "get_current_weather": get_current_weather,
    "calculate": calculate,
}

for tool_call in msg.tool_calls:
    fn_name = tool_call.function.name
    if fn_name not in TOOL_MAP:
        result = {"error": f"Unknown tool: {fn_name}"}
    else:
        try:
            # Parse inside the try: the model can emit malformed JSON arguments
            args = json.loads(tool_call.function.arguments)
            result = TOOL_MAP[fn_name](**args)
        except Exception as e:
            # Send the error back so the LLM can correct itself
            result = {"error": str(e)}

    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result)
    })
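One way to make Mistake 3 harder to commit, offered as a hypothetical pattern rather than an SDK feature, is to build the name-to-function map with a decorator so each key is always a real function's name, and to funnel every call through one function that turns failures into error results. `tool`, `add`, and `run_tool_call` are illustrative names, and `SimpleNamespace` objects stand in for the SDK's tool-call objects (only `.id`, `.function.name`, and `.function.arguments` are used):

```python
import json
from types import SimpleNamespace as NS

TOOL_MAP = {}


def tool(fn):
    """Register fn in TOOL_MAP under its own name, so the keys can't drift."""
    TOOL_MAP[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> dict:
    return {"result": a + b}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and always return a tool message (never raise)."""
    fn = TOOL_MAP.get(tool_call.function.name)
    if fn is None:
        result = {"error": f"Unknown tool: {tool_call.function.name}"}
    else:
        try:
            result = fn(**json.loads(tool_call.function.arguments))
        except Exception as e:
            # Include the exception type so the model can tell bad JSON from bad args
            result = {"error": f"{type(e).__name__}: {e}"}
    return {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)}


calls = [
    NS(id="1", function=NS(name="add", arguments='{"a": 2, "b": 3}')),
    NS(id="2", function=NS(name="multiply", arguments='{"a": 2, "b": 3}')),  # not registered
    NS(id="3", function=NS(name="add", arguments='{"x": 2}')),  # invented argument name
    NS(id="4", function=NS(name="add", arguments='{"a": 2,')),  # truncated JSON
]
for call in calls:
    print(run_tool_call(call)["content"])
```

Returning errors as tool results, rather than raising, keeps the conversation valid: every tool_call_id still gets an answer, and the model can often recover on its next turn.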

Summary

Tool calling gives LLMs the ability to act on the world rather than just talk about it. The flow is:

  1. Define tools as JSON schemas with clear descriptions
  2. Send those schemas along with the user's message
  3. Check if the LLM responded with a tool call request
  4. Execute the actual function in your code
  5. Return the result to the LLM in a role: "tool" message
  6. Get the final grounded answer

Repeated until the model stops requesting tools, this six-step loop is the foundation of production AI agents.
