Learnixo

Agents & Tools Interview Prep · Lesson 3 of 12

How the LLM Decides Which Tool to Call

The Selection Mechanism

The LLM does not run a keyword-matching algorithm to pick tools. The tool definitions are passed to the model together with the system prompt, conversation history, and user message, and the model reasons over all of that context to decide which action (if any) is most appropriate.

In practice, three factors dominate the decision:

  1. Description relevance — Does this tool description match what the user is asking for?
  2. Conversation context — What has been established in earlier turns that makes one tool more appropriate than another?
  3. tool_choice setting — The explicit constraint you set in the API call

The tool_choice Parameter

tool_choice is your primary lever for controlling whether and which tool gets called.

Python
import openai

client = openai.OpenAI()

# Assumes `messages` and `tools` are already defined (see the demos later in this lesson).

# Option 1: auto (the LLM decides)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto"  # Default. Model may or may not call a tool.
)

# Option 2: none (the LLM cannot call tools in this turn)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="none"  # Forces a text response. Use for summaries, clarifications.
)

# Option 3: required (the LLM must call at least one tool)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="required"  # Use when you always need structured output.
)

# Option 4: specific tool (force the LLM to call one named tool)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice={
        "type": "function",
        "function": {"name": "get_drug_info"}  # Must call this exact tool
    }
)

When to use each:

| Setting | Use Case |
|---|---|
| "auto" | Most agent interactions: let the model reason |
| "none" | You want a summary or explanation after tool results are in |
| "required" | You need structured JSON output (use tools as structured output) |
| specific tool | Data extraction where you always need one particular schema |
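
The table can be folded into a small helper. This is only a minimal sketch: the function name `pick_tool_choice` and its arguments are hypothetical and not part of the OpenAI SDK. Only the returned values correspond to what the `tool_choice` parameter accepts.

```python
from typing import Optional, Union

ToolChoice = Union[str, dict]


def pick_tool_choice(
    tool_results_ready: bool = False,
    need_structured_output: bool = False,
    forced_tool: Optional[str] = None,
) -> ToolChoice:
    """Map the situations in the table to a tool_choice value (hypothetical helper)."""
    if forced_tool is not None:
        # Specific tool: the model must call exactly this function.
        return {"type": "function", "function": {"name": forced_tool}}
    if tool_results_ready:
        # Tool results are already in the message list, so ask for plain text.
        return "none"
    if need_structured_output:
        # The model must call at least one tool, but may choose which one.
        return "required"
    # Default: let the model decide.
    return "auto"


print(pick_tool_choice())                             # auto
print(pick_tool_choice(tool_results_ready=True))      # none
print(pick_tool_choice(need_structured_output=True))  # required
print(pick_tool_choice(forced_tool="get_drug_info"))
```

The order of the checks is a design choice: a forced tool wins over everything else, which may or may not suit your agent loop.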


How Description Matching Works: A Demo

Python
import json
import openai

client = openai.OpenAI()

# Two tools with overlapping domains but different scopes
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_drug_interactions",
            "description": (
                "Check for known interactions between two or more drugs. "
                "Use this when the user asks whether it is safe to combine medications, "
                "or asks about drug-drug interactions. Do NOT use for general drug information."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "drug_a": {
                        "type": "string",
                        "description": "Name of the first drug."
                    },
                    "drug_b": {
                        "type": "string",
                        "description": "Name of the second drug."
                    }
                },
                "required": ["drug_a", "drug_b"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_drug_dosage",
            "description": (
                "Retrieve the standard dosage and administration guidelines for a single drug. "
                "Use this when the user asks how much of a medication to take, how often, "
                "or how to administer it. Do NOT use for interaction questions."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "drug_name": {
                        "type": "string",
                        "description": "The name of the drug."
                    },
                    "patient_weight_kg": {
                        "type": "number",
                        "description": "Optional. Patient weight for weight-based dosing."
                    }
                },
                "required": ["drug_name"]
            }
        }
    }
]

def check_which_tool_is_called(user_message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}],
        tools=tools,
        tool_choice="auto"
    )
    msg = response.choices[0].message
    if msg.tool_calls:
        tc = msg.tool_calls[0]
        return f"Tool: {tc.function.name} | Args: {tc.function.arguments}"
    # content can be None (for example on a refusal), so guard before slicing
    return f"No tool called. Response: {(msg.content or '')[:100]}"

# Query that clearly maps to search_drug_interactions
print(check_which_tool_is_called("Is it safe to take Metformin and Ibuprofen together?"))
# Tool: search_drug_interactions | Args: {"drug_a": "Metformin", "drug_b": "Ibuprofen"}

# Query that clearly maps to get_drug_dosage
print(check_which_tool_is_called("What's the usual dose of Lisinopril for hypertension?"))
# Tool: get_drug_dosage | Args: {"drug_name": "Lisinopril"}

# Ambiguous query: which tool wins?
print(check_which_tool_is_called("Tell me about Aspirin."))
# Likely: no tool called, because both descriptions exclude general drug information.
# The result can differ between runs and model versions.
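
Because tool selection can change when you edit a description or switch models, it may help to keep the demo queries as a small regression set. The sketch below is one possible shape for such a check. `selection_accuracy`, `CASES`, and `stub_selector` are hypothetical names, and the stub is a crude offline stand-in so the example runs without an API key. In a real check you would pass a function that calls the model and returns the chosen tool name.

```python
from typing import Callable, Optional

# Each case pairs a user message with the tool we expect (None = no tool call).
CASES = [
    ("Is it safe to take Metformin and Ibuprofen together?", "search_drug_interactions"),
    ("What's the usual dose of Lisinopril for hypertension?", "get_drug_dosage"),
    ("Tell me about Aspirin.", None),
]


def selection_accuracy(
    select_tool: Callable[[str], Optional[str]], cases=CASES
) -> float:
    """Return the fraction of cases where select_tool picks the expected tool."""
    hits = sum(1 for message, expected in cases if select_tool(message) == expected)
    return hits / len(cases)


def stub_selector(message: str) -> Optional[str]:
    """Offline stand-in for the model; replace with a real API call."""
    text = message.lower()
    if "together" in text or "interact" in text:
        return "search_drug_interactions"
    if "dose" in text or "dosage" in text:
        return "get_drug_dosage"
    return None


print(selection_accuracy(stub_selector))  # 1.0
```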

What Happens When Multiple Tools Could Match

When a query could plausibly invoke multiple tools, the LLM picks the one whose description most closely matches the intent. You can influence this by:

1. Making descriptions mutually exclusive:

Python
# Bad: both tools could handle "tell me about Metformin"
bad_descriptions = {
    "get_drug_info": "Provides information about drugs.",
    "get_drug_dosage": "Provides information about drug dosing.",
}

# Good: descriptions carve out distinct territory
good_descriptions = {
    "get_drug_info": (
        "Returns the clinical profile of a drug: mechanism of action, "
        "approved indications, side effect profile, and pharmacokinetics. "
        "Do NOT use for dosage questions; use get_drug_dosage instead."
    ),
    "get_drug_dosage": (
        "Returns dosage and administration guidelines only: how much, how often, "
        "and how to take a drug. Do NOT use for general drug information; "
        "use get_drug_info instead."
    ),
}

2. Cross-referencing other tools in descriptions:

Telling the LLM which tool to use instead is remarkably effective. When the model reads "Do NOT use for X; use tool_Y instead," it usually follows the instruction, although this remains a strong hint rather than an enforced constraint.
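
If you maintain many tools, the cross-references can be generated rather than written by hand, which keeps the wording consistent. The following is a minimal sketch under that assumption. `build_description` is a hypothetical helper, not an SDK function.

```python
from typing import Dict


def build_description(purpose: str, redirects: Dict[str, str]) -> str:
    """Append one 'Do NOT use for X; use Y instead.' sentence per redirect."""
    parts = [purpose.strip()]
    for topic, other_tool in redirects.items():
        parts.append(f"Do NOT use for {topic}; use {other_tool} instead.")
    return " ".join(parts)


description = build_description(
    "Returns dosage and administration guidelines only.",
    {"general drug information": "get_drug_info"},
)
print(description)
# Returns dosage and administration guidelines only. Do NOT use for general drug information; use get_drug_info instead.
```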

3. Using parallel tool calls for multi-part queries that genuinely need more than one tool:

Python
# User: "What's the dose of Metformin and does it interact with alcohol?"
# This legitimately needs both tools

# gpt-4o will often request both tools in a single response:
# tool_calls: [
#   {function: {name: "get_drug_dosage", arguments: '{"drug_name": "Metformin"}'}},
#   {function: {name: "search_drug_interactions", arguments: '{"drug_a": "Metformin", "drug_b": "Alcohol"}'}}
# ]

Parallel Tool Calls

OpenAI models can return multiple tool calls in a single response. This happens when the LLM determines that several independent pieces of information are needed to answer the query.

Python
import json
import openai

client = openai.OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_drug_dosage",
            "description": "Get dosage information for a drug.",
            "parameters": {
                "type": "object",
                "properties": {
                    "drug_name": {"type": "string", "description": "Drug name."}
                },
                "required": ["drug_name"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_drug_interactions",
            "description": "Check interactions between two drugs.",
            "parameters": {
                "type": "object",
                "properties": {
                    "drug_a": {"type": "string"},
                    "drug_b": {"type": "string"}
                },
                "required": ["drug_a", "drug_b"]
            }
        }
    }
]

def get_drug_dosage(drug_name: str) -> dict:
    return {"drug": drug_name, "dose": "500mg twice daily", "route": "oral"}

def search_drug_interactions(drug_a: str, drug_b: str) -> dict:
    return {
        "drug_a": drug_a,
        "drug_b": drug_b,
        "interaction": "Monitor blood glucose — NSAIDs may impair renal metformin clearance",
        "severity": "moderate"
    }

TOOL_MAP = {
    "get_drug_dosage": get_drug_dosage,
    "search_drug_interactions": search_drug_interactions,
}

def handle_parallel_tool_calls(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )

    msg = response.choices[0].message

    if not msg.tool_calls:
        return msg.content

    print(f"LLM requested {len(msg.tool_calls)} tool call(s)")
    messages.append(msg)

    # Execute all tool calls and collect results
    for tc in msg.tool_calls:
        fn_name = tc.function.name
        fn_args = json.loads(tc.function.arguments)

        print(f"  Executing: {fn_name}({fn_args})")
        result = TOOL_MAP[fn_name](**fn_args)

        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": json.dumps(result)
        })

    # All results are now in the message list, so request the final answer
    final = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    return final.choices[0].message.content

# This query needs both tools
result = handle_parallel_tool_calls(
    "What's the dose of Metformin, and does it interact with Ibuprofen?"
)
print(result)

Expected output (the wording of the final answer will vary between runs):

LLM requested 2 tool call(s)
  Executing: get_drug_dosage({'drug_name': 'Metformin'})
  Executing: search_drug_interactions({'drug_a': 'Metformin', 'drug_b': 'Ibuprofen'})

Metformin is typically taken at 500mg twice daily by mouth. Regarding interactions:
there is a moderate interaction between Metformin and Ibuprofen — NSAIDs may impair
renal metformin clearance, so blood glucose should be monitored if both are used.
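
Note that the loop in `handle_parallel_tool_calls` executes the requested tools one after another, and it raises if the model names an unknown tool or sends malformed arguments. The sketch below shows one way to run the calls concurrently and turn failures into error payloads the model can read. It assumes the tools are independent and thread-safe. `run_tool_call`, `run_tool_calls`, `fake_call`, and the tool name `get_drug_price` are hypothetical, and the `SimpleNamespace` objects only imitate the shape of the SDK's tool-call objects so the example runs offline.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def get_drug_dosage(drug_name: str) -> dict:
    return {"drug": drug_name, "dose": "500mg twice daily"}


def search_drug_interactions(drug_a: str, drug_b: str) -> dict:
    return {"drug_a": drug_a, "drug_b": drug_b, "severity": "moderate"}


TOOL_MAP = {
    "get_drug_dosage": get_drug_dosage,
    "search_drug_interactions": search_drug_interactions,
}


def run_tool_call(tc) -> dict:
    """Run one tool call and return a 'tool' message; failures become JSON errors."""
    fn = TOOL_MAP.get(tc.function.name)
    if fn is None:
        result = {"error": f"unknown tool: {tc.function.name}"}
    else:
        try:
            result = fn(**json.loads(tc.function.arguments))
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON, or arguments that do not match the function signature.
            result = {"error": f"bad arguments: {exc}"}
    return {"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)}


def run_tool_calls(tool_calls) -> list:
    """Run independent tool calls in threads; results keep the request order."""
    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        return list(pool.map(run_tool_call, tool_calls))


def fake_call(call_id: str, name: str, arguments: str):
    """Build an object with the attributes used above: id, function.name, function.arguments."""
    return SimpleNamespace(
        id=call_id, function=SimpleNamespace(name=name, arguments=arguments)
    )


calls = [
    fake_call("call_1", "get_drug_dosage", '{"drug_name": "Metformin"}'),
    fake_call(
        "call_2",
        "search_drug_interactions",
        '{"drug_a": "Metformin", "drug_b": "Ibuprofen"}',
    ),
    fake_call("call_3", "get_drug_price", "{}"),  # not in TOOL_MAP
]
results = run_tool_calls(calls)
for message in results:
    print(message["tool_call_id"], message["content"])
```

Every tool call still gets exactly one `tool` message with a matching `tool_call_id`, which the follow-up request relies on.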

Handling Ambiguous Queries

Ambiguous queries are the main source of wrong tool selections. Examples:

  • "Tell me about Aspirin" — general info? dosage? interactions? history?
  • "Is Metformin safe?" — safe for whom? safe with what drug? side effects?
  • "Check my prescription" — check what exactly?

Strategy 1: Ask for clarification before calling tools

Add to your system prompt:

Python
system_prompt = """
You are a clinical pharmacist assistant.

If a user's query is ambiguous about what type of drug information they need,
ask one clarifying question before calling any tool. For example:
- If they ask "tell me about [drug]", ask whether they want dosage, interactions, or side effects.
- If they ask "is [drug] safe", ask whether they're asking about general safety or a specific interaction.

Once you understand the intent, use the appropriate tool.
"""

Strategy 2: Default to the most useful tool for your domain

If your app is primarily a dosage checker, have get_drug_dosage as the default and note in its description that it's the primary reference for general drug questions.

Strategy 3: Use a routing tool

Add a meta-tool that classifies intent before calling a specialized tool:

Python
tools = [
    {
        "type": "function",
        "function": {
            "name": "classify_drug_query",
            "description": (
                "Use this first for any drug-related question. "
                "Classifies the user's intent so the right specialized tool can be called next."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "intent": {
                        "type": "string",
                        "enum": ["dosage", "interaction", "side_effects", "mechanism", "other"],
                        "description": "The primary intent of the user's drug question."
                    },
                    "drugs_mentioned": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "All drug names mentioned by the user."
                    }
                },
                "required": ["intent", "drugs_mentioned"]
            }
        }
    }
    # ... other specialized tools
]
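
Once the routing tool has returned an intent, your code still has to decide what happens in the next request. One option, sketched below, is to translate the intent into a `tool_choice` value that forces the matching specialized tool. The mapping `INTENT_TO_TOOL` and the helper `tool_choice_for` are assumptions made for illustration. In particular, sending `side_effects` and `mechanism` to `get_drug_info` is a guess at how the specialized tools might be divided.

```python
import json
from typing import Union

# Hypothetical mapping from classified intent to a specialized tool name.
INTENT_TO_TOOL = {
    "dosage": "get_drug_dosage",
    "interaction": "search_drug_interactions",
    "side_effects": "get_drug_info",
    "mechanism": "get_drug_info",
}


def tool_choice_for(classifier_arguments: str) -> Union[str, dict]:
    """Turn classify_drug_query arguments (a JSON string) into a tool_choice value."""
    intent = json.loads(classifier_arguments).get("intent")
    tool_name = INTENT_TO_TOOL.get(intent)
    if tool_name is None:
        # "other" or an unexpected value: fall back to letting the model decide.
        return "auto"
    return {"type": "function", "function": {"name": tool_name}}


print(tool_choice_for('{"intent": "dosage", "drugs_mentioned": ["Metformin"]}'))
print(tool_choice_for('{"intent": "other", "drugs_mentioned": []}'))  # auto
```

The trade-off of a routing tool is an extra model round trip per question, so it tends to pay off only when direct selection is unreliable.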

Context Influences Tool Selection

Earlier messages in the conversation affect which tool the LLM picks. This is important for multi-turn agents.

Python
messages = [
    {"role": "system", "content": "You are a pharmacy assistant."},
    {"role": "user", "content": "I was just prescribed Metformin."},
    {"role": "assistant", "content": "Congratulations on starting Metformin! It's a first-line treatment for type 2 diabetes. Do you have any questions about it?"},
    # The next message: context establishes that we're discussing Metformin
    {"role": "user", "content": "What should I watch out for?"}
    # The LLM can now infer that "what should I watch out for?" refers to Metformin side effects/interactions
    # It will likely call the drug info tool with drug_name="Metformin" rather than ask for clarification
]

This is both a feature and a risk. If a user introduces a new drug mid-conversation, make sure your conversation management resets or correctly propagates context.
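
One lightweight way to manage this is to track the drug currently under discussion in your own code, so you can notice when the user switches to a different one. The sketch below assumes you have a list of drug names your app recognizes. `update_active_drug` and `KNOWN_DRUGS` are hypothetical, and a production system would need more robust entity recognition than a name lookup.

```python
import re
from typing import Iterable, Optional

KNOWN_DRUGS = ["Metformin", "Lisinopril", "Ibuprofen"]  # assumed vocabulary


def update_active_drug(
    active: Optional[str], user_message: str, known_drugs: Iterable[str]
) -> Optional[str]:
    """Return the drug mentioned last in the message, or keep the current one."""
    latest_position = -1
    latest_drug = active
    for drug in known_drugs:
        pattern = rf"\b{re.escape(drug)}\b"
        positions = [
            m.start() for m in re.finditer(pattern, user_message, re.IGNORECASE)
        ]
        if positions and positions[-1] > latest_position:
            latest_position = positions[-1]
            latest_drug = drug
    return latest_drug


active = None
for turn in [
    "I was just prescribed Metformin.",
    "What should I watch out for?",
    "My doctor also added lisinopril.",
]:
    active = update_active_drug(active, turn, KNOWN_DRUGS)
    print(f"{turn!r} -> {active}")
```

When the tracked drug changes, you can decide whether to summarize or trim the earlier turns before the next request.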


Key Takeaways

  • The LLM selects tools through semantic reasoning, not keyword matching
  • tool_choice="auto" is correct for most use cases; use "required" or a specific tool when you always need structure
  • Write descriptions that explicitly say what each tool does NOT handle and refer to sibling tools
  • Parallel tool calls can happen automatically when the query needs multiple independent answers
  • Ambiguous queries are best handled by prompting the LLM to ask a clarifying question before tool use
  • Conversation context influences tool selection — design multi-turn flows with this in mind