Building Agents with LLMs

How to build reliable LLM agents: the ReAct pattern, tool loops, memory systems, multi-agent orchestration, and production agent architecture patterns.

Asma Hafeez Khan | May 16, 2026 | 8 min read
Tags: LLM, Agents, ReAct, Tool Use, Multi-Agent, Orchestration

What Makes an Agent

An LLM agent is any system in which an LLM takes actions in a loop and the results of those actions feed back into subsequent LLM calls. The key elements:

  • Perception: The LLM receives state information (conversation, tool results, memory)
  • Reasoning: The LLM decides what to do next (chain-of-thought, ReAct, etc.)
  • Action: The LLM calls a tool, sends a message, or produces output
  • Feedback: Tool results or environment state changes update the LLM's context

The loop runs until a stopping condition: the LLM produces a final answer, reaches a maximum step count, or an error occurs.
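
The four elements and the three stopping conditions can be sketched without any model at all. In the minimal sketch below, `decide` stands in for the LLM call and `act` for tool execution; both names, and the toy stand-ins that drive them, are assumptions made for illustration rather than part of any library.

```python
from typing import Callable

def run_agent_loop(
    decide: Callable[[list[str]], tuple[str, str]],
    act: Callable[[str], str],
    task: str,
    max_steps: int = 5,
) -> str:
    """Generic perceive -> reason -> act -> feedback loop.

    `decide` stands in for the LLM call: it reads the context and returns
    ("final", answer) or ("action", tool_input).
    """
    context = [f"task: {task}"]                # perception: state the model sees
    for _ in range(max_steps):                 # stopping condition: step budget
        kind, payload = decide(context)        # reasoning
        if kind == "final":                    # stopping condition: final answer
            return payload
        try:
            observation = act(payload)         # action
        except Exception as exc:               # stopping condition: error
            return f"stopped on error: {exc}"
        context.append(f"observation: {observation}")  # feedback
    return "stopped: max steps reached"


# Toy stand-ins so the sketch runs without a model or real tools.
def toy_decide(context: list[str]) -> tuple[str, str]:
    if len(context) == 1:                      # nothing observed yet: call the tool
        return ("action", "2+3")
    return ("final", context[-1].removeprefix("observation: "))

def toy_act(expression: str) -> str:
    left, right = expression.split("+")
    return str(int(left) + int(right))

result = run_agent_loop(toy_decide, toy_act, "add two numbers")
print(result)
```

Keeping the loop independent of the model makes each stopping condition easy to unit test before a real LLM is wired in.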


The ReAct Pattern

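ReAct (Reason + Act) is a foundational agent pattern: the model alternates a written reasoning step (Thought) with a tool call (Action), and each tool result is fed back to it as an Observation. A minimal implementation: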

Python
from openai import OpenAI
import json
import re

client = OpenAI()

REACT_SYSTEM = """You are a clinical pharmacist assistant. Solve problems step by step.

For each step, use this format:
Thought: [reasoning about what to do next]
Action: [tool_name({"param": "value"})]

Or when you have the final answer:
Thought: [final reasoning]
Final Answer: [your answer]

Available tools:
- lookup_interaction(drug_a, drug_b): Check drug interaction severity and management
- get_dosing(drug, egfr): Get renal dose adjustment for a drug
- search_guidelines(query): Search clinical guidelines database"""

def run_react_agent(question: str, tools: dict, max_steps: int = 10) -> str:
    """Execute a ReAct loop to answer a clinical question."""
    messages = [
        {"role": "system", "content": REACT_SYSTEM},
        {"role": "user", "content": question},
    ]

    for step in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            temperature=0,
            stop=["Observation:"],  # Stop before generating fake observations
        )
        model_output = response.choices[0].message.content or ""  # content can be None
        messages.append({"role": "assistant", "content": model_output})

        print(f"\nStep {step + 1}:\n{model_output}")

        # Check for final answer
        if "Final Answer:" in model_output:
            final = re.search(r"Final Answer:\s*(.+?)$", model_output, re.DOTALL)
            return final.group(1).strip() if final else model_output

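        # Parse action; tolerate the optional [ ] wrapper shown in the prompt format
        action_match = re.search(r"Action:\s*\[?\s*(\w+)\((\{.*?\})\)", model_output, re.DOTALL)
        tool_args = None
        if action_match:
            try:
                tool_args = json.loads(action_match.group(2))
            except json.JSONDecodeError:
                tool_args = None  # malformed JSON is treated like a missing action
        if not isinstance(tool_args, dict):
            # No valid action: ask the model to try again
            messages.append({
                "role": "user",
                "content": "Please use the Action: format to call a tool or provide a Final Answer:",
            })
            continue

        tool_name = action_match.group(1)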

        # Execute tool
        if tool_name in tools:
            try:
                result = tools[tool_name](**tool_args)
                observation = f"Observation: {json.dumps(result)}"
            except Exception as e:
                observation = f"Observation: Error - {str(e)}"
        else:
            observation = f"Observation: Tool '{tool_name}' not found"

        print(observation)
        messages.append({"role": "user", "content": observation})

    return "Max steps reached without a final answer."
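
The `tools` argument is a plain dictionary mapping tool names to callables. The sketch below shows one possible registry for the three tools named in the system prompt, plus a hypothetical `dispatch` helper that mirrors the loop's tool-execution step so it can be tested offline. The tool bodies are stubs, and every value they return is placeholder data for wiring tests, not clinical information.

```python
import json

# Stub tools: return values are placeholders, not clinical data.
def lookup_interaction(drug_a: str, drug_b: str) -> dict:
    return {"pair": sorted([drug_a, drug_b]), "severity": "PLACEHOLDER"}

def get_dosing(drug: str, egfr: float) -> dict:
    return {"drug": drug, "egfr": egfr, "adjustment": "PLACEHOLDER"}

def search_guidelines(query: str) -> dict:
    return {"query": query, "results": []}

TOOLS = {
    "lookup_interaction": lookup_interaction,
    "get_dosing": get_dosing,
    "search_guidelines": search_guidelines,
}

def dispatch(tool_name: str, raw_args: str) -> str:
    """Decode JSON arguments, call the tool, and format the observation."""
    if tool_name not in TOOLS:
        return f"Observation: Tool '{tool_name}' not found"
    try:
        result = TOOLS[tool_name](**json.loads(raw_args))
        return f"Observation: {json.dumps(result)}"
    except Exception as exc:  # bad JSON, wrong argument names, or tool failure
        return f"Observation: Error - {exc}"

print(dispatch("get_dosing", '{"drug": "drug_x", "egfr": 45}'))
```

With real tool implementations in place, the same dictionary would be passed as `run_react_agent(question, tools=TOOLS)`, which requires a configured OpenAI API key.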

Memory Systems for Long-Running Agents

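Agents handling ongoing tasks need memory that outlives a single context window. The class below separates short-term working memory from summarized episodic memory and a key-value store of learned facts: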

Python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

from openai import OpenAI

client = OpenAI()  # used by _compress_working_memory to summarize old turns

@dataclass
class AgentMemory:
    """
    Multi-level memory system for a persistent agent.
    """
    # Working memory: current task context (fits in context window)
    working_memory: list[dict] = field(default_factory=list)

    # Episodic memory: important events from past sessions
    episodic_memory: list[dict] = field(default_factory=list)

    # Semantic memory: learned facts (external store)
    semantic_store: dict[str, Any] = field(default_factory=dict)

    def add_to_working(self, content: str, role: str = "system") -> None:
        self.working_memory.append({
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat(),
        })
        # Trim working memory if it gets too long
        if len(self.working_memory) > 50:
            self._compress_working_memory()

    def _compress_working_memory(self) -> None:
        """Summarize old messages to free context window space."""
        old_messages = self.working_memory[:-20]  # Keep recent 20
        old_text = "\n".join([f"{m['role']}: {m['content']}" for m in old_messages])

        summary = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Summarize these agent conversation turns in 3-5 bullet points, preserving important decisions and facts:\n\n{old_text}",
            }],
        ).choices[0].message.content

        self.episodic_memory.append({
            "summary": summary,
            "timestamp": datetime.now().isoformat(),
        })
        self.working_memory = self.working_memory[-20:]

    def learn(self, key: str, value: Any) -> None:
        """Store a fact in semantic memory."""
        self.semantic_store[key] = value

    def recall(self, key: str) -> Any:
        return self.semantic_store.get(key)

    def get_context_for_prompt(self) -> str:
        """Build memory context for including in LLM prompt."""
        parts = []

        if self.episodic_memory:
            parts.append("PREVIOUS SESSION SUMMARY:")
            parts.append(self.episodic_memory[-1]["summary"])

        if self.semantic_store:
            parts.append("\nLEARNED FACTS:")
            for k, v in list(self.semantic_store.items())[-5:]:  # Last 5 facts
                parts.append(f"- {k}: {v}")

        return "\n".join(parts)
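
As written, `AgentMemory` lives only in process memory, so its contents disappear on restart. One way to make the episodic and semantic layers durable is to snapshot them to a JSON file. The sketch below assumes both layers hold only JSON-serializable values; `save_memory`, `load_memory`, and the file name are hypothetical names chosen for illustration.

```python
import json
import os
import tempfile
from pathlib import Path

def save_memory(path: Path, episodic: list[dict], semantic: dict) -> None:
    """Write a snapshot atomically: temp file first, then rename over the target."""
    payload = {"episodic_memory": episodic, "semantic_store": semantic}
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    os.replace(tmp_name, path)  # readers never see a half-written file

def load_memory(path: Path) -> tuple[list[dict], dict]:
    """Return (episodic, semantic); empty structures if no snapshot exists yet."""
    if not path.exists():
        return [], {}
    data = json.loads(path.read_text(encoding="utf-8"))
    return data["episodic_memory"], data["semantic_store"]

# Round trip in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    snapshot = Path(tmp_dir) / "agent_memory.json"
    save_memory(snapshot, [{"summary": "example summary"}], {"preferred_units": "metric"})
    restored_episodic, restored_semantic = load_memory(snapshot)

print(restored_episodic, restored_semantic)
```

A file is the simplest option for a single process. Agents that run concurrently would more likely need a database or key-value store with its own locking.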

Multi-Agent Patterns

Supervisor Pattern

A supervisor agent delegates to specialized sub-agents:

Python
from openai import OpenAI

client = OpenAI()

SUPERVISOR_PROMPT = """You are a clinical AI supervisor. Route incoming questions to the appropriate specialist.

Available specialists:
- drug_interaction_specialist: Questions about drug-drug interactions
- dosing_specialist: Questions about dose adjustments (renal, hepatic)
- patient_counseling_specialist: Questions about patient education and counseling
- drug_information_specialist: General drug information, mechanism, pharmacokinetics

For each question, respond with:
ROUTE_TO: <specialist_name>
REASON: <why this specialist>"""

SPECIALIST_PROMPTS = {
    "drug_interaction_specialist": "You are an expert in drug-drug interactions. Use Lexicomp/Micromedex classification.",
    "dosing_specialist": "You are a pharmacist specializing in renal and hepatic dose adjustments.",
    "patient_counseling_specialist": "You are a patient education specialist. Use plain language (6th grade reading level).",
    "drug_information_specialist": "You are a drug information pharmacist. Provide comprehensive, evidence-based information.",
}

def supervisor_agent(question: str) -> dict:
    """Route question to appropriate specialist."""
    # Step 1: Supervisor decides routing
    routing_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SUPERVISOR_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0,
    ).choices[0].message.content

    # Parse routing decision
    import re
    route_match = re.search(r"ROUTE_TO:\s*(\w+)", routing_response)
    specialist = route_match.group(1) if route_match else "drug_information_specialist"

    # Step 2: Specialist answers
    if specialist not in SPECIALIST_PROMPTS:
        specialist = "drug_information_specialist"

    specialist_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SPECIALIST_PROMPTS[specialist]},
            {"role": "user", "content": question},
        ],
        temperature=0,
    ).choices[0].message.content

    return {
        "specialist": specialist,
        "routing_reason": routing_response,
        "answer": specialist_response,
    }
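
Routing depends on the supervisor following the `ROUTE_TO:` / `REASON:` format, so the parsing step is worth isolating and testing without any API calls. The sketch below is one possible helper (`parse_routing` is a hypothetical name) that extracts both fields and reports whether it had to fall back to the default specialist, which is useful to log.

```python
import re

KNOWN_SPECIALISTS = {
    "drug_interaction_specialist",
    "dosing_specialist",
    "patient_counseling_specialist",
    "drug_information_specialist",
}
DEFAULT_SPECIALIST = "drug_information_specialist"

def parse_routing(text: str) -> tuple[str, str, bool]:
    """Return (specialist, reason, used_fallback) from the supervisor's reply."""
    route = re.search(r"^ROUTE_TO:\s*(\w+)", text, re.MULTILINE)
    reason = re.search(r"^REASON:\s*(.+)$", text, re.MULTILINE)
    name = route.group(1) if route else ""
    reason_text = reason.group(1).strip() if reason else ""
    if name in KNOWN_SPECIALISTS:
        return name, reason_text, False
    # Missing or unknown specialist: route to the generalist and flag it
    return DEFAULT_SPECIALIST, reason_text, True

parsed = parse_routing("ROUTE_TO: dosing_specialist\nREASON: Renal dose adjustment question")
print(parsed)
```

A rising fallback rate is an early signal that the supervisor prompt or the specialist list has drifted.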

Critic Pattern

One agent generates a response, a second critiques it, and the generator then revises based on the critique:

Python
def generate_with_critic(
    task: str,
    n_revisions: int = 2,
) -> str:
    """Generate a response and improve it through self-critique."""

    GENERATOR_PROMPT = "You are a clinical pharmacist. Provide accurate, helpful answers."
    CRITIC_PROMPT = """You are a clinical pharmacology reviewer. 
    
Identify issues with this response:
1. Clinical accuracy errors
2. Missing important information
3. Safety concerns
4. Unclear or poorly structured content

Be specific about what to improve."""

    # Initial generation
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": GENERATOR_PROMPT},
            {"role": "user", "content": task},
        ],
        temperature=0.3,
    ).choices[0].message.content

    for revision in range(n_revisions):
        # Critique
        critique = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": CRITIC_PROMPT},
                {"role": "user", "content": f"Original question: {task}\n\nResponse to review:\n{response}"},
            ],
            temperature=0,
        ).choices[0].message.content

        # Revise
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": GENERATOR_PROMPT},
                {"role": "user", "content": task},
                {"role": "assistant", "content": response},
                {"role": "user", "content": f"A reviewer identified these issues:\n{critique}\n\nPlease revise to address them."},
            ],
            temperature=0,
        ).choices[0].message.content

    return response
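
A fixed number of revisions spends two extra LLM calls per round even when the critic has nothing to add. One variation is to stop as soon as the critic approves. The sketch below assumes the critic prompt is extended to emit a sentinel such as `NO_ISSUES` when it finds nothing to fix. That sentinel, the function name, and the stub callables are all assumptions, and the LLM calls are injected as plain functions so the control flow can be tested offline.

```python
from typing import Callable

def revise_until_approved(
    task: str,
    generate: Callable[[str, str, str], str],  # (task, previous_response, feedback)
    critique: Callable[[str, str], str],       # (task, response)
    max_revisions: int = 2,
    approval_token: str = "NO_ISSUES",
) -> tuple[str, int]:
    """Return (final_response, revisions_used), stopping early on approval."""
    response = generate(task, "", "")
    for used in range(max_revisions):
        feedback = critique(task, response)
        if approval_token in feedback:         # critic is satisfied: stop early
            return response, used
        response = generate(task, response, feedback)
    return response, max_revisions


# Stubs standing in for the generator and critic LLM calls.
def stub_generate(task: str, previous: str, feedback: str) -> str:
    return "draft" if not previous else previous + "+fix"

def stub_critique(task: str, response: str) -> str:
    return "NO_ISSUES" if response.endswith("+fix") else "Add the missing detail."

final, used = revise_until_approved("example task", stub_generate, stub_critique)
print(final, used)
```

The revision cap still applies, so a critic that never approves cannot loop forever.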

Reliability: Handling Failures

Python
import time
from functools import wraps
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retry(max_attempts: int = 3, base_delay: float = 1.0):
    """Decorator for retrying LLM calls with exponential backoff."""
    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            last_error = None
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception as e:
                    last_error = e
                    if attempt < max_attempts - 1:
                        delay = base_delay * (2 ** attempt)
                        print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay}s...")
                        time.sleep(delay)
            raise last_error
        return wrapper
    return decorator


def with_fallback(primary_fn: Callable, fallback_fn: Callable) -> Callable:
    """Call primary function; fall back to secondary if it fails."""
    def wrapper(*args, **kwargs):
        try:
            return primary_fn(*args, **kwargs)
        except Exception as e:
            print(f"Primary failed ({e}), using fallback")
            return fallback_fn(*args, **kwargs)
    return wrapper


class AgentCircuitBreaker:
    """
    Prevent cascading failures in agent loops.
    Opens (stops calling) after `failure_threshold` consecutive failures,
    then allows a single trial call once `recovery_timeout` seconds have passed.
    """

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.last_failure_time = 0.0
        self.state = "closed"  # closed = operational, open = failing

    def call(self, fn: Callable, *args, **kwargs):
        if self.state == "open":
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = "half-open"  # let one trial call through
            else:
                raise RuntimeError("Circuit breaker open: too many recent failures")

        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.last_failure_time = time.time()
            # A failed trial call in half-open state reopens immediately
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
            raise

        # Any success closes the circuit and resets the count, so stale
        # failures from long ago cannot trip the breaker
        self.state = "closed"
        self.failures = 0
        return result
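
Retrying every exception also retries failures that can never succeed, such as a malformed request. A common refinement is to retry only errors known to be transient and to add random jitter so that many agents do not retry in lockstep. The sketch below is one way to do that. `TransientError` is a hypothetical marker class; in practice it would be replaced by the rate-limit and timeout exceptions of whichever client library is in use.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""

def retry_transient(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry only TransientError, using full-jitter exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time in [0, base_delay * 2^attempt]
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
    raise AssertionError("unreachable")


# Demo: fails twice with a transient error, then succeeds.
calls = {"count": 0}

def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"

outcome = retry_transient(flaky, base_delay=0.0)
print(outcome, calls["count"])
```

Any exception other than `TransientError` propagates on the first attempt, which keeps permanent failures fast and visible.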

Production Agent Architecture

User Request
    │
    ▼
┌────────────────────────────────────┐
│           API Gateway              │
│  (auth, rate limiting, logging)    │
└──────────────────┬─────────────────┘
                   │
    ┌──────────────▼───────────────┐
    │       Agent Orchestrator     │
    │  - Input validation          │
    │  - Memory loading            │
    │  - Supervisor routing        │
    │  - Step budget (max_steps)   │
    │  - Timeout handling          │
    └──────────────┬───────────────┘
                   │
        ┌──────────┴──────────┐
        │                     │
   ┌────▼────┐          ┌─────▼────┐
   │ Tool A  │          │  Tool B  │
   │(DB call)│          │(API call)│
   └─────────┘          └──────────┘
                   │
    ┌──────────────▼───────────────┐
    │       Output Pipeline        │
    │  - Response validation       │
    │  - Safety check              │
    │  - Citation verification     │
    │  - Logging                   │
    └──────────────┬───────────────┘
                   │
    ┌──────────────▼───────────────┐
    │         User Response        │
    └──────────────────────────────┘

Key production requirements:

  • Step budgets: Hard limit on agent iterations to prevent infinite loops
  • Timeout handling: Each tool call and overall agent run has a timeout
  • State persistence: Checkpoint agent state so long-running tasks survive restarts
  • Observability: Log every LLM call, tool invocation, and result for debugging
  • Cost tracking: Monitor token usage per agent run; alert on outliers
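
Step budgets, run-level timeouts, and cost limits can share one small accounting object that the orchestrator charges after every LLM call. The sketch below is a minimal illustration under assumed limits; `RunBudget` and `BudgetExceeded` are hypothetical names, and the token count would typically come from the usage data returned with each completion.

```python
import time
from dataclasses import dataclass, field

class BudgetExceeded(RuntimeError):
    """Raised when an agent run goes over its step, token, or time limit."""

@dataclass
class RunBudget:
    max_steps: int = 10
    max_seconds: float = 60.0
    max_tokens: int = 50_000
    steps: int = 0
    tokens: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tokens_used: int) -> None:
        """Record one agent step and its token usage, then enforce all limits."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step budget exceeded ({self.max_steps})")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token budget exceeded ({self.max_tokens})")
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded(f"time budget exceeded ({self.max_seconds}s)")


budget = RunBudget(max_steps=2, max_tokens=1_000)
budget.charge(300)
budget.charge(300)
try:
    budget.charge(300)  # third step goes over max_steps=2
    status = "within budget"
except BudgetExceeded as exc:
    status = str(exc)
print(status)
```

This check runs between steps, so it cannot interrupt a call that is already in flight. Individual LLM and tool calls still need their own timeouts.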
