Building Agents with LLMs
How to build reliable LLM agents: the ReAct pattern, tool loops, memory systems, multi-agent orchestration, and production agent architecture patterns.
What Makes an Agent
An LLM agent is any system in which an LLM takes actions in a loop and the results of those actions feed back into subsequent LLM calls. The key elements:
- Perception: The LLM receives state information (conversation, tool results, memory)
- Reasoning: The LLM decides what to do next (chain-of-thought, ReAct, etc.)
- Action: The LLM calls a tool, sends a message, or produces output
- Feedback: Tool results or environment state update the LLM's context
The loop runs until a stopping condition is met: the LLM produces a final answer, the loop reaches a maximum step count, or an error occurs.
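The four elements and the three stopping conditions can be sketched without any LLM at all. The block below is a minimal illustration, not the article's implementation: `Step`, `run_agent_loop`, and `scripted_policy` are hypothetical names, and the `decide` callable stands in for whatever LLM call a real agent would make.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One decision by the policy: either a tool call or a final answer."""
    tool: Optional[str] = None
    args: Optional[dict] = None
    final_answer: Optional[str] = None


def run_agent_loop(
    decide: Callable[[list], Step],  # hypothetical stand-in for the LLM call
    tools: dict,
    task: str,
    max_steps: int = 5,
) -> dict:
    context = [("task", task)]  # Perception: everything the policy can see
    for step_number in range(1, max_steps + 1):
        try:
            step = decide(context)  # Reasoning: choose the next move
            if step.final_answer is not None:
                # Stopping condition 1: a final answer was produced
                return {"status": "done", "answer": step.final_answer, "steps": step_number}
            result = tools[step.tool](**(step.args or {}))  # Action
        except Exception as exc:
            # Stopping condition 3: an error occurred
            return {"status": "error", "answer": str(exc), "steps": step_number}
        context.append((step.tool, result))  # Feedback: result joins the context
    # Stopping condition 2: the step budget ran out
    return {"status": "max_steps", "answer": None, "steps": max_steps}


def scripted_policy(context: list) -> Step:
    """Hypothetical policy: call one tool, then answer from its result."""
    if len(context) == 1:
        return Step(tool="add", args={"a": 2, "b": 3})
    return Step(final_answer=f"The sum is {context[-1][1]}")


outcome = run_agent_loop(scripted_policy, {"add": lambda a, b: a + b}, "What is 2 + 3?")
print(outcome)  # {'status': 'done', 'answer': 'The sum is 5', 'steps': 2}
```

Returning a status alongside the answer lets the caller distinguish a real answer from a loop that ran out of steps or failed, which a bare string return cannot do.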
The ReAct Pattern
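ReAct (Reason + Act) is the foundational agent pattern. The model alternates between a Thought and an Action, and each tool result is returned to it as an Observation: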
from openai import OpenAI
import json
import re
client = OpenAI()
REACT_SYSTEM = """You are a clinical pharmacist assistant. Solve problems step by step.
For each step, use this format:
Thought: [reasoning about what to do next]
Action: [tool_name({"param": "value"})]
Or when you have the final answer:
Thought: [final reasoning]
Final Answer: [your answer]
Available tools:
- lookup_interaction(drug_a, drug_b): Check drug interaction severity and management
- get_dosing(drug, egfr): Get renal dose adjustment for a drug
- search_guidelines(query): Search clinical guidelines database"""
def run_react_agent(question: str, tools: dict, max_steps: int = 10) -> str:
    """Execute a ReAct loop to answer a clinical question."""
    messages = [
        {"role": "system", "content": REACT_SYSTEM},
        {"role": "user", "content": question},
    ]
    for step in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            temperature=0,
            stop=["Observation:"],  # Stop before generating fake observations
        )
        model_output = response.choices[0].message.content
        messages.append({"role": "assistant", "content": model_output})
        print(f"\nStep {step + 1}:\n{model_output}")
        # Check for final answer
        if "Final Answer:" in model_output:
            final = re.search(r"Final Answer:\s*(.+?)$", model_output, re.DOTALL)
            return final.group(1).strip() if final else model_output
        # Parse action (note: the pattern does not handle nested JSON objects)
        action_match = re.search(r"Action:\s*(\w+)\((\{[^}]+\})\)", model_output)
        if not action_match:
            # No valid action: ask the model to try again
            messages.append({
                "role": "user",
                "content": "Please use the Action: format to call a tool or provide a Final Answer:",
            })
            continue
        tool_name = action_match.group(1)
        try:
            tool_args = json.loads(action_match.group(2))
        except json.JSONDecodeError as e:
            # Malformed arguments: report the problem so the model can retry
            messages.append({
                "role": "user",
                "content": f"Observation: Error - invalid JSON arguments ({e})",
            })
            continue
        # Execute tool
        if tool_name in tools:
            try:
                result = tools[tool_name](**tool_args)
                observation = f"Observation: {json.dumps(result)}"
            except Exception as e:
                observation = f"Observation: Error - {str(e)}"
        else:
            observation = f"Observation: Tool '{tool_name}' not found"
        print(observation)
        messages.append({"role": "user", "content": observation})
    return "Max steps reached without a final answer."

Memory Systems for Long-Running Agents
Agents handling ongoing tasks need persistent memory:
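from dataclasses import dataclass, field
from typing import Any
import json
from datetime import datetime
from openai import OpenAI
client = OpenAI()  # Used by _compress_working_memory to summarize old turns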
@dataclass
class AgentMemory:
    """
    Multi-level memory system for a persistent agent.
    """
    # Working memory: current task context (fits in context window)
    working_memory: list[dict] = field(default_factory=list)
    # Episodic memory: important events from past sessions
    episodic_memory: list[dict] = field(default_factory=list)
    # Semantic memory: learned facts (external store)
    semantic_store: dict[str, Any] = field(default_factory=dict)

    def add_to_working(self, content: str, role: str = "system") -> None:
        self.working_memory.append({
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat(),
        })
        # Trim working memory if it gets too long
        if len(self.working_memory) > 50:
            self._compress_working_memory()

    def _compress_working_memory(self) -> None:
        """Summarize old messages to free context window space."""
        old_messages = self.working_memory[:-20]  # Keep recent 20
        old_text = "\n".join([f"{m['role']}: {m['content']}" for m in old_messages])
        summary = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Summarize these agent conversation turns in 3-5 bullet points, preserving important decisions and facts:\n\n{old_text}",
            }],
        ).choices[0].message.content
        self.episodic_memory.append({
            "summary": summary,
            "timestamp": datetime.now().isoformat(),
        })
        self.working_memory = self.working_memory[-20:]

    def learn(self, key: str, value: Any) -> None:
        """Store a fact in semantic memory."""
        self.semantic_store[key] = value

    def recall(self, key: str) -> Any:
        return self.semantic_store.get(key)

    def get_context_for_prompt(self) -> str:
        """Build memory context for including in LLM prompt."""
        parts = []
        if self.episodic_memory:
            parts.append("PREVIOUS SESSION SUMMARY:")
            parts.append(self.episodic_memory[-1]["summary"])
        if self.semantic_store:
            parts.append("\nLEARNED FACTS:")
            for k, v in list(self.semantic_store.items())[-5:]:  # Last 5 facts
                parts.append(f"- {k}: {v}")
        return "\n".join(parts)

Multi-Agent Patterns
Supervisor Pattern
A supervisor agent delegates to specialized sub-agents:
from openai import OpenAI
client = OpenAI()
SUPERVISOR_PROMPT = """You are a clinical AI supervisor. Route incoming questions to the appropriate specialist.
Available specialists:
- drug_interaction_specialist: Questions about drug-drug interactions
- dosing_specialist: Questions about dose adjustments (renal, hepatic)
- patient_counseling_specialist: Questions about patient education and counseling
- drug_information_specialist: General drug information, mechanism, pharmacokinetics
For each question, respond with:
ROUTE_TO: <specialist_name>
REASON: <why this specialist>"""
SPECIALIST_PROMPTS = {
"drug_interaction_specialist": "You are an expert in drug-drug interactions. Use Lexicomp/Micromedex classification.",
"dosing_specialist": "You are a pharmacist specializing in renal and hepatic dose adjustments.",
"patient_counseling_specialist": "You are a patient education specialist. Use plain language (6th grade reading level).",
"drug_information_specialist": "You are a drug information pharmacist. Provide comprehensive, evidence-based information.",
}
def supervisor_agent(question: str) -> dict:
    """Route question to appropriate specialist."""
    # Step 1: Supervisor decides routing
    routing_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SUPERVISOR_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0,
    ).choices[0].message.content
    # Parse routing decision
    import re
    route_match = re.search(r"ROUTE_TO:\s*(\w+)", routing_response)
    specialist = route_match.group(1) if route_match else "drug_information_specialist"
    # Step 2: Specialist answers
    if specialist not in SPECIALIST_PROMPTS:
        specialist = "drug_information_specialist"
    specialist_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SPECIALIST_PROMPTS[specialist]},
            {"role": "user", "content": question},
        ],
        temperature=0,
    ).choices[0].message.content
    return {
        "specialist": specialist,
        "routing_reason": routing_response,
        "answer": specialist_response,
    }

Critic Pattern
One agent generates a response, a second agent critiques it, and the generator then revises its response based on the critique:
def generate_with_critic(
    task: str,
    n_revisions: int = 2,
) -> str:
    """Generate a response and improve it through rounds of critique and revision."""
    GENERATOR_PROMPT = "You are a clinical pharmacist. Provide accurate, helpful answers."
    CRITIC_PROMPT = """You are a clinical pharmacology reviewer.
Identify issues with this response:
1. Clinical accuracy errors
2. Missing important information
3. Safety concerns
4. Unclear or poorly structured content
Be specific about what to improve."""
    # Initial generation
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": GENERATOR_PROMPT},
            {"role": "user", "content": task},
        ],
        temperature=0.3,
    ).choices[0].message.content
    for revision in range(n_revisions):
        # Critique
        critique = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": CRITIC_PROMPT},
                {"role": "user", "content": f"Original question: {task}\n\nResponse to review:\n{response}"},
            ],
            temperature=0,
        ).choices[0].message.content
        # Revise
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": GENERATOR_PROMPT},
                {"role": "user", "content": task},
                {"role": "assistant", "content": response},
                {"role": "user", "content": f"A reviewer identified these issues:\n{critique}\n\nPlease revise to address them."},
            ],
            temperature=0,
        ).choices[0].message.content
    return response

Reliability: Handling Failures
import time
from functools import wraps
from typing import Callable, TypeVar
T = TypeVar("T")
def with_retry(max_attempts: int = 3, base_delay: float = 1.0):
    """Decorator for retrying LLM calls with exponential backoff."""
    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            last_error = None
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception as e:
                    last_error = e
                    if attempt < max_attempts - 1:
                        delay = base_delay * (2 ** attempt)
                        print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay}s...")
                        time.sleep(delay)
            raise last_error
        return wrapper
    return decorator

def with_fallback(primary_fn: Callable, fallback_fn: Callable) -> Callable:
    """Call primary function; fall back to secondary if it fails."""
    def wrapper(*args, **kwargs):
        try:
            return primary_fn(*args, **kwargs)
        except Exception as e:
            print(f"Primary failed ({e}), using fallback")
            return fallback_fn(*args, **kwargs)
    return wrapper

class AgentCircuitBreaker:
    """
    Prevent cascading failures in agent loops.
    Opens (stops calling) after failure_threshold consecutive failures,
    then allows a trial call once recovery_timeout seconds have passed.
    """
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.last_failure_time = 0.0
        self.state = "closed"  # closed = operational, open = failing

    def call(self, fn: Callable, *args, **kwargs):
        if self.state == "open":
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = "half-open"  # Allow one trial call
            else:
                raise RuntimeError("Circuit breaker open: too many recent failures")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.last_failure_time = time.time()
            if self.failures >= self.failure_threshold:
                self.state = "open"
            raise
        # Any success closes the circuit and resets the failure count,
        # so only consecutive failures can trip the breaker
        self.state = "closed"
        self.failures = 0
        return result

Production Agent Architecture
User Request
     │
     ▼
┌────────────────────────────────────┐
│            API Gateway             │
│   (auth, rate limiting, logging)   │
└──────────────────┬─────────────────┘
                   │
    ┌──────────────▼───────────────┐
    │      Agent Orchestrator      │
    │  - Input validation          │
    │  - Memory loading            │
    │  - Supervisor routing        │
    │  - Step budget (max_steps)   │
    │  - Timeout handling          │
    └──────────────┬───────────────┘
                   │
        ┌──────────┴──────────┐
        │                     │
   ┌────▼────┐          ┌─────▼────┐
   │ Tool A  │          │  Tool B  │
   │(DB call)│          │(API call)│
   └─────────┘          └──────────┘
                   │
    ┌──────────────▼───────────────┐
    │       Output Pipeline        │
    │  - Response validation       │
    │  - Safety check              │
    │  - Citation verification     │
    │  - Logging                   │
    └──────────────┬───────────────┘
                   │
    ┌──────────────▼───────────────┐
    │        User Response         │
    └──────────────────────────────┘

Key production requirements:
- Step budgets: Hard limit on agent iterations to prevent infinite loops
- Timeout handling: Each tool call and overall agent run has a timeout
- State persistence: Checkpoint agent state so long-running tasks survive restarts
- Observability: Log every LLM call, tool invocation, and result for debugging
- Cost tracking: Monitor token usage per agent run; alert on outliers
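Three of these requirements (step budgets, an overall run timeout, and token tracking) can be enforced by one small object that the agent loop consults on every iteration. The sketch below is illustrative only: `RunBudget` and `BudgetExceeded` are hypothetical names, the default limits are arbitrary, and per-tool-call timeouts and checkpointing are out of scope. In a real loop the token count would come from the provider's response, for example the `usage` field of a chat completion.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when an agent run goes past one of its hard limits."""


@dataclass
class RunBudget:
    max_steps: int = 10
    max_seconds: float = 60.0
    max_tokens: int = 50_000
    steps: int = 0
    tokens: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def start_step(self) -> None:
        """Call at the top of each iteration; refuses to start an over-budget step."""
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step budget exceeded ({self.max_steps} steps)")
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded(f"time budget exceeded ({self.max_seconds}s)")
        self.steps += 1

    def add_tokens(self, used: int) -> None:
        """Call after each LLM response with the tokens it consumed."""
        self.tokens += used
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token budget exceeded ({self.max_tokens} tokens)")

    def summary(self) -> dict:
        """Per-run numbers suitable for logging and outlier alerts."""
        return {
            "steps": self.steps,
            "tokens": self.tokens,
            "elapsed_s": round(time.monotonic() - self.started_at, 3),
        }


# Simulated run: each iteration pretends an LLM call used 200 tokens.
budget = RunBudget(max_steps=3, max_tokens=1_000)
stop_reason = None
try:
    for _ in range(10):
        budget.start_step()
        budget.add_tokens(200)
except BudgetExceeded as exc:
    stop_reason = str(exc)

print(stop_reason)                # step budget exceeded (3 steps)
print(budget.steps, budget.tokens)  # 3 600
```

Raising an exception rather than returning a flag means a forgotten check cannot silently let the loop continue; the orchestrator catches `BudgetExceeded` once and logs `summary()` for cost tracking.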