Agentic AI Patterns · Lesson 14 of 15
Max Iterations, Timeouts, and Circuit Breakers
Why Stopping Conditions Matter
An agent without a stopping condition is a resource leak. In production:
- An infinite loop burns tokens at $0.01-$0.10 per 1,000 tokens
- A stuck agent holds a connection and blocks other requests
- After the context window fills (128k tokens), the next call will fail or truncate history
Every agent loop needs at least one hard stopping condition and ideally two or three layered stops.
The Four Stopping Mechanisms
| Type | Trigger | Best For |
|---|---|---|
| Hard stop | iterations >= max_iterations | All agents (always include) |
| Soft stop | Agent outputs a final answer token | Task completion detection |
| Budget stop | tokens_spent >= max_tokens | Cost-sensitive production |
| Timeout | elapsed >= max_seconds | User-facing latency SLAs |
1. Hard Stop: Max Iterations
The simplest and most important. Always include this.
# agents/base_agent.py
import asyncio
from openai import AsyncAzureOpenAI
import structlog
log = structlog.get_logger()
class AgentLoop:
def __init__(
self,
client: AsyncAzureOpenAI,
max_iterations: int = 10,
):
self.client = client
self.max_iterations = max_iterations
async def run(self, goal: str) -> str:
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": goal},
]
for iteration in range(self.max_iterations):
log.info("agent_iteration", iteration=iteration, goal=goal[:50])
response = await self.client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=TOOL_DEFINITIONS,
tool_choice="auto",
)
msg = response.choices[0].message
# Check for final answer (soft stop)
if msg.tool_calls is None:
log.info("agent_complete", iterations=iteration + 1)
return msg.content
# Execute tool calls and add results
messages.append(msg)
for tool_call in msg.tool_calls:
result = await self.execute_tool(tool_call)
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": str(result),
})
# Hard stop reached
log.warning("agent_max_iterations_reached", max=self.max_iterations, goal=goal[:50])
return await self.generate_best_effort_answer(messages)
async def generate_best_effort_answer(self, messages: list) -> str:
"""Generate a partial answer when the agent runs out of iterations."""
messages.append({
"role": "user",
"content": (
"You have reached the maximum number of steps. "
"Based on what you have gathered so far, provide the best answer you can. "
"Clearly state if your answer is incomplete."
),
})
response = await self.client.chat.completions.create(
model="gpt-4o",
messages=messages,
)
return response.choices[0].message.content2. Soft Stop: Final Answer Detection
The agent signals completion by outputting a structured final answer instead of another tool call.
Approach A — structured output: when the model returns no tool calls, treat the text response as the final answer (OpenAI's standard tool-calling protocol).
Approach B — explicit token: require the agent to output FINAL ANSWER: before the answer:
SYSTEM_PROMPT = """You are a research assistant.
Think step by step. Use tools to gather information.
When you have enough information, output:
FINAL ANSWER: [your complete answer here]
Do NOT output FINAL ANSWER until you are confident in your answer."""
def is_final_answer(content: str) -> bool:
return content.strip().startswith("FINAL ANSWER:")
def extract_answer(content: str) -> str:
return content.split("FINAL ANSWER:", 1)[1].strip()3. Budget Stop: Token Limit
Track cumulative token usage and stop before hitting the context window limit:
class BudgetAwareAgentLoop:
def __init__(
self,
client: AsyncAzureOpenAI,
max_iterations: int = 10,
max_tokens: int = 50_000, # stop at 50k total tokens
):
self.client = client
self.max_iterations = max_iterations
self.max_tokens = max_tokens
self.tokens_used = 0
async def run(self, goal: str) -> str:
messages = [...]
for iteration in range(self.max_iterations):
response = await self.client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=TOOL_DEFINITIONS,
)
# Track token usage
self.tokens_used += response.usage.total_tokens
log.info(
"agent_step",
iteration=iteration,
tokens_this_step=response.usage.total_tokens,
tokens_total=self.tokens_used,
)
# Budget stop check
if self.tokens_used >= self.max_tokens:
log.warning(
"agent_budget_exceeded",
tokens_used=self.tokens_used,
max_tokens=self.max_tokens,
)
return await self.generate_best_effort_answer(messages)
msg = response.choices[0].message
if msg.tool_calls is None:
return msg.content
# ... execute tools4. Timeout: Wall Clock Limit
For user-facing agents where latency matters:
import asyncio
class TimedAgentLoop(AgentLoop):
def __init__(self, *args, max_seconds: float = 30.0, **kwargs):
super().__init__(*args, **kwargs)
self.max_seconds = max_seconds
async def run(self, goal: str) -> str:
try:
return await asyncio.wait_for(
super().run(goal),
timeout=self.max_seconds,
)
except asyncio.TimeoutError:
log.warning("agent_timeout", max_seconds=self.max_seconds, goal=goal[:50])
return (
"I was unable to complete this query within the time limit. "
"Please try again or simplify your question."
)Combining All Four
In production, combine all four mechanisms. The agent stops on whichever triggers first:
class ProductionAgentLoop:
def __init__(
self,
client: AsyncAzureOpenAI,
max_iterations: int = 10,
max_tokens: int = 50_000,
max_seconds: float = 30.0,
):
self.inner = BudgetAwareAgentLoop(client, max_iterations, max_tokens)
self.max_seconds = max_seconds
async def run(self, goal: str) -> str:
"""Stops on: final answer | max iterations | budget | timeout."""
try:
return await asyncio.wait_for(
self.inner.run(goal),
timeout=self.max_seconds,
)
except asyncio.TimeoutError:
log.warning("agent_timeout", goal=goal[:50])
return self._timeout_response()
def _timeout_response(self) -> str:
return (
"This query took too long to process. "
"Please try a more specific question."
)Recommended Defaults
For a production pharmaceutical chatbot:
AGENT_CONFIG = {
"max_iterations": 8, # Typical good answers take 2-4 steps
"max_tokens": 40_000, # Well under GPT-4o's 128k context limit
"max_seconds": 25.0, # 5s buffer before a 30s client timeout
}Monitor these metrics to tune the values:
agent_iterations_mean: should be under 4 for well-scoped tasksagent_max_iterations_hit_rate: should be under 2%agent_timeout_rate: should be under 1%agent_budget_exceeded_rate: should be under 0.5%