Learnixo

Agentic AI Patterns · Lesson 14 of 15

Max Iterations, Timeouts, and Circuit Breakers

Why Stopping Conditions Matter

An agent without a stopping condition is a resource leak. In production:

  • An infinite loop burns tokens at $0.01-$0.10 per 1,000 tokens
  • A stuck agent holds a connection and blocks other requests
  • After the context window fills (128k tokens), the next call will fail or truncate history

Every agent loop needs at least one hard stopping condition and ideally two or three layered stops.


The Four Stopping Mechanisms

| Type | Trigger | Best For | |---|---|---| | Hard stop | iterations >= max_iterations | All agents (always include) | | Soft stop | Agent outputs a final answer token | Task completion detection | | Budget stop | tokens_spent >= max_tokens | Cost-sensitive production | | Timeout | elapsed >= max_seconds | User-facing latency SLAs |


1. Hard Stop: Max Iterations

The simplest and most important. Always include this.

Python
# agents/base_agent.py
import asyncio
from openai import AsyncAzureOpenAI
import structlog

log = structlog.get_logger()

class AgentLoop:
    def __init__(
        self,
        client: AsyncAzureOpenAI,
        max_iterations: int = 10,
    ):
        self.client = client
        self.max_iterations = max_iterations

    async def run(self, goal: str) -> str:
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": goal},
        ]

        for iteration in range(self.max_iterations):
            log.info("agent_iteration", iteration=iteration, goal=goal[:50])

            response = await self.client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=TOOL_DEFINITIONS,
                tool_choice="auto",
            )

            msg = response.choices[0].message

            # Check for final answer (soft stop)
            if msg.tool_calls is None:
                log.info("agent_complete", iterations=iteration + 1)
                return msg.content

            # Execute tool calls and add results
            messages.append(msg)
            for tool_call in msg.tool_calls:
                result = await self.execute_tool(tool_call)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result),
                })

        # Hard stop reached
        log.warning("agent_max_iterations_reached", max=self.max_iterations, goal=goal[:50])
        return await self.generate_best_effort_answer(messages)

    async def generate_best_effort_answer(self, messages: list) -> str:
        """Generate a partial answer when the agent runs out of iterations."""
        messages.append({
            "role": "user",
            "content": (
                "You have reached the maximum number of steps. "
                "Based on what you have gathered so far, provide the best answer you can. "
                "Clearly state if your answer is incomplete."
            ),
        })
        response = await self.client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
        )
        return response.choices[0].message.content

2. Soft Stop: Final Answer Detection

The agent signals completion by outputting a structured final answer instead of another tool call.

Approach A — structured output: when the model returns no tool calls, treat the text response as the final answer (OpenAI's standard tool-calling protocol).

Approach B — explicit token: require the agent to output FINAL ANSWER: before the answer:

Python
SYSTEM_PROMPT = """You are a research assistant.

Think step by step. Use tools to gather information.
When you have enough information, output:

FINAL ANSWER: [your complete answer here]

Do NOT output FINAL ANSWER until you are confident in your answer."""

def is_final_answer(content: str) -> bool:
    return content.strip().startswith("FINAL ANSWER:")

def extract_answer(content: str) -> str:
    return content.split("FINAL ANSWER:", 1)[1].strip()

3. Budget Stop: Token Limit

Track cumulative token usage and stop before hitting the context window limit:

Python
class BudgetAwareAgentLoop:
    def __init__(
        self,
        client: AsyncAzureOpenAI,
        max_iterations: int = 10,
        max_tokens: int = 50_000,  # stop at 50k total tokens
    ):
        self.client = client
        self.max_iterations = max_iterations
        self.max_tokens = max_tokens
        self.tokens_used = 0

    async def run(self, goal: str) -> str:
        messages = [...]

        for iteration in range(self.max_iterations):
            response = await self.client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=TOOL_DEFINITIONS,
            )

            # Track token usage
            self.tokens_used += response.usage.total_tokens

            log.info(
                "agent_step",
                iteration=iteration,
                tokens_this_step=response.usage.total_tokens,
                tokens_total=self.tokens_used,
            )

            # Budget stop check
            if self.tokens_used >= self.max_tokens:
                log.warning(
                    "agent_budget_exceeded",
                    tokens_used=self.tokens_used,
                    max_tokens=self.max_tokens,
                )
                return await self.generate_best_effort_answer(messages)

            msg = response.choices[0].message
            if msg.tool_calls is None:
                return msg.content

            # ... execute tools

4. Timeout: Wall Clock Limit

For user-facing agents where latency matters:

Python
import asyncio

class TimedAgentLoop(AgentLoop):
    def __init__(self, *args, max_seconds: float = 30.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.max_seconds = max_seconds

    async def run(self, goal: str) -> str:
        try:
            return await asyncio.wait_for(
                super().run(goal),
                timeout=self.max_seconds,
            )
        except asyncio.TimeoutError:
            log.warning("agent_timeout", max_seconds=self.max_seconds, goal=goal[:50])
            return (
                "I was unable to complete this query within the time limit. "
                "Please try again or simplify your question."
            )

Combining All Four

In production, combine all four mechanisms. The agent stops on whichever triggers first:

Python
class ProductionAgentLoop:
    def __init__(
        self,
        client: AsyncAzureOpenAI,
        max_iterations: int = 10,
        max_tokens: int = 50_000,
        max_seconds: float = 30.0,
    ):
        self.inner = BudgetAwareAgentLoop(client, max_iterations, max_tokens)
        self.max_seconds = max_seconds

    async def run(self, goal: str) -> str:
        """Stops on: final answer | max iterations | budget | timeout."""
        try:
            return await asyncio.wait_for(
                self.inner.run(goal),
                timeout=self.max_seconds,
            )
        except asyncio.TimeoutError:
            log.warning("agent_timeout", goal=goal[:50])
            return self._timeout_response()

    def _timeout_response(self) -> str:
        return (
            "This query took too long to process. "
            "Please try a more specific question."
        )

Recommended Defaults

For a production pharmaceutical chatbot:

Python
AGENT_CONFIG = {
    "max_iterations": 8,    # Typical good answers take 2-4 steps
    "max_tokens": 40_000,   # Well under GPT-4o's 128k context limit
    "max_seconds": 25.0,    # 5s buffer before a 30s client timeout
}

Monitor these metrics to tune the values:

  • agent_iterations_mean: should be under 4 for well-scoped tasks
  • agent_max_iterations_hit_rate: should be under 2%
  • agent_timeout_rate: should be under 1%
  • agent_budget_exceeded_rate: should be under 0.5%