AI Agents and Tool Calling Workflows: Production Patterns
Design reliable AI agents with tool calling, planning loops, memory boundaries, retries, and human-in-the-loop safeguards.
Agent systems fail when they are treated as magic. Reliable agents are just deterministic workflows wrapped around probabilistic reasoning.
Agent Architecture That Scales
User Task -> Planner -> Tool Selector -> Tool Executor -> Verifier -> Finalizer
                ^                                            |
                |--------------------- Retry ----------------|

Design each step as an explicit state transition.
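One way to keep the transitions explicit is to model the pipeline as a small state machine instead of free-form prompting. The sketch below is only illustrative: the Stage names and the transition table are assumptions, not part of any particular framework.

# Illustrative sketch: the agent pipeline as an explicit state machine.
# The Stage names and transition table are assumptions for this example.
from enum import Enum, auto

class Stage(Enum):
    PLAN = auto()
    SELECT_TOOL = auto()
    EXECUTE = auto()
    VERIFY = auto()
    FINALIZE = auto()

# Allowed transitions; VERIFY may loop back to PLAN (the Retry edge above).
TRANSITIONS = {
    Stage.PLAN: {Stage.SELECT_TOOL},
    Stage.SELECT_TOOL: {Stage.EXECUTE},
    Stage.EXECUTE: {Stage.VERIFY},
    Stage.VERIFY: {Stage.PLAN, Stage.FINALIZE},
    Stage.FINALIZE: set(),
}

def advance(current: Stage, proposed: Stage) -> Stage:
    # Reject any transition the workflow does not explicitly allow.
    if proposed not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {proposed.name}")
    return proposed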
1) Tool Contracts First
Every tool should have:
- strict input schema
- bounded side effects
- idempotency where possible
- machine-readable error codes
Example tool schema:
{
  "name": "create_ticket",
  "input": { "title": "string", "priority": "low|med|high" },
  "output": { "ticket_id": "string", "status": "created|failed" }
}
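Enforcing that contract at the boundary catches malformed arguments before any side effect happens. Below is a minimal sketch using Pydantic (one common choice, not implied by the schema itself); the model and field names mirror the schema above and everything else is illustrative.

# Minimal sketch: enforce the create_ticket contract with Pydantic.
# Model and field names mirror the schema above; the rest is illustrative.
from enum import Enum
from pydantic import BaseModel, ValidationError

class Priority(str, Enum):
    low = "low"
    med = "med"
    high = "high"

class CreateTicketInput(BaseModel):
    title: str
    priority: Priority

class CreateTicketOutput(BaseModel):
    ticket_id: str
    status: str  # "created" or "failed"

def create_ticket(raw_args: dict) -> CreateTicketOutput:
    try:
        args = CreateTicketInput(**raw_args)  # strict input schema, rejected early
    except ValidationError:
        # Machine-readable failure instead of a free-text error string
        return CreateTicketOutput(ticket_id="", status="failed")
    # ... bounded side effect: call the ticketing backend with args here ...
    return CreateTicketOutput(ticket_id="TCK-0000", status="created")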
2) Planning and Execution Loop
Keep planning shallow and bounded:
- max steps (e.g., 8)
- max tool calls per step
- explicit stop criteria
MAX_STEPS = 8

for step in range(MAX_STEPS):
    plan = planner(state)                        # propose the next action(s)
    action = choose_action(plan)                 # pick one bounded action
    result = run_tool(action)                    # execute against a tool contract
    state = update_state(state, action, result)  # record the transition
    if state.done:                               # explicit stop criterion
        break

3) Reliability Patterns
- Retry transient errors (timeouts, rate limits)
- Never retry unsafe writes blindly
- Verify tool outputs before next step
- Fallback to deterministic path for critical flows
Use circuit breakers for flaky external tools.
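A minimal sketch of both ideas, retrying only errors explicitly marked transient and tripping a per-tool breaker after repeated failures, is below. The exception names and thresholds are assumptions for illustration; non-idempotent writes should not be routed through this wrapper at all.

# Sketch: retry transient errors with backoff, plus a crude per-tool circuit breaker.
# TransientToolError, the thresholds, and the injected tool callable are illustrative.
import time

class TransientToolError(Exception):
    """Timeouts, rate limits, and other retry-safe failures."""

class CircuitOpenError(Exception):
    """Raised when a tool has failed too often and is temporarily disabled."""

_failures: dict[str, int] = {}
FAILURE_THRESHOLD = 5

def call_tool(name: str, fn, *args, max_retries: int = 3):
    if _failures.get(name, 0) >= FAILURE_THRESHOLD:
        raise CircuitOpenError(f"{name} is temporarily disabled")
    for attempt in range(max_retries):
        try:
            result = fn(*args)
            _failures[name] = 0           # success resets the breaker
            return result
        except TransientToolError:
            _failures[name] = _failures.get(name, 0) + 1
            time.sleep(2 ** attempt)      # exponential backoff: 1s, 2s, 4s
    raise TransientToolError(f"{name} failed after {max_retries} retries")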
4) Memory Strategy
Split memory into:
- session memory (ephemeral conversation state)
- task memory (current objective + intermediate artifacts)
- long-term memory (approved facts only)
Do not mix raw user chat history into long-term memory without curation.
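One way to keep those boundaries honest is to store the three scopes separately and let facts into long-term memory only through a curation gate. The sketch below is illustrative; the class and method names are assumptions, not from any specific framework.

# Sketch: three memory scopes with an explicit curation gate.
# Class and method names are illustrative.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    session: list[str] = field(default_factory=list)    # ephemeral conversation state
    task: dict = field(default_factory=dict)             # current objective + artifacts
    long_term: list[str] = field(default_factory=list)   # approved facts only

    def remember_turn(self, text: str) -> None:
        self.session.append(text)  # raw chat never flows into long_term directly

    def promote(self, fact: str, approved: bool) -> None:
        # Facts cross into long-term memory only after curation/approval.
        if approved:
            self.long_term.append(fact)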
5) Human-in-the-Loop Controls
Require approval for:
- financial actions
- data deletion
- external communication
- permission changes
Pattern:
Agent proposes action -> show summary/risk -> human approve/reject -> execute
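A minimal sketch of that gate is below. The action categories mirror the list above; run_tool and request_human_review are injected placeholders for your own executor and review queue.

# Sketch: approval gate for sensitive actions. Categories mirror the list above;
# run_tool and request_human_review are placeholders supplied by the caller.
SENSITIVE_CATEGORIES = {"financial", "data_deletion", "external_comms", "permissions"}

def execute_with_approval(action: dict, run_tool, request_human_review) -> dict:
    if action.get("category") in SENSITIVE_CATEGORIES:
        summary = f"{action['tool']}: {action.get('risk', 'unknown risk')}"
        if not request_human_review(summary):   # human approve/reject
            return {"status": "rejected", "action": action}
    return {"status": "executed", "result": run_tool(action)}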
6) Agent Security Guardrails
- Prompt injection filtering on retrieved content
- Tool allowlist per task type
- Context isolation between tenants/projects
- Output policy checks before final response
Assume hostile input by default.
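The per-task tool allowlist is the easiest of these guardrails to enforce in code: deny by default, allow by task type. The task types and tool names below are assumptions for illustration.

# Sketch: tool allowlist per task type. Task types and tool names are illustrative.
TOOL_ALLOWLIST = {
    "support_triage": {"search_kb", "create_ticket"},
    "reporting": {"query_metrics", "render_chart"},
}

def authorize_tool(task_type: str, tool_name: str) -> None:
    allowed = TOOL_ALLOWLIST.get(task_type, set())
    if tool_name not in allowed:
        # Deny by default: unknown task types get no tools at all.
        raise PermissionError(f"tool {tool_name!r} not allowed for task {task_type!r}")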
7) Metrics to Track in Production
- task success rate
- average steps per task
- tool error rate by tool
- human override rate
- latency and cost per completed task
If success rate drops while step count rises, your planner is drifting.
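A lightweight starting point is one structured record per completed run, aggregated offline; every metric above falls out of such a record. The field names below are assumptions.

# Sketch: one structured record per agent run. Field names are illustrative.
import json, time

def log_run(task_id: str, succeeded: bool, steps: int,
            tool_errors: dict[str, int], human_override: bool,
            latency_s: float, cost_usd: float) -> None:
    record = {
        "ts": time.time(),
        "task_id": task_id,
        "success": succeeded,        # -> task success rate
        "steps": steps,              # -> average steps per task
        "tool_errors": tool_errors,  # -> tool error rate by tool
        "human_override": human_override,
        "latency_s": latency_s,
        "cost_usd": cost_usd,        # -> latency and cost per completed task
    }
    print(json.dumps(record))        # ship to your log pipeline / metrics store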
Minimal FastAPI Orchestrator Pattern
from fastapi import FastAPI

app = FastAPI()

@app.post("/agent/run")
async def run_agent(task: dict):
    # validate task
    # run bounded planner/executor loop
    # enforce approvals for sensitive actions
    # return audit trail + result
    return {"status": "ok", "steps": []}
Production Readiness Checklist
- Tool contracts versioned
- Unsafe tools gated behind approvals
- Replay logs available for failed runs
- Regression tasks run on every release
- Cost budget enforced per workflow
Ship agents as workflows, not personalities.