AI Agents and Tool Calling Workflows: Production Patterns
Design reliable AI agents with tool calling, planning loops, memory boundaries, retries, and human-in-the-loop safeguards.
Agent systems fail when they are treated as magic. Reliable agents are just deterministic workflows wrapped around probabilistic reasoning.
Agent Architecture That Scales
User Task -> Planner -> Tool Selector -> Tool Executor -> Verifier -> Finalizer
                ^                                            |
                |--------------------- Retry ----------------|

Design each step as an explicit state transition.
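One way to keep the transitions explicit is to model the pipeline as a small state machine instead of free-form prompting. The sketch below is only illustrative: the Stage names and the transition table are assumptions, not part of any particular framework.

# Illustrative sketch: the agent pipeline as an explicit state machine.
# The Stage names and transition table are assumptions for this example.
from enum import Enum, auto

class Stage(Enum):
    PLAN = auto()
    SELECT_TOOL = auto()
    EXECUTE = auto()
    VERIFY = auto()
    FINALIZE = auto()

# Allowed transitions; VERIFY may loop back to PLAN (the Retry edge above).
TRANSITIONS = {
    Stage.PLAN: {Stage.SELECT_TOOL},
    Stage.SELECT_TOOL: {Stage.EXECUTE},
    Stage.EXECUTE: {Stage.VERIFY},
    Stage.VERIFY: {Stage.PLAN, Stage.FINALIZE},
    Stage.FINALIZE: set(),
}

def advance(current: Stage, proposed: Stage) -> Stage:
    # Reject any transition the workflow does not explicitly allow.
    if proposed not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {proposed.name}")
    return proposed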
1) Tool Contracts First
Every tool should have:
- strict input schema
- bounded side effects
- idempotency where possible
- machine-readable error codes
Example tool schema:
{
  "name": "create_ticket",
  "input": { "title": "string", "priority": "low|med|high" },
  "output": { "ticket_id": "string", "status": "created|failed" }
}
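Enforcing that contract at the boundary catches malformed arguments before any side effect happens. Below is a minimal sketch using Pydantic (one common choice, not implied by the schema itself); the model and field names mirror the schema above and everything else is illustrative.

# Minimal sketch: enforce the create_ticket contract with Pydantic.
# Model and field names mirror the schema above; the rest is illustrative.
from enum import Enum
from pydantic import BaseModel, ValidationError

class Priority(str, Enum):
    low = "low"
    med = "med"
    high = "high"

class CreateTicketInput(BaseModel):
    title: str
    priority: Priority

class CreateTicketOutput(BaseModel):
    ticket_id: str
    status: str  # "created" or "failed"

def create_ticket(raw_args: dict) -> CreateTicketOutput:
    try:
        args = CreateTicketInput(**raw_args)  # strict input schema, rejected early
    except ValidationError:
        # Machine-readable failure instead of a free-text error string
        return CreateTicketOutput(ticket_id="", status="failed")
    # ... bounded side effect: call the ticketing backend with args here ...
    return CreateTicketOutput(ticket_id="TCK-0000", status="created")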
2) Planning and Execution Loop
Keep planning shallow and bounded:
- max steps (e.g., 8)
- max tool calls per step
- explicit stop criteria
MAX_STEPS = 8

for step in range(MAX_STEPS):
    plan = planner(state)                        # propose the next action(s)
    action = choose_action(plan)                 # pick one bounded action
    result = run_tool(action)                    # execute against a tool contract
    state = update_state(state, action, result)  # record the transition
    if state.done:                               # explicit stop criterion
        break

3) Reliability Patterns
- Retry transient errors (timeouts, rate limits)
- Never retry unsafe writes blindly
- Verify tool outputs before next step
- Fallback to deterministic path for critical flows
Use circuit breakers for flaky external tools.
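A minimal sketch of both ideas, retrying only errors explicitly marked transient and tripping a per-tool breaker after repeated failures, is below. The exception names and thresholds are assumptions for illustration; non-idempotent writes should not be routed through this wrapper at all.

# Sketch: retry transient errors with backoff, plus a crude per-tool circuit breaker.
# TransientToolError, the thresholds, and the injected tool callable are illustrative.
import time

class TransientToolError(Exception):
    """Timeouts, rate limits, and other retry-safe failures."""

class CircuitOpenError(Exception):
    """Raised when a tool has failed too often and is temporarily disabled."""

_failures: dict[str, int] = {}
FAILURE_THRESHOLD = 5

def call_tool(name: str, fn, *args, max_retries: int = 3):
    if _failures.get(name, 0) >= FAILURE_THRESHOLD:
        raise CircuitOpenError(f"{name} is temporarily disabled")
    for attempt in range(max_retries):
        try:
            result = fn(*args)
            _failures[name] = 0           # success resets the breaker
            return result
        except TransientToolError:
            _failures[name] = _failures.get(name, 0) + 1
            time.sleep(2 ** attempt)      # exponential backoff: 1s, 2s, 4s
    raise TransientToolError(f"{name} failed after {max_retries} retries")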
4) Memory Strategy
Split memory into:
- session memory (ephemeral conversation state)
- task memory (current objective + intermediate artifacts)
- long-term memory (approved facts only)
Do not mix raw user chat history into long-term memory without curation.
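One way to keep those boundaries honest is to store the three scopes separately and let facts into long-term memory only through a curation gate. The sketch below is illustrative; the class and method names are assumptions, not from any specific framework.

# Sketch: three memory scopes with an explicit curation gate.
# Class and method names are illustrative.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    session: list[str] = field(default_factory=list)    # ephemeral conversation state
    task: dict = field(default_factory=dict)             # current objective + artifacts
    long_term: list[str] = field(default_factory=list)   # approved facts only

    def remember_turn(self, text: str) -> None:
        self.session.append(text)  # raw chat never flows into long_term directly

    def promote(self, fact: str, approved: bool) -> None:
        # Facts cross into long-term memory only after curation/approval.
        if approved:
            self.long_term.append(fact)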
5) Human-in-the-Loop Controls
Require approval for:
- financial actions
- data deletion
- external communication
- permission changes
Pattern:
Agent proposes action -> show summary/risk -> human approve/reject -> execute
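A minimal sketch of that gate is below. The action categories mirror the list above; run_tool and request_human_review are injected placeholders for your own executor and review queue.

# Sketch: approval gate for sensitive actions. Categories mirror the list above;
# run_tool and request_human_review are placeholders supplied by the caller.
SENSITIVE_CATEGORIES = {"financial", "data_deletion", "external_comms", "permissions"}

def execute_with_approval(action: dict, run_tool, request_human_review) -> dict:
    if action.get("category") in SENSITIVE_CATEGORIES:
        summary = f"{action['tool']}: {action.get('risk', 'unknown risk')}"
        if not request_human_review(summary):   # human approve/reject
            return {"status": "rejected", "action": action}
    return {"status": "executed", "result": run_tool(action)}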
6) Agent Security Guardrails
- Prompt injection filtering on retrieved content
- Tool allowlist per task type
- Context isolation between tenants/projects
- Output policy checks before final response
Assume hostile input by default.
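The per-task tool allowlist is the easiest of these guardrails to enforce in code: deny by default, allow by task type. The task types and tool names below are assumptions for illustration.

# Sketch: tool allowlist per task type. Task types and tool names are illustrative.
TOOL_ALLOWLIST = {
    "support_triage": {"search_kb", "create_ticket"},
    "reporting": {"query_metrics", "render_chart"},
}

def authorize_tool(task_type: str, tool_name: str) -> None:
    allowed = TOOL_ALLOWLIST.get(task_type, set())
    if tool_name not in allowed:
        # Deny by default: unknown task types get no tools at all.
        raise PermissionError(f"tool {tool_name!r} not allowed for task {task_type!r}")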
7) Metrics to Track in Production
- task success rate
- average steps per task
- tool error rate by tool
- human override rate
- latency and cost per completed task
If success rate drops while step count rises, your planner is drifting.
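A lightweight starting point is one structured record per completed run, aggregated offline; every metric above falls out of such a record. The field names below are assumptions.

# Sketch: one structured record per agent run. Field names are illustrative.
import json, time

def log_run(task_id: str, succeeded: bool, steps: int,
            tool_errors: dict[str, int], human_override: bool,
            latency_s: float, cost_usd: float) -> None:
    record = {
        "ts": time.time(),
        "task_id": task_id,
        "success": succeeded,        # -> task success rate
        "steps": steps,              # -> average steps per task
        "tool_errors": tool_errors,  # -> tool error rate by tool
        "human_override": human_override,
        "latency_s": latency_s,
        "cost_usd": cost_usd,        # -> latency and cost per completed task
    }
    print(json.dumps(record))        # ship to your log pipeline / metrics store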
Minimal FastAPI Orchestrator Pattern
from fastapi import FastAPI

app = FastAPI()

@app.post("/agent/run")
async def run_agent(task: dict):
    # validate task
    # run bounded planner/executor loop
    # enforce approvals for sensitive actions
    # return audit trail + result
    return {"status": "ok", "steps": []}
Production Readiness Checklist
- Tool contracts versioned
- Unsafe tools gated behind approvals
- Replay logs available for failed runs
- Regression tasks run on every release
- Cost budget enforced per workflow
Ship agents as workflows, not personalities.