Agentic Systems in Production: Patterns That Survive Real Traffic

The Problem

Single-pass LLM calls don't survive contact with production. The moment you give a model tools that mutate state — booking flights, processing refunds, opening pull requests, rerouting shipments — every property you took for granted in a stateless API breaks: retries are no longer idempotent, latency is unbounded, the action space is non-deterministic, and the failure mode is now "wrong action executed" rather than "wrong text returned" Source 2 Source 16. Most production agent failures aren't model failures; they're orchestration, identity, and observability failures dressed up as model failures Source 17.

The Shape

The pattern that holds up: a deterministic orchestrator wrapping a non-deterministic reasoner, with idempotent tools, hard budget caps, and a human-in-the-loop gate on irreversible actions Source 5 Source 21. Skeleton to adapt (`llm`, `approvals`, and `TOOLS` stand for clients your application provides):

import asyncio, time, uuid, logging
from dataclasses import dataclass, field

log = logging.getLogger("agent")

@dataclass
class RunBudget:
    max_steps: int = 12
    max_tokens: int = 100_000
    max_usd: float = 2.00
    deadline_s: float = 90.0
    tokens_used: int = 0
    usd_used: float = 0.0
    steps: int = 0
    started: float = field(default_factory=time.monotonic)

    def check(self):
        if self.steps >= self.max_steps: raise BudgetExceeded("steps")
        if self.tokens_used >= self.max_tokens: raise BudgetExceeded("tokens")
        if self.usd_used >= self.max_usd: raise BudgetExceeded("usd")
        if time.monotonic() - self.started > self.deadline_s: raise BudgetExceeded("deadline")

class BudgetExceeded(Exception): pass
class CircuitOpen(Exception): pass
class TransientError(Exception): pass  # retryable tool failure (e.g. 5xx, connection reset)

TOOL_ALLOWLIST = {"search_kb", "get_order", "draft_refund"}
HITL_REQUIRED  = {"issue_refund", "send_email", "create_ticket"}

class CircuitBreaker:
    def __init__(self, threshold=5, cooldown=30):
        self.fail = 0; self.threshold = threshold
        self.opened_at = 0; self.cooldown = cooldown
    def allow(self):
        if self.fail < self.threshold: return True
        if time.monotonic() - self.opened_at > self.cooldown:
            self.fail = self.threshold - 1
            return True
        return False
    def record(self, ok):
        if ok: self.fail = 0
        else:
            self.fail += 1
            if self.fail == self.threshold: self.opened_at = time.monotonic()

BREAKERS = {}

async def call_tool(name, args, idempotency_key, breaker):
    # HITL-gated tools must be callable too, otherwise an approved action
    # is rejected here; run_agent enforces the approval before this point.
    if name not in TOOL_ALLOWLIST | HITL_REQUIRED:
        return {"error": f"tool '{name}' not allowlisted"}
    if not breaker.allow():
        raise CircuitOpen(name)
    for attempt in range(3):
        try:
            res = await asyncio.wait_for(
                TOOLS[name](args, idempotency_key=idempotency_key),
                timeout=5.0,
            )
            breaker.record(True)
            return res
        except (asyncio.TimeoutError, TransientError):
            if attempt < 2:  # no point backing off after the final attempt
                await asyncio.sleep((2 ** attempt) + (attempt * 0.1))
    breaker.record(False)
    return {"error": "tool failed after retries"}

async def hitl_gate(action, args, run_id):
    approval = await approvals.request(
        run_id=run_id, action=action, args=args, ttl_s=600
    )
    return approval.decision == "approve"

async def run_agent(user_msg, principal, budget=None):
    budget = budget or RunBudget()
    run_id = str(uuid.uuid4())
    trace = []
    state = {"messages": [{"role": "user", "content": user_msg}]}

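    while True:
        try:
            budget.check()
        except BudgetExceeded as e:
            # End the run with a summary instead of an unhandled exception.
            return await llm.summarize_halt(state, reason=str(e)), trace
        budget.steps += 1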

        step = await llm.plan(
            state, tools=list(TOOL_ALLOWLIST | HITL_REQUIRED),
            principal=principal,
        )
        budget.tokens_used += step.usage.total_tokens
        budget.usd_used   += step.usage.cost_usd
        trace.append({"run": run_id, "step": budget.steps, "thought": step.thought,
                      "action": step.action, "args": step.args})

        if step.action == "final":
            log.info("agent.done", extra={"run": run_id, "steps": budget.steps})
            return step.answer, trace

        breaker = BREAKERS.setdefault(step.action, CircuitBreaker())
        idem_key = f"{run_id}:{budget.steps}:{step.action}"

        if step.action in HITL_REQUIRED:
            if not await hitl_gate(step.action, step.args, run_id):
                state["messages"].append(
                    {"role": "tool", "name": step.action, "content": "denied_by_human"}
                )
                continue

        try:
            result = await call_tool(step.action, step.args, idem_key, breaker)
        except (BudgetExceeded, CircuitOpen) as e:
            state["messages"].append(
                {"role": "tool", "name": step.action, "content": f"halt:{e}"}
            )
            return await llm.summarize_halt(state, reason=str(e)), trace

        state["messages"].append(
            {"role": "tool", "name": step.action, "content": result}
        )

Every step is traced, every tool call is keyed for idempotent retry, every action that mutates the world either fails closed or requires human approval, and the loop cannot exceed its step, token, USD, or wall-clock budget Source 5 Source 8 Source 26.

How It Works

The agent loop itself is the ReAct pattern — observe, reason, act, repeat — wrapped around a model whose action space is constrained to a tool allowlist, with each tool described by a JSON schema the model uses for routing and parameter generation Source 13 Source 23. The orchestrator, not the model, owns control flow: it counts steps, charges the budget, fans out to tools, and decides when to hand off to a human. "Separating the brain from the hands" — the model classifies and extracts, deterministic code applies the patch — is what keeps a hallucinated argument from becoming a hallucinated refund Source 15.
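A minimal sketch of the brain/hands split described above, under stated assumptions: `ALLOWED_FIELDS`, `ALLOWED_VALUES`, and `apply_update` are hypothetical names, and the model's output is represented as a plain dict rather than a real LLM call. The model only proposes a field and a value; deterministic code decides whether the record changes.

```python
from dataclasses import dataclass

# Hypothetical whitelist of fields the model may propose changes to,
# with the Python type each value must have.
ALLOWED_FIELDS = {"shipping_speed": str, "quantity": int}
# Hypothetical closed value sets for fields that behave like enums.
ALLOWED_VALUES = {"shipping_speed": {"standard", "express"}}


@dataclass
class PatchResult:
    applied: bool
    reason: str


def apply_update(record: dict, proposal: dict) -> PatchResult:
    """Apply a model-proposed {field, value} update, or reject it.

    The model never touches `record` directly; this function is the
    only code path that mutates it.
    """
    field_name = proposal.get("field")
    value = proposal.get("value")
    if field_name not in ALLOWED_FIELDS:
        return PatchResult(False, f"unknown field: {field_name!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, ALLOWED_FIELDS[field_name]):
        return PatchResult(False, f"wrong type for {field_name}")
    allowed = ALLOWED_VALUES.get(field_name)
    if allowed is not None and value not in allowed:
        return PatchResult(False, f"value not permitted for {field_name}")
    record[field_name] = value
    return PatchResult(True, "ok")
```

A proposal for a field outside the whitelist (say, a hallucinated `refund_amount`) is rejected and the record is left untouched, which is the property the paragraph above relies on.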

Idempotency is the load-bearing property. Tool calls to external APIs fail transiently; retry with exponential backoff is mandatory, but only safe when the tool checks for an existing record with the same idempotency key before creating a new one Source 5 Source 8. The circuit breaker — closed, open, half-open — is the same Hystrix pattern Netflix taught the industry; in an agent context it stops a degraded downstream from burning the entire token budget on doomed retries Source 19 Source 7. Bulkhead the breakers per-tool so a flaky email API doesn't poison the search path.

Identity and authorization are the part most demos skip. An agentic context is autonomous, dynamic, and multi-system; the user's identity must propagate through the orchestrator, sub-agents, and MCP servers to whatever resource finally executes the write, or you create a confused-deputy problem at scale Source 2 Source 33. Each agent should have a unique identity, least-privilege scoped to its task, with just-in-time provisioning for sensitive credentials and a narrow tool catalog so a compromised sub-agent has nowhere to pivot Source 12 Source 16. Prompt injection through retrieved content is real: in published research, five poisoned documents were enough to flip behavior with 90% success. The orchestration layer must therefore validate tool args rather than trust the model's claim about them Source 16.
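One way the orchestrator-side validation above might look, as a sketch rather than a definitive implementation. The `ARG_RULES` table, the `tool:<name>` scope convention, and the specific limits are assumptions made for this example; a production system would more likely use a JSON Schema validator and its own authorization service.

```python
# Hypothetical per-tool argument rules enforced by the orchestrator,
# independent of whatever the model claims about its own arguments.
ARG_RULES = {
    "issue_refund": {
        "order_id": {"type": str},
        "amount_usd": {"type": (int, float), "min": 0.01, "max": 200.0},
        "reason": {"type": str, "enum": {"damaged", "late", "duplicate"}},
    }
}


def validate_args(tool: str, args: dict, principal_scopes: set) -> list:
    """Return a list of violations; an empty list means the call may proceed."""
    errors = []
    # The propagated principal must be scoped for this tool (assumed convention).
    if f"tool:{tool}" not in principal_scopes:
        errors.append(f"principal lacks scope tool:{tool}")
    rules = ARG_RULES.get(tool)
    if rules is None:
        # Fail closed: a tool without registered rules is never executed.
        return errors + [f"no argument rules registered for {tool}"]
    for name in args.keys() - rules.keys():
        errors.append(f"unexpected argument: {name}")
    for name, rule in rules.items():
        if name not in args:
            errors.append(f"missing argument: {name}")
            continue
        value = args[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, rule["type"]):
            errors.append(f"bad type for {name}")
            continue
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name} not in allowed values")
        if "min" in rule and value < rule["min"]:
            errors.append(f"{name} below minimum")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{name} above maximum")
    return errors
```

An injected instruction that smuggles free text into `reason` or inflates `amount_usd` shows up as a violation before any tool runs, regardless of how confident the model's output looked.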

The observability layer is non-negotiable. Catchpoint's framing ("what the AI decided / what it executed / where it broke") is the right schema for traces, because page-load and API-latency dashboards don't tell you whether intent was actually fulfilled Source 17. Distributed trace IDs link the LLM call to every tool invocation; cost-per-task and steps-per-task are the leading indicators of orchestration regressions, and they move long before user-facing errors appear Source 8.
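The decided / executed / broke schema and the two leading indicators can be sketched as follows. This is an illustrative data shape, not Catchpoint's or any tracing library's API: `StepTrace`, its field names, and `per_task_metrics` are assumptions, and one `trace_id` is taken to identify one agent run.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepTrace:
    trace_id: str            # shared by the LLM call and its tool invocation
    task: str                # task type, e.g. "refund"
    decided: str             # action the model chose
    executed: Optional[str]  # tool that actually ran, None if blocked
    error: Optional[str]     # where it broke, None on success
    cost_usd: float


def per_task_metrics(steps: list) -> dict:
    """Average steps-per-run and cost-per-run, grouped by task type."""
    runs = defaultdict(lambda: {"steps": 0, "cost": 0.0, "task": None})
    for s in steps:
        run = runs[s.trace_id]
        run["steps"] += 1
        run["cost"] += s.cost_usd
        run["task"] = s.task
    by_task = defaultdict(list)
    for run in runs.values():
        by_task[run["task"]].append(run)
    return {
        task: {
            "runs": len(rs),
            "avg_steps": sum(r["steps"] for r in rs) / len(rs),
            "avg_cost_usd": sum(r["cost"] for r in rs) / len(rs),
        }
        for task, rs in by_task.items()
    }
```

Tracking `avg_steps` and `avg_cost_usd` per task type over time gives a baseline; a step where `decided` differs from `executed`, or where `error` is set, points at the layer that broke.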

  user ──▶ orchestrator ──▶ planner(LLM)
              │                  │ thought + action
              │ budget/step ◀────┘
              │
              ├──▶ allowlist check ──▶ HITL gate (if mutating)
              │                              │ approve/deny
              ├──▶ circuit breaker ──▶ tool (idempotent, timeout, retry)
              │                              │ result
              │ trace + cost ◀───────────────┘
              ▼
           audit log / observability

When It Breaks

Condition	What happens	Use instead
Single mega-tool wraps a 40-parameter API Source 14	Model hallucinates IDs, timestamps, unique keys; tool calls fail or mutate wrong record	Split into field-group tools with `enum`-constrained targets; resolve IDs server-side from natural language Source 15 Source 26
Pre-built orchestration component dropped in without integration to identity model Source 1	Point-to-point silo; no consistent governance, no central trace; auditability gaps	Hybrid: reuse the component but route through your orchestration layer that owns prompts, routing, evals Source 1 Source 20
Synchronous request-response across multi-agent handoff at live-event traffic	Thundering-herd cache expirations, retry storms, p99 collapse Source 10 Source 11	Async message bus with jittered TTLs, dead-letter queue, back-pressure, traffic prioritization for critical paths Source 7 Source 19
Agent given write access without HITL on irreversible actions Source 9 Source 21	"Acceleration in the wrong direction" — refunds issued, emails sent, prod data touched at machine speed	Classify actions ALLOW / ALLOW_WITH_CAPS / DENY; require approval gates on high-impact and irreversible writes Source 26 Source 32
LLM used as a decision agent for regulated outcomes (lending, claims) Source 4 Source 29	Inconsistent decisions, black-box reasoning, no audit trail that satisfies the regulator	Decision agent built on business rules / DMN for the deterministic call; LLM stays at the chat/extraction layer Source 29 Source 30
Single agent attempts the whole workflow end-to-end Source 6 Source 18	High token waste, error propagation across steps, agent stuck in loops	Multi-agent with supervisor + specialized workers; A2A handoff; or fine-tune for domain-aligned tool use Source 3 Source 28
Budget caps absent; model picks expensive frontier tier for every step Source 22 Source 24	Cost-per-task drifts up week-over-week; spend tied to model choice, not task complexity	Tiered routing: small model for plan-execution, frontier for the plan itself; enforce per-run USD ceiling Source 22
Context window grows unbounded across multi-turn agent run Source 3 Source 25	Latency cliff, GC-style pauses, cost explosion, model loses task focus	Sliding window + summarization buffer; vector store for episodic memory retrieval Source 3
"Conductor" mental model when running ≥5 parallel agents Source 27 Source 31	Review bottleneck — agent throughput exceeds human verification capacity	Orchestrator mental model: front-load spec, back-load review, treat agents as async PR-producing workers Source 27
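The ALLOW / ALLOW_WITH_CAPS / DENY classification from the table can be sketched as a small policy lookup. Everything specific here is hypothetical: the `POLICY` entries, the 50 USD cap, and the three return strings are chosen for illustration, and unknown tools are assumed to be denied by default.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ALLOW_WITH_CAPS = "allow_with_caps"
    DENY = "deny"


# Hypothetical policy: tool name -> (verdict, numeric caps on arguments).
POLICY = {
    "search_kb": (Verdict.ALLOW, {}),
    "issue_refund": (Verdict.ALLOW_WITH_CAPS, {"amount_usd": 50.0}),
    "drop_table": (Verdict.DENY, {}),
}


def decide(tool: str, args: dict) -> str:
    """Return 'execute', 'needs_approval', or 'blocked' for a proposed call."""
    # Tools missing from the policy are treated as DENY (fail closed).
    verdict, caps = POLICY.get(tool, (Verdict.DENY, {}))
    if verdict is Verdict.DENY:
        return "blocked"
    if verdict is Verdict.ALLOW:
        return "execute"
    # ALLOW_WITH_CAPS: run unattended only if every capped argument is
    # present, numeric, and between 0 and its cap; otherwise escalate.
    for name, cap in caps.items():
        value = args.get(name)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return "needs_approval"
        if not 0 <= value <= cap:
            return "needs_approval"
    return "execute"
```

Under this policy a 20 USD refund runs unattended, a 500 USD refund is routed to a human, and a tool nobody registered is blocked outright.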

CEMENT Brick

If you ship an agentic workflow without budget caps, idempotent tools, a deterministic orchestrator, propagated identity, and a HITL gate on irreversible actions, then your first real-traffic incident will be unrecoverable, because the same autonomy and non-determinism that make agents useful turn every missing guardrail into a load-bearing failure mode — and unlike a stateless API, you cannot roll back the actions an agent has already taken in the world Source 9 Source 21 Source 17.

Sources

Source 1. Build, Reuse, or Hybrid? How Orchestration Powers Agentic AI. IBM Technology · https://www.youtube.com/watch?v=tNQPNBQC5kg
Source 2. How to Pass Context in an Agentic AI Flow. IBM Technology · https://www.youtube.com/watch?v=UC4vDpSJCkM
Source 3. AI Agent Architecture: Tool Calling, Multi-Agent Systems, Guardrails, and Production Patterns. Engineering Docs
Source 4. How AI Agents and Decision Agents Combine Rules & ML in Automation. IBM Technology · https://www.youtube.com/watch?v=-mldKsBR0UM
Source 5. AI Agent Architecture: Tool Calling, Multi-Agent Systems, Guardrails, and Planning Strategies. Engineering Docs
Source 6. Enhancing AI Agents Through Fine Tuning & Model Customization. IBM Technology · https://www.youtube.com/watch?v=aQuCTWhiiPg
Source 7. Distributed System Design: Caching, Sharding, Load Balancing, and Consistency Models. Engineering Docs
Source 8. AI Agents & Tool Use: Architecture, Planning, Memory, and Production Patterns. Engineering Docs
Source 9. Risks of Agentic AI: What You Need to Know About Autonomous AI. IBM Technology · https://www.youtube.com/watch?v=v07Y4fmSi6Y
Source 10. Behind the Streams: Real-Time Recommendations for Live Events. Netflix Tech Blog · https://netflixtechblog.com/behind-the-streams-real-time-recommendations-for-live-events-e027cb313f8f
Source 11. Behind the Streams: Real-Time Recommendations for Live Events Part 3. Netflix Tech Blog · https://netflixtechblog.com/behind-the-streams-real-time-recommendations-for-live-events-e027cb313f8f?source=rss----2615bd06b42e---4
Source 12. What Are AI Identities? Understanding Agentic Systems & Governance. IBM Technology · https://www.youtube.com/watch?v=AuV62XbiZcw
Source 13. AI Agents & Tool Use: Architecture, Planning, Memory, Guardrails, and Production Patterns. Engineering Docs
Source 14. Building Tools for AI Agents. MLOps Clips · https://www.youtube.com/watch?v=ov-HUEVrgOk
Source 15. LLM-Driven Structured Form Updates: Preventing Fabrication in JSON-Patch Systems. Engineering Docs
Source 16. Agentic AI Security Guide | IBM. Engineering Docs
Source 17. How to Monitor AI Agents in Commerce Systems. Expert: Mehdi Daoudi · https://www.catchpoint.com/blog/how-to-monitor-ai-agents-in-commerce-systems
Source 18. AI Dev 25 x NYC Nicholas Clegg: How AWS Moved Beyond Orchestration with Strands SDK. DeepLearning.AI · https://www.youtube.com/watch?v=lVgrowsPASU
Source 19. Distributed System Design Fundamentals: Load Balancing, Resilience, Service Architecture, and Consistency. Engineering Docs
Source 20. AI agents in action: From pilots to outcomes at scale. IBM · https://www.youtube.com/watch?v=v-Q0hyKl88I
Source 21. Why AI Agents Need A Human in the Loop Now. IBM Technology · https://www.youtube.com/watch?v=cmEJ-5zYKHA
Source 22. Uber: Leading engineering through an agentic shift - The Pragmatic Summit. The Pragmatic Engineer · https://www.youtube.com/watch?v=i1tZN41VKcE
Source 23. AI Agents: Architecture, Tool Calling, Multi-Agent Systems, Guardrails, and Planning Strategies. Engineering Docs
Source 24. LLM vs. SLM vs. FM: Choosing the Right AI Model. IBM Technology · https://www.youtube.com/watch?v=AVQzG2MY858
Source 25. Martin Kleppmann, Designing Data-Intensive Applications, O’Reilly Media (2017). Engineering Docs
Source 26. AI Agents & Tool Use: Architecture, Safety, and Production Patterns. Engineering Docs
Source 27. The future of agentic coding: conductors to orchestrators. Expert: Addy Osmani · https://addyosmani.com/blog/future-agentic-coding/
Source 28. Orchestrator Agents & MCP: How AI Agents Drive Automation. IBM Technology · https://www.youtube.com/watch?v=Ons1Fv3IE4U
Source 29. Building Decision Agents with LLMs & Machine Learning Models. IBM Technology · https://www.youtube.com/watch?v=mRkJTXDromw
Source 30. Designing AI Decision Agents with DMN, Machine Learning & Analytics. IBM Technology · https://www.youtube.com/watch?v=Wtpwva8t1vs
Source 31. Your AI coding agents need a manager. Expert: Addy Osmani · https://addyosmani.com/blog/coding-agents-manager/
Source 32. Building an AI Agent Governance Framework: 5 Essential Pillars. IBM Technology · https://www.youtube.com/watch?v=5hK7pQsvpy0
Source 33. Securing Agentic Frameworks. IBM · https://www.youtube.com/watch?v=MLPMpE4wJTQ

Built, then written

Tested on my own homelab before publishing: a four-architecture cluster (ARM · AMD ROCm · NVIDIA CUDA · Apple Silicon) running this blog, the RAG pipeline, and a sovereign research copilot. Built and tested before it's written, refined as I learn.

Rafael Lopes

Production AI Engineer in Vancouver, BC. Brazilian. Builds and ships production AI on a self-hosted homelab — RAG pipelines, distributed LLM inference, web performance, and platform engineering.
