The Problem
Single-pass LLM calls don't survive contact with production. The moment you give a model tools that mutate state (booking flights, processing refunds, opening pull requests, rerouting shipments), every property you took for granted in a stateless API breaks: retries are no longer idempotent by default, latency is unbounded, the action space is non-deterministic, and the failure mode is now "wrong action executed" rather than "wrong text returned" (Source 2, Source 16). Most production agent failures aren't model failures; they're orchestration, identity, and observability failures dressed up as model failures (Source 17).
The Shape
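The pattern that holds up: a deterministic orchestrator wrapping a non-deterministic reasoner, with idempotent tools, hard budget caps, and a human-in-the-loop (HITL) gate on irreversible actions (Source 5, Source 21). Copy-paste skeleton; `llm`, `approvals`, `TOOLS`, and `TransientError` stand in for your own model client, approval service, tool registry, and retryable-error type: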

    import asyncio, logging, random, time, uuid
    from dataclasses import dataclass, field

    log = logging.getLogger("agent")

    # Placeholders assumed to be defined elsewhere: `llm` (planner client),
    # `approvals` (human-approval service), `TOOLS` (name -> async callable),
    # and `TransientError` (the retryable error type raised by tools).

    class BudgetExceeded(Exception): pass
    class CircuitOpen(Exception): pass

    @dataclass
    class RunBudget:
        max_steps: int = 12
        max_tokens: int = 100_000
        max_usd: float = 2.00
        deadline_s: float = 90.0
        tokens_used: int = 0
        usd_used: float = 0.0
        steps: int = 0
        started: float = field(default_factory=time.monotonic)

        def check(self):
            if self.steps >= self.max_steps: raise BudgetExceeded("steps")
            if self.tokens_used >= self.max_tokens: raise BudgetExceeded("tokens")
            if self.usd_used >= self.max_usd: raise BudgetExceeded("usd")
            if time.monotonic() - self.started > self.deadline_s: raise BudgetExceeded("deadline")

    TOOL_ALLOWLIST = {"search_kb", "get_order", "draft_refund"}   # run without approval
    HITL_REQUIRED = {"issue_refund", "send_email", "create_ticket"}  # run only after approval
    ALL_TOOLS = TOOL_ALLOWLIST | HITL_REQUIRED

    class CircuitBreaker:
        def __init__(self, threshold=5, cooldown=30):
            self.fail = 0; self.threshold = threshold
            self.opened_at = 0.0; self.cooldown = cooldown

        def allow(self):
            if self.fail < self.threshold: return True       # closed
            if time.monotonic() - self.opened_at > self.cooldown:
                self.fail = self.threshold - 1               # half-open: one more failure reopens
                return True
            return False                                     # open

        def record(self, ok):
            if ok: self.fail = 0
            else:
                self.fail += 1
                if self.fail == self.threshold: self.opened_at = time.monotonic()

    BREAKERS = {}  # one breaker per tool name

    async def call_tool(name, args, idempotency_key, breaker, attempts=3):
        # Approved HITL tools must pass this check too, so test against ALL_TOOLS.
        if name not in ALL_TOOLS:
            return {"error": f"tool '{name}' not allowlisted"}
        if not breaker.allow():
            raise CircuitOpen(name)
        for attempt in range(attempts):
            try:
                res = await asyncio.wait_for(
                    TOOLS[name](args, idempotency_key=idempotency_key),
                    timeout=5.0,
                )
            except (asyncio.TimeoutError, TransientError):
                breaker.record(False)
                if attempt < attempts - 1:
                    # Exponential backoff plus random jitter; no sleep after the last try.
                    await asyncio.sleep((2 ** attempt) + random.uniform(0, 0.5))
            else:
                breaker.record(True)
                return res
        return {"error": "tool failed after retries"}

    async def hitl_gate(action, args, run_id):
        approval = await approvals.request(
            run_id=run_id, action=action, args=args, ttl_s=600
        )
        return approval.decision == "approve"

    async def run_agent(user_msg, principal, budget=None):
        budget = budget or RunBudget()
        run_id = str(uuid.uuid4())
        trace = []
        state = {"messages": [{"role": "user", "content": user_msg}]}
        while True:
            try:
                budget.check()
            except BudgetExceeded as e:
                # Out of budget: stop planning and return a summary with the trace.
                return await llm.summarize_halt(state, reason=f"budget:{e}"), trace
            budget.steps += 1
            step = await llm.plan(
                state, tools=sorted(ALL_TOOLS),  # sorted for a stable tool order
                principal=principal,
            )
            budget.tokens_used += step.usage.total_tokens
            budget.usd_used += step.usage.cost_usd
            trace.append({"run": run_id, "step": budget.steps, "thought": step.thought,
                          "action": step.action, "args": step.args})
            if step.action == "final":
                log.info("agent.done", extra={"run": run_id, "steps": budget.steps})
                return step.answer, trace
            breaker = BREAKERS.setdefault(step.action, CircuitBreaker())
            # The key is stable across retries of this step, so the tool can deduplicate.
            idem_key = f"{run_id}:{budget.steps}:{step.action}"
            if step.action in HITL_REQUIRED:
                if not await hitl_gate(step.action, step.args, run_id):
                    state["messages"].append(
                        {"role": "tool", "name": step.action, "content": "denied_by_human"}
                    )
                    continue
            try:
                result = await call_tool(step.action, step.args, idem_key, breaker)
            except CircuitOpen as e:
                state["messages"].append(
                    {"role": "tool", "name": step.action, "content": f"halt:{e}"}
                )
                return await llm.summarize_halt(state, reason=f"circuit_open:{e}"), trace
            state["messages"].append(
                {"role": "tool", "name": step.action, "content": result}
            )

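Every step is traced, every tool call is keyed for idempotent retry, every action that mutates the world either fails closed or requires human approval, and the loop halts once its step, token, USD, or wall-clock budget is spent (Source 5, Source 8, Source 26). The budget is checked at the top of each iteration, so a run can overshoot the token or USD cap by at most one planning call.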
How It Works
The agent loop itself is the ReAct pattern (observe, reason, act, repeat) wrapped around a model whose action space is constrained to a tool allowlist, with each tool described by a JSON schema the model uses for routing and parameter generation (Source 13, Source 23). The orchestrator, not the model, owns control flow: it counts steps, charges the budget, fans out to tools, and decides when to hand off to a human. "Separating the brain from the hands" means the model classifies and extracts while deterministic code applies the patch; that split is what keeps a hallucinated argument from becoming a hallucinated refund (Source 15).
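To make the brain/hands split concrete, here is a minimal sketch, assuming a hypothetical `update_order_status` tool and an in-memory `orders` dict. The schema, field names, and the hand-rolled checker are illustrative only; a real system would more likely use a full JSON Schema validator.

```python
from typing import Any

# Hypothetical tool description in JSON-Schema style; names are illustrative.
UPDATE_ORDER_TOOL = {
    "name": "update_order_status",
    "description": "Set the status of an existing order.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "status": {"type": "string", "enum": ["on_hold", "cancelled"]},
        },
        "required": ["order_id", "status"],
    },
}

def check_args(tool: dict, args: dict[str, Any]) -> list[str]:
    """Check required keys, unknown keys, string types, and enums.

    Covers only the schema subset used above; returns a list of problems.
    """
    params = tool["parameters"]
    props = params["properties"]
    problems = []
    for name in params["required"]:
        if name not in args:
            problems.append(f"missing: {name}")
    for name, value in args.items():
        if name not in props:
            problems.append(f"unexpected: {name}")
            continue
        spec = props[name]
        if spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"not a string: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"not allowed: {name}={value!r}")
    return problems

def apply_patch(orders: dict[str, dict], args: dict[str, Any]) -> dict:
    """The 'hands': deterministic code applies what the model only proposed."""
    problems = check_args(UPDATE_ORDER_TOOL, args)
    if problems:
        return {"ok": False, "problems": problems}
    if args["order_id"] not in orders:
        return {"ok": False, "problems": ["unknown order_id"]}
    orders[args["order_id"]]["status"] = args["status"]
    return {"ok": True}
```

The model never touches `orders` directly. A status outside the enum or an order ID that does not exist comes back as a structured error the planner can read on its next step.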
Idempotency is the load-bearing property. Tool calls to external APIs fail transiently; retry with exponential backoff is mandatory, but only safe when the tool checks for an existing record with the same idempotency key before creating a new one (Source 5, Source 8). The circuit breaker (closed, open, half-open) is the pattern Netflix's Hystrix popularized; in an agent context it stops a degraded downstream from burning the entire token budget on doomed retries (Source 7, Source 19). Bulkhead the breakers per tool so a flaky email API doesn't poison the search path.
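The tool-side half of that contract can be sketched as follows. `RefundStore` and its fields are hypothetical, and the in-memory dict stands in for a database table with a unique constraint on the idempotency key.

```python
import uuid

class IdempotencyConflict(Exception):
    """Same key reused with a different payload."""

class RefundStore:
    """In-memory stand-in for a table with a unique idempotency_key column."""

    def __init__(self):
        self._by_key = {}

    def create_refund(self, order_id: str, amount_cents: int, idempotency_key: str) -> dict:
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            # Key seen before: a retry must not create a second refund.
            if (existing["order_id"], existing["amount_cents"]) != (order_id, amount_cents):
                raise IdempotencyConflict(idempotency_key)
            return existing
        record = {
            "refund_id": str(uuid.uuid4()),
            "order_id": order_id,
            "amount_cents": amount_cents,
        }
        self._by_key[idempotency_key] = record
        return record
```

A retry with the same key returns the original record, so a timeout followed by a retry produces one refund, not two. In a real store the lookup and insert need to be atomic (for example, an insert guarded by a unique index), because this get-then-set version would race under concurrent callers.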
Identity and authorization are the part most demos skip. Agentic context is autonomous, dynamic, and multi-system; the user's identity must propagate through the orchestrator, sub-agents, and MCP servers to whatever resource finally executes the write, or you create a confused-deputy problem at scale (Source 2, Source 33). Each agent should have a unique identity, least-privilege scoped to its task, with just-in-time provisioning for sensitive credentials and a narrow tool catalog so a compromised sub-agent has nowhere to pivot (Source 12, Source 16). Prompt injection through retrieved content is real: published research reports that as few as five poisoned documents can flip behavior with a 90% success rate. The orchestration layer must therefore validate tool arguments itself rather than trust the model's claim about them (Source 16).
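One way to avoid the confused deputy is to authorize each tool call against the end user's identity rather than the agent's own credentials. The sketch below assumes a hypothetical `Principal` type, scope names, and an `order_owner` lookup; none of these come from a specific framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Principal:
    user_id: str
    scopes: frozenset

# Hypothetical mapping from tool name to the scope it requires.
REQUIRED_SCOPE = {
    "get_order": "orders:read",
    "issue_refund": "refunds:write",
}

class NotAuthorized(Exception):
    pass

def authorize(principal: Principal, tool: str, args: dict, order_owner: dict) -> None:
    """Raise NotAuthorized unless this user may run this tool on this order.

    Tools with no scope entry are rejected, so the check fails closed.
    """
    scope = REQUIRED_SCOPE.get(tool)
    if scope is None or scope not in principal.scopes:
        raise NotAuthorized(f"{principal.user_id} lacks scope for {tool}")
    order_id = args.get("order_id")
    if order_owner.get(order_id) != principal.user_id:
        raise NotAuthorized(f"{principal.user_id} does not own {order_id}")
```

The ownership check uses the caller's identity and server-side data. If injected text convinces the model to request another customer's order, the call is rejected no matter what the model asserts about the arguments.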
The observability layer is non-negotiable. Catchpoint's framing ("what the AI decided / what it executed / where it broke") is the right schema for traces, because page-load and API-latency dashboards don't tell you whether intent was actually fulfilled (Source 17). Distributed trace IDs link the LLM call to every tool invocation; cost-per-task and steps-per-task are the leading indicators of orchestration regressions, moving long before user-facing errors appear (Source 8).
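The decided / executed / broke framing maps onto a small per-step record. The following is a sketch with hypothetical field names, not a schema taken from any tracing product; it also shows how steps-per-task and cost-per-task fall out of the same records.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StepRecord:
    trace_id: str            # shared by the LLM call and every tool call in a run
    step: int
    decided: str             # what the model chose, e.g. "issue_refund"
    executed: Optional[str]  # what actually ran; None if the step was blocked
    error: Optional[str]     # where it broke; None on success
    cost_usd: float = 0.0

def summarize(records):
    """Aggregate per trace: steps-per-task, cost-per-task, errors, blocked steps."""
    out = {}
    for r in records:
        s = out.setdefault(
            r.trace_id, {"steps": 0, "cost_usd": 0.0, "errors": 0, "blocked": 0}
        )
        s["steps"] += 1
        s["cost_usd"] += r.cost_usd
        s["errors"] += int(r.error is not None)
        s["blocked"] += int(r.executed is None)
    return out
```

Keeping `decided` and `executed` as separate fields is the point: a step where they differ (a denial, an open breaker, a rejected argument) is visible in the trace even when no exception reaches the user.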

    user ──▶ orchestrator ──▶ planner(LLM)
                │                  │ thought + action
                │ budget/step ◀────┘
                │
                ├──▶ allowlist check ──▶ HITL gate (if mutating)
                │                              │ approve/deny
                ├──▶ circuit breaker ──▶ tool (idempotent, timeout, retry)
                │                              │ result
                │ trace + cost ◀───────────────┘
                ▼
    audit log / observability

When It Breaks
| Condition | What happens | Use instead |
|---|---|---|
| Single mega-tool wraps a 40-parameter API (Source 14) | Model hallucinates IDs, timestamps, unique keys; tool calls fail or mutate the wrong record | Split into field-group tools with enum-constrained targets; resolve IDs server-side from natural language (Source 15, Source 26) |
| Pre-built orchestration component dropped in without integrating with your identity model (Source 1) | Point-to-point silo; no consistent governance, no central trace; auditability gaps | Hybrid: reuse the component but route through your own orchestration layer, which owns prompts, routing, and evals (Source 1, Source 20) |
| Synchronous request-response across multi-agent handoff at live-event traffic | Thundering-herd cache expirations, retry storms, p99 collapse (Source 10, Source 11) | Async message bus with jittered TTLs, dead-letter queue, back-pressure, traffic prioritization for critical paths (Source 7, Source 19) |
| Agent given write access without HITL on irreversible actions (Source 9, Source 21) | "Acceleration in the wrong direction": refunds issued, emails sent, prod data touched at machine speed | Classify actions ALLOW / ALLOW_WITH_CAPS / DENY; require approval gates on high-impact and irreversible writes (Source 26, Source 32) |
| LLM used as a decision agent for regulated outcomes (lending, claims) (Source 4, Source 29) | Inconsistent decisions, black-box reasoning, no audit trail that satisfies the regulator | Decision agent built on business rules / DMN for the deterministic call; LLM stays at the chat/extraction layer (Source 29, Source 30) |
| Single agent attempts the whole workflow end-to-end (Source 6, Source 18) | High token waste, error propagation across steps, agent stuck in loops | Multi-agent with supervisor + specialized workers; A2A handoff; or fine-tune for domain-aligned tool use (Source 3, Source 28) |
| Budget caps absent; model picks expensive frontier tier for every step (Source 22, Source 24) | Cost-per-task drifts up week-over-week; spend tied to model choice, not task complexity | Tiered routing: small model for plan execution, frontier model for the plan itself; enforce per-run USD ceiling (Source 22) |
| Context window grows unbounded across multi-turn agent run (Source 3, Source 25) | Latency cliff, GC-style pauses, cost explosion, model loses task focus | Sliding window + summarization buffer; vector store for episodic memory retrieval (Source 3) |
| "Conductor" mental model when running ≥5 parallel agents (Source 27, Source 31) | Review bottleneck: agent throughput exceeds human verification capacity | Orchestrator mental model: front-load spec, back-load review, treat agents as async PR-producing workers (Source 27) |
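The ALLOW / ALLOW_WITH_CAPS / DENY classification from the table can be expressed as a small policy lookup. This is a sketch under assumptions: the tool names, the cents-based cap, and the rule that an over-cap request escalates to approval instead of being rejected are all illustrative choices, not something the sources prescribe.

```python
from enum import Enum

class Policy(Enum):
    ALLOW = "allow"
    ALLOW_WITH_CAPS = "allow_with_caps"
    DENY = "deny"

# Hypothetical policy table: action -> (policy, cap in cents or None).
POLICIES = {
    "search_kb": (Policy.ALLOW, None),
    "issue_refund": (Policy.ALLOW_WITH_CAPS, 5_000),
    "delete_account": (Policy.DENY, None),
}

def decide(action: str, amount_cents: int = 0) -> str:
    """Return "run", "needs_approval", or "reject".

    Actions missing from the table are rejected, so the policy fails closed.
    """
    policy, cap = POLICIES.get(action, (Policy.DENY, None))
    if policy is Policy.DENY:
        return "reject"
    if policy is Policy.ALLOW_WITH_CAPS and amount_cents > cap:
        return "needs_approval"
    return "run"
```

Defaulting unknown actions to `DENY` means a newly added tool, or a tool name the model invented, cannot run until someone classifies it.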
CEMENT Brick
If you ship an agentic workflow without budget caps, idempotent tools, a deterministic orchestrator, propagated identity, and a HITL gate on irreversible actions, then your first real-traffic incident will be unrecoverable. The same autonomy and non-determinism that make agents useful turn every missing guardrail into a load-bearing failure mode, and unlike a stateless API, you cannot roll back the actions an agent has already taken in the world (Source 9, Source 17, Source 21).
Sources
- Source 1: Build, Reuse, or Hybrid? How Orchestration Powers Agentic AI
- Source 2: How to Pass Context in an Agentic AI Flow
- Source 3: Engineering Docs
- Source 4: How AI Agents and Decision Agents Combine Rules & ML in Automation
- Source 5: Engineering Docs
- Source 6: Enhancing AI Agents Through Fine Tuning & Model Customization
- Source 7: Engineering Docs
- Source 8: Engineering Docs
- Source 9: Risks of Agentic AI: What You Need to Know About Autonomous AI
- Source 10: behind-the-streams-real-time-recommendations-for-live-events-e027cb313f8f
- Source 11: Behind the Streams: Real-Time Recommendations for Live Events Part 3
- Source 12: What Are AI Identities? Understanding Agentic Systems & Governance
- Source 13: Engineering Docs
- Source 14: Building Tools for AI Agents
- Source 15: Engineering Docs
- Source 16: Engineering Docs
- Source 17: How to Monitor AI Agents in Commerce Systems
- Source 18: AI Dev 25 x NYC Nicholas Clegg: How AWS Moved Beyond Orchestration with Strands SDK
- Source 19: Engineering Docs
- Source 20: AI agents in action: From pilots to outcomes at scale
- Source 21: Why AI Agents Need A Human in the Loop Now
- Source 22: Uber: Leading engineering through an agentic shift - The Pragmatic Summit
- Source 23: Engineering Docs
- Source 24: LLM vs. SLM vs. FM: Choosing the Right AI Model
- Source 25: Engineering Docs
- Source 26: Engineering Docs
- Source 27: The future of agentic coding: conductors to orchestrators
- Source 28: Orchestrator Agents & MCP: How AI Agents Drive Automation
- Source 29: Building Decision Agents with LLMs & Machine Learning Models
- Source 30: Designing AI Decision Agents with DMN, Machine Learning & Analytics
- Source 31: Your AI coding agents need a manager
- Source 32: Building an AI Agent Governance Framework: 5 Essential Pillars
- Source 33: Securing Agentic Frameworks