The Problem
Agent-facing endpoints — the /api/* routes that LLM tool calls, retrieval pipelines, and autonomous agents hit dozens of times per task — sit awkwardly between two cache models. Human-facing HTML can tolerate a 60-second stale window because a person won't notice; an agent reasoning over a chain of five tool calls absolutely will, because stale data in call #2 poisons every downstream inference. The naive fix — Cache-Control: no-store everywhere — collapses your edge hit ratio and pushes every agent request to origin, which is the failure mode CDNs were built to prevent Source 2.
The Shape
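// app/api/agent/[resource]/route.ts
import { NextRequest, NextResponse } from 'next/server'

// Always run the handler at the origin; caching is delegated to the CDN via headers.
export const dynamic = 'force-dynamic'

const FRESH = 30 // seconds the edge may serve without contacting origin
const SWR = 300  // further seconds the edge may serve stale while revalidating

// Headers shared by the 200 and 304 responses so both carry the same cache contract.
function cacheHeaders(etag: string, tag: string): Record<string, string> {
  return {
    'Cache-Control': `public, max-age=${FRESH}, stale-while-revalidate=${SWR}`,
    'ETag': etag,
    'Vary': 'Accept, X-Agent-Consumer',
    // Cloudflare reads purge tags from the Cache-Tag response header.
    'Cache-Tag': tag,
  }
}

// loadResource and computeEtag are application helpers (computeEtag is shown below).
// Note: Next.js 15+ passes params as a Promise; await it there.
export async function GET(req: NextRequest, { params }: { params: { resource: string } }) {
  const tag = `agent:${params.resource}`
  const etag = await computeEtag(params.resource)

  // Validator matches: answer 304 without loading or sending the body.
  if (req.headers.get('if-none-match') === etag) {
    return new NextResponse(null, { status: 304, headers: cacheHeaders(etag, tag) })
  }

  const data = await loadResource(params.resource, { tag })
  return NextResponse.json(data, {
    headers: {
      ...cacheHeaders(etag, tag),
      'X-Deployment-Id': process.env.NEXT_DEPLOYMENT_ID ?? 'dev',
    },
  })
}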
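// app/api/invalidate/route.ts
import { NextRequest, NextResponse } from 'next/server'
import { revalidateTag } from 'next/cache'

export async function POST(req: NextRequest) {
  // Shared-secret check on the writer's request.
  const secret = req.headers.get('x-invalidate-secret')
  if (secret !== process.env.INVALIDATE_SECRET) {
    return new NextResponse('forbidden', { status: 403 })
  }

  const { tags } = (await req.json()) as { tags: string[] }
  if (!Array.isArray(tags) || tags.length === 0) {
    return new NextResponse('tags must be a non-empty array', { status: 400 })
  }

  // Origin cache first: this is the authoritative invalidation.
  for (const t of tags) revalidateTag(t)

  // CDN purge is best-effort; report the outcome instead of assuming success.
  let cdnPurged = false
  try {
    const res = await fetch(`https://api.cloudflare.com/client/v4/zones/${process.env.CF_ZONE}/purge_cache`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.CF_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ tags }),
    })
    cdnPurged = res.ok
  } catch {
    cdnPurged = false
  }
  return NextResponse.json({ purged: tags, cdnPurged })
}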
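// Helper for the agent route. `db` stands in for the application's database client;
// this assumes a node-postgres style result that exposes a `rows` array.
async function computeEtag(resource: string): Promise<string> {
  const { rows } = await db.query('SELECT updated_at, version FROM resources WHERE id = $1', [resource])
  const row = rows[0]
  if (!row) throw new Error(`unknown resource: ${resource}`)
  return `"${row.version}-${row.updated_at.getTime()}"`
}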
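On the writer side, a call to the invalidate route might be assembled as in the sketch below. It is illustrative only: `buildInvalidateRequest` and the `InvalidateRequest` shape are hypothetical names, and the only details taken from the listing are the `x-invalidate-secret` header, the `{ tags }` body, and the `agent:<resource>` tag scheme.

```typescript
// Hypothetical request shape; it matches the init object that fetch() accepts.
interface InvalidateRequest {
  method: 'POST'
  headers: Record<string, string>
  body: string
}

// Build the POST for /api/invalidate from the resource ids a write touched.
function buildInvalidateRequest(resources: string[], secret: string): InvalidateRequest {
  // De-duplicate ids and apply the same `agent:<resource>` scheme the route uses.
  const tags = Array.from(new Set(resources)).map((r) => `agent:${r}`)
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-invalidate-secret': secret,
    },
    body: JSON.stringify({ tags }),
  }
}
```

The returned object can be passed as the second argument to `fetch`, with the invalidate route's URL as the first.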
How It Works
The contract has three moving parts: a short max-age paired with a long stale-while-revalidate, an ETag derived from the row's version and update time, and tag-keyed purges from the writer side. max-age=30, stale-while-revalidate=300 tells the edge to serve cached bytes for 30 seconds with zero origin contact, then for the next 300 seconds to serve stale bytes immediately while revalidating asynchronously, so user-facing latency stays flat during refresh Source 2. For agents this matters doubly: an LLM tool call that blocks on a cold origin fetch burns wall-clock time against the model's reasoning budget, not just user patience.
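The two windows can be made concrete with a small classifier. This is a simplified model of cache behaviour under these directives, not any CDN's actual implementation; `classifyAge` and `CacheState` are names introduced here for illustration.

```typescript
// Simplified model of how a cache treats an entry under
// `max-age=<fresh>, stale-while-revalidate=<swr>`.
type CacheState = 'fresh' | 'stale-while-revalidate' | 'expired'

const FRESH_SECONDS = 30 // mirrors FRESH in the route
const SWR_SECONDS = 300  // mirrors SWR in the route

function classifyAge(
  ageSeconds: number,
  fresh: number = FRESH_SECONDS,
  swr: number = SWR_SECONDS,
): CacheState {
  if (ageSeconds < 0) throw new RangeError('age must be non-negative')
  // Inside max-age: served with no origin contact.
  if (ageSeconds < fresh) return 'fresh'
  // Inside the SWR window: served stale while a background refresh runs.
  if (ageSeconds < fresh + swr) return 'stale-while-revalidate'
  // Past both windows: the cache must go to origin before answering.
  return 'expired'
}
```

With the values from the route, an entry that is 10 seconds old is fresh, one that is 45 seconds old is served stale while it refreshes, and one that is 400 seconds old forces an origin fetch.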
The ETag is the agent's escape valve from max-age. When an agent has a hot loop hitting the same resource, it sends If-None-Match, and a matching validator gets a 304 with no body, which keeps each iteration cheap. The tag, agent:${resource}, is what writers grab to invalidate. revalidateTag is Next.js's mechanism for blowing away just the entries that depend on a given key, and the framework prioritizes availability over strict consistency: cache write failures still serve the response, and the next request triggers a fresh render Source 4.
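From the agent's side, the conditional-request loop could look roughly like the following. It is a sketch under assumptions: `CachedEntry`, `OriginReply`, `conditionalHeaders`, and `applyReply` are hypothetical names, and the HTTP transport is left out so the logic stays self-contained.

```typescript
interface CachedEntry { etag: string; body: unknown }
interface OriginReply { status: number; etag?: string; body?: unknown }

// Headers for the next request: send If-None-Match only when a validator is held.
function conditionalHeaders(cached: CachedEntry | undefined): Record<string, string> {
  const headers: Record<string, string> = { Accept: 'application/json' }
  if (cached) headers['If-None-Match'] = cached.etag
  return headers
}

// Fold a reply into the local copy: 304 keeps the old body, 200 replaces it.
function applyReply(cached: CachedEntry | undefined, reply: OriginReply): CachedEntry {
  if (reply.status === 304) {
    if (!cached) throw new Error('304 received without a cached entry')
    return cached
  }
  if (reply.status === 200 && reply.etag !== undefined) {
    return { etag: reply.etag, body: reply.body }
  }
  throw new Error(`unexpected status ${reply.status}`)
}
```

Keeping the two steps separate means the agent's tool wrapper can use any HTTP client and still reuse the same 304 handling.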
The Vary: Accept, X-Agent-Consumer header is the non-obvious lever. Agents and humans usually want the same resource shaped differently — JSON for the agent, HTML or RSC for the browser. Caching them under one key produces the HTML/RSC inconsistency failure mode where mismatched payloads collide during client-side navigation Source 4. Vary partitions the cache so an invalidation on one variant doesn't strand the other with a different TTL.
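The partitioning effect of Vary can be pictured as a cache key that includes the values of the listed request headers. The function below is a simplified model with a hypothetical name (`varyCacheKey`); real CDNs differ in how they normalise headers and in how fully they honor Vary, so check your provider's behaviour.

```typescript
// Simplified model: cache key = URL plus the values of the request headers named in Vary.
function varyCacheKey(
  url: string,
  vary: string,
  requestHeaders: Record<string, string>,
): string {
  // Header names are case-insensitive, so normalise before lookup.
  const lower: Record<string, string> = {}
  for (const [name, value] of Object.entries(requestHeaders)) {
    lower[name.toLowerCase()] = value
  }

  const parts = vary
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name.length > 0)
    .sort() // order in the Vary header should not change the key
    .map((name) => `${name}=${lower[name] ?? ''}`)

  return [url, ...parts].join('|')
}
```

Under this model a request with `Accept: application/json` and one with `Accept: text/html` land in different cache slots for the same URL, while headers not listed in Vary have no effect on the key.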
Cross-deployment skew is the last hazard. Rolling out a new build mid-flight will serve a mix of old and new payloads from the edge. Setting deploymentId (mirrored here as X-Deployment-Id) triggers a hard navigation on build-ID change so agents and clients re-fetch consistent content Source 4.
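An agent does not navigate, so one way it might notice skew is by comparing the X-Deployment-Id header across the calls of a single task. The class below is an assumption about how that header could be used, not something the cited source prescribes; `DeploymentSkewDetector` is a hypothetical name.

```typescript
// Tracks the X-Deployment-Id values seen during one agent task.
class DeploymentSkewDetector {
  private seen: string | undefined

  // Returns true when this response comes from a different build than earlier ones.
  observe(responseHeaders: Record<string, string>): boolean {
    const id = responseHeaders['x-deployment-id'] ?? responseHeaders['X-Deployment-Id']
    if (id === undefined) return false // header absent: nothing to compare
    if (this.seen === undefined) {
      this.seen = id // first response of the task sets the baseline
      return false
    }
    return id !== this.seen
  }
}
```

When `observe` returns true, a cautious agent runtime could discard results gathered so far and restart the chain of tool calls.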
                    write (DB)
                         │
                         ▼
                  ┌──────────────┐
POST /invalidate  │  origin app  │  revalidateTag('agent:x')
───────────────►  │  (Next.js)   │ ─────────────────────────┐
                  └──────┬───────┘                          │
                         │                                  ▼
                         │                       Cloudflare purge by tag
                         ▼                                  │
               ┌──────────────────┐                         │
agent GET ───► │  CDN edge (PoP)  │ ◄───────────────────────┘
               └─────────┬────────┘  max-age=30, swr=300
                         │           Vary: Accept, X-Agent-Consumer
                         ▼
       304 (ETag match) or 200 (fresh body)
When It Breaks
| Condition | What happens | Use instead |
|---|---|---|
| Agent loop polls faster than max-age=30 | Edge serves identical bytes; no freshness signal reaches the loop | Drop max-age to 5s; let stale-while-revalidate absorb the rest Source 2 |
| HTML and JSON variants cached with different TTLs | Client-side navigation shows mismatched content Source 4 | Single TTL across variants; rely on Vary to partition |
| Writer can't reach the purge endpoint | Tag stays alive; readers see stale data until max-age expiry | Treat origin revalidateTag as authoritative; CDN purge as best-effort backup Source 4 |
| Rolling deploy mid-request | Edge mixes old + new payloads across the same agent task | Set deploymentId; force hard navigation on build-ID change Source 4 |
| Service backed by legacy Kubernetes Endpoints with >1000 pods | Endpoints object truncates to 1000; some replicas never receive purge fan-out | Migrate clients to EndpointSlice Source 1, Source 3 |
| Last-write-wins on concurrent invalidations | Clock skew silently drops a purge | Tag with monotonic version, not wall-clock timestamp Source 2 |
| R=1 read replica behind the origin | Strongly-consistent read needed after purge returns stale | Use R=majority for the post-invalidate read path Source 2 |
| Multi-port Service exposes both human and agent paths under one name | Unnamed port collisions block selector routing | Name ports explicitly (http, agent-json) per the Service spec Source 1, Source 3 |
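The monotonic-version row can be illustrated with a small ordering guard. This is a sketch under assumptions: `shouldApplyPurge` is a hypothetical helper, versions are assumed to come from a counter such as the `version` column used for the ETag, and the in-memory map would need to be shared storage in a multi-instance deployment.

```typescript
// Last accepted version per tag. In-memory here for illustration only.
const lastVersion = new Map<string, number>()

// Accept a purge only if its version is newer than the last one applied for that tag.
// Ordering by a counter avoids depending on clocks that may disagree across writers.
function shouldApplyPurge(tag: string, version: number): boolean {
  const previous = lastVersion.get(tag)
  if (previous !== undefined && version <= previous) return false
  lastVersion.set(tag, version)
  return true
}
```

A purge that arrives late with an older version is skipped because a newer purge for the same tag has already been applied.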
CEMENT Brick
If you serve agent-facing endpoints with the same Cache-Control profile you'd use for human HTML, then a single stale tool-call response will poison every downstream inference in a chained agent task, because LLMs cannot distinguish "this data is 60 seconds old" from "this data is wrong". The defenses are a short max-age paired with stale-while-revalidate for edge offload Source 2, ETag-driven 304s for hot loops, tag-keyed revalidateTag purges at write time Source 4, and Vary partitioning so the agent JSON variant and the human HTML variant invalidate independently without colliding Source 4.
Sources
- Source 1: Engineering Docs
- Source 2: Engineering Docs
- Source 3: Engineering Docs
- Source 4: How revalidation works in Next.js