Cache Invalidation for AI Consumers: Keeping Agent-Facing Endpoints Fresh Without Busting the CDN Edge

The Problem

Agent-facing endpoints — the /api/* routes that LLM tool calls, retrieval pipelines, and autonomous agents hit dozens of times per task — sit awkwardly between two cache models. Human-facing HTML can tolerate a 60-second stale window because a person won't notice; an agent reasoning over a chain of five tool calls absolutely will, because stale data in call #2 poisons every downstream inference. The naive fix — Cache-Control: no-store everywhere — collapses your edge hit ratio and pushes every agent request to origin, which is the failure mode CDNs were built to prevent Source 2.

The Shape

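// app/api/agent/[resource]/route.ts
import { NextRequest, NextResponse } from 'next/server'
// Hypothetical data module: `db` is assumed to be a node-postgres-style client
// (query() resolves to { rows }), and `loadResource` a loader that caches its
// result under `tag` so revalidateTag can discard it.
import { db, loadResource } from '@/lib/data'

// Run on every request: the ETag check needs the current DB version.
export const dynamic = 'force-dynamic'

const FRESH = 30 // seconds the edge serves without contacting origin
const SWR = 300 // further seconds the edge may serve stale while it revalidates

// Same headers on 200 and 304, as HTTP expects for a revalidated response.
function cacheHeaders(etag: string, tag: string): Record<string, string> {
  return {
    'Cache-Control': `public, max-age=${FRESH}, stale-while-revalidate=${SWR}`,
    'ETag': etag,
    'Vary': 'Accept, X-Agent-Consumer',
    // Cloudflare purge-by-tag matches the Cache-Tag response header;
    // a custom X-Cache-Tag header would never be purged.
    'Cache-Tag': tag,
    'X-Deployment-Id': process.env.NEXT_DEPLOYMENT_ID ?? 'dev',
  }
}

// If-None-Match may list several ETags, and a CDN may weaken ours to W/"...";
// RFC 9110 uses weak comparison here, so the W/ prefix is ignored.
function etagMatches(header: string | null, etag: string): boolean {
  if (!header) return false
  if (header.trim() === '*') return true
  return header.split(',').some((t) => t.trim().replace(/^W\//, '') === etag)
}

export async function GET(req: NextRequest, { params }: { params: { resource: string } }) {
  const tag = `agent:${params.resource}`
  const etag = await computeEtag(params.resource)
  if (etag === null) return new NextResponse('not found', { status: 404 })

  if (etagMatches(req.headers.get('if-none-match'), etag)) {
    return new NextResponse(null, { status: 304, headers: cacheHeaders(etag, tag) })
  }

  const data = await loadResource(params.resource, { tag })
  return NextResponse.json(data, { headers: cacheHeaders(etag, tag) })
}

async function computeEtag(resource: string): Promise<string | null> {
  // node-postgres resolves to { rows }, not a single row.
  const { rows } = await db.query('SELECT updated_at, version FROM resources WHERE id = $1', [resource])
  if (rows.length === 0) return null
  return `"${rows[0].version}-${rows[0].updated_at.getTime()}"`
}

// app/api/invalidate/route.ts
import { NextRequest, NextResponse } from 'next/server'
import { revalidateTag } from 'next/cache'

export async function POST(req: NextRequest) {
  const expected = process.env.INVALIDATE_SECRET
  if (!expected || req.headers.get('x-invalidate-secret') !== expected) {
    return new NextResponse('forbidden', { status: 403 })
  }
  const { tags } = (await req.json()) as { tags?: unknown }
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === 'string')) {
    return new NextResponse('tags must be an array of strings', { status: 400 })
  }

  // Origin cache first: this is the authoritative invalidation.
  for (const t of tags as string[]) revalidateTag(t)

  // Edge purge second, best-effort: report a failure instead of hiding it.
  let edgePurged = false
  try {
    const res = await fetch(`https://api.cloudflare.com/client/v4/zones/${process.env.CF_ZONE}/purge_cache`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.CF_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ tags }),
    })
    edgePurged = res.ok
  } catch {
    // Network failure: origin is already invalidated; the edge copy ages out on its own.
  }

  return NextResponse.json({ purged: tags, edgePurged })
}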

How It Works

The contract has three moving parts: a short max-age paired with a long stale-while-revalidate, a version-derived ETag, and tag-keyed purges from the writer side. max-age=30, stale-while-revalidate=300 tells the edge to serve cached bytes for 30 seconds with no origin contact, then, for the next 300 seconds, to serve the stale bytes immediately while revalidating in the background, so latency stays flat during the refresh Source 2. For agents this matters twice over: a tool call that blocks on a cold origin fetch burns wall-clock time against the model's reasoning budget, not just a user's patience.
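
Put as code, the two directives split a cached entry's age into three bands. A minimal sketch: classify and EdgeDecision are illustrative names, age is measured in whole seconds since the edge stored the response, and real CDNs layer their own policies (minimum TTLs, stale-if-error, request coalescing) on top.

```typescript
// Illustrative model of an edge cache's decision under
// Cache-Control: max-age=<maxAge>, stale-while-revalidate=<swr>.
type EdgeDecision = 'fresh' | 'serve-stale-and-revalidate' | 'fetch-from-origin'

function classify(ageSeconds: number, maxAge: number, swr: number): EdgeDecision {
  if (ageSeconds < maxAge) {
    return 'fresh' // serve cached bytes, no origin contact
  }
  if (ageSeconds < maxAge + swr) {
    return 'serve-stale-and-revalidate' // answer now, refresh in the background
  }
  return 'fetch-from-origin' // outside both windows: the request waits on origin
}

// Using the route's FRESH = 30 and SWR = 300:
for (const age of [5, 45, 340]) {
  console.log(`${age}s -> ${classify(age, 30, 300)}`)
}
```

The middle band is the one agents benefit from: a request there returns immediately, and only the background refresh touches origin.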

The ETag keeps repeat reads cheap. When an agent in a hot loop re-requests the same resource, it sends If-None-Match with the ETag it already holds, and whichever layer answers (the edge while its copy is fresh, the origin once it has to revalidate) can return a bodiless 304 instead of the full payload. The tag, agent:${resource}, is what writers use to invalidate. revalidateTag is Next.js's mechanism for discarding only the cache entries that depend on a given tag, and the framework favors availability over strict consistency: if a cache write fails, the response is still served, and the next request triggers a fresh render Source 4.
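
From the agent's side, that loop is a small validator cache. A sketch under assumptions: ConditionalCache, FetchLike, and MiniResponse are hypothetical names, the cache is in-memory and per-process, and the fetch function is injected so it can be the global fetch (Node 18+) or a test double.

```typescript
// Minimal shape of a fetch response; the global Response satisfies it.
interface MiniResponse {
  status: number
  headers: { get(name: string): string | null }
  text(): Promise<string>
}
type FetchLike = (url: string, init: { headers: Record<string, string> }) => Promise<MiniResponse>

// Hypothetical agent-side cache: remembers the last ETag and body per URL and
// revalidates with If-None-Match instead of re-downloading the body.
class ConditionalCache {
  private readonly entries = new Map<string, { etag: string; body: string }>()
  private readonly fetchFn: FetchLike

  constructor(fetchFn: FetchLike) {
    this.fetchFn = fetchFn
  }

  async get(url: string): Promise<{ body: string; fromCache: boolean }> {
    const cached = this.entries.get(url)
    const headers: Record<string, string> = { Accept: 'application/json' }
    if (cached) headers['If-None-Match'] = cached.etag

    const res = await this.fetchFn(url, { headers })
    if (res.status === 304 && cached) {
      return { body: cached.body, fromCache: true } // validator matched: reuse body
    }
    if (res.status !== 200) throw new Error(`unexpected status ${res.status}`)

    const body = await res.text()
    const etag = res.headers.get('etag')
    if (etag) this.entries.set(url, { etag, body })
    else this.entries.delete(url) // no validator: nothing to revalidate against
    return { body, fromCache: false }
  }
}
```

With the real network, new ConditionalCache(fetch) works as-is; the second and later calls for a URL carry If-None-Match, so an unchanged resource costs headers only.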

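The Vary: Accept, X-Agent-Consumer header is the non-obvious lever. Agents and humans usually want the same resource shaped differently: JSON for the agent, HTML or RSC for the browser. Caching both under one key produces the HTML/RSC inconsistency failure mode, where mismatched payloads collide during client-side navigation Source 4. Vary gives each representation its own cache entry, so the variants never overwrite each other; keep them on the same TTL so one isn't left stale after the other refreshes. Confirm that your CDN actually keys on these headers: some edges (Cloudflare among them) ignore Vary values other than Accept-Encoding by default and need a custom cache key instead.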
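
The partitioning comes from how a Vary-aware cache builds its key: the URL plus the request's value for every header the response names in Vary. A simplified sketch; varyKey is a hypothetical helper, and it ignores Vary: * (which makes a response effectively uncacheable) and the header-value normalization real caches apply.

```typescript
// Build a cache key from the URL plus the request values of each Vary'd header.
function varyKey(url: string, vary: string, requestHeaders: Record<string, string>): string {
  // Header names are case-insensitive, so normalize before lookup.
  const lower = new Map<string, string>()
  for (const [k, v] of Object.entries(requestHeaders)) lower.set(k.toLowerCase(), v)

  const parts = vary
    .split(',')
    .map((h) => h.trim().toLowerCase())
    .filter((h) => h.length > 0)
    .sort() // order of names in Vary shouldn't change the key
    .map((h) => `${h}=${lower.get(h) ?? ''}`)
  return [url, ...parts].join('|')
}

const vary = 'Accept, X-Agent-Consumer'
const agentKey = varyKey('/api/agent/x', vary, { Accept: 'application/json', 'X-Agent-Consumer': 'planner' })
const browserKey = varyKey('/api/agent/x', vary, { accept: 'text/html' })
console.log(agentKey !== browserKey) // true: separate cache entries
```

Because the raw header value becomes part of the key, agents sending application/json and application/json, */* land in different entries; normalizing Accept at the edge keeps the hit ratio from fragmenting.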

Cross-deployment skew is the last hazard. A build rolled out mid-task means the edge can serve a mix of old and new payloads across the same agent task. Setting deploymentId lets Next.js detect a build-ID change and force a hard navigation, so browser clients re-fetch consistent content Source 4. Agents never navigate, which is why the route mirrors the ID in an X-Deployment-Id header they can compare across calls.
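
That header only helps if something on the agent side compares it. A sketch under assumptions: DeploymentPin is a hypothetical helper scoped to one agent task, fed with res.headers.get('x-deployment-id') after each tool call, and what to do on a mismatch (re-run earlier tool calls, or abort and restart the task) is left to the runtime.

```typescript
// Hypothetical per-task guard: pins the first deployment ID seen and
// reports whether later tool-call responses came from the same build.
class DeploymentPin {
  private pinned: string | null = null

  check(deploymentId: string | null): boolean {
    if (deploymentId === null) {
      return true // header missing on this response: nothing to compare
    }
    if (this.pinned === null) {
      this.pinned = deploymentId // first response of the task sets the baseline
      return true
    }
    return this.pinned === deploymentId
  }

  // Call when starting a new task, or after re-fetching under the new build.
  reset(): void {
    this.pinned = null
  }
}

const pin = new DeploymentPin()
pin.check('build-41') // true: pins build-41
pin.check('build-41') // true
pin.check('build-42') // false: earlier results may come from the old build
```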

                        write (DB)
                            │
                            ▼
                     ┌──────────────┐
   POST /invalidate  │  origin app  │  revalidateTag('agent:x')
       ──────────►   │  (Next.js)   │  ───────────────────────►
                     └──────┬───────┘            │
                            │                    ▼
                            │           Cloudflare purge by tag
                            ▼                    │
                 ┌──────────────────┐ ◄──────────┘
   agent GET ──► │  CDN edge (PoP)  │  max-age=30, swr=300
                 └──────────────────┘  Vary: Accept, X-Agent-Consumer
                            │
                  304 (ETag match)  or  200 (fresh body)
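
The diagram's left edge is the writer's POST /invalidate, sent after the DB write commits. A sketch of that call, assuming the /api/invalidate route and x-invalidate-secret header from the route code above; invalidate, PostFn, and the retry parameters are hypothetical. It retries network errors, 5xx, and 429 with exponential backoff, and stops on other 4xx responses because a bad secret or payload won't fix itself.

```typescript
// Minimal POST signature; the global fetch satisfies it.
type PostFn = (
  url: string,
  init: { method: 'POST'; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>

// Returns true once the invalidate endpoint accepts the tags.
async function invalidate(
  post: PostFn,
  endpoint: string,
  secret: string,
  tags: string[],
  attempts = 3,
  baseDelayMs = 200,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await post(endpoint, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', 'x-invalidate-secret': secret },
        body: JSON.stringify({ tags }),
      })
      if (res.ok) return true
      // Client errors other than 429 won't succeed on retry.
      if (res.status >= 400 && res.status < 500 && res.status !== 429) return false
    } catch {
      // Network error: fall through to the retry below.
    }
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i))
    }
  }
  return false
}
```

A false return means readers keep seeing the cached copy until it ages out, so a writer would typically log or alert on it; for example, await invalidate(fetch, 'https://example.com/api/invalidate', secret, ['agent:x']) with a placeholder host.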

When It Breaks

Condition	What happens	Use instead
Agent loop polls faster than `max-age=30`	Edge serves identical bytes; no freshness signal reaches the loop	Drop `max-age` to 5s; let `stale-while-revalidate` absorb the rest Source 2
HTML and JSON variants cached with different TTLs	Client-side navigation shows mismatched content Source 4	Single TTL across variants; rely on `Vary` to partition
Writer can't reach the purge endpoint	Tag stays alive; readers see stale data until `max-age` expiry	Treat origin `revalidateTag` as authoritative; CDN purge as best-effort backup Source 4
Rolling deploy mid-request	Edge mixes old + new payloads across the same agent task	Set `deploymentId`; force hard navigation on build-ID change Source 4
Service backed by legacy Kubernetes Endpoints with >1000 pods	Endpoints object truncates to 1000; some replicas never receive purge fan-out	Migrate clients to EndpointSlice Source 1 Source 3
Last-write-wins on concurrent invalidations	Clock skew silently drops a purge	Tag with monotonic version, not wall-clock timestamp Source 2
`R=1` read replica behind the origin	A read issued right after the purge can hit a lagging replica and return stale data	Use `R=majority` for the post-invalidate read path Source 2
Multi-port Service exposes both human and agent paths under one name	Every port in a multi-port Service must be named; unnamed ports fail validation and are ambiguous to clients	Name ports explicitly (`http`, `agent-json`) per the Service spec Source 1 Source 3
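
The monotonic-version fix amounts to ordering invalidations by a counter the database issues rather than by each writer's clock. A minimal sketch, assuming a hypothetical PurgeLedger that deduplicates purges before fan-out: with wall-clock stamps, a writer whose clock runs fast can make an older write look newest, while a database-issued version can't be skewed that way.

```typescript
// Illustrative dedup step for a purge queue. Versions are assumed to come from
// one monotonic source per resource (for example the `version` column that
// computeEtag already reads), never from the writer's wall clock.
class PurgeLedger {
  private readonly latest = new Map<string, number>()

  // True if this invalidation is newer than any already applied for the tag.
  accept(tag: string, version: number): boolean {
    const seen = this.latest.get(tag)
    if (seen !== undefined && version <= seen) {
      return false // older or duplicate: the newer purge already covered it
    }
    this.latest.set(tag, version)
    return true
  }
}

const ledger = new PurgeLedger()
console.log(ledger.accept('agent:x', 8)) // true
console.log(ledger.accept('agent:x', 7)) // false: late-arriving older write
console.log(ledger.accept('agent:x', 9)) // true
```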

CEMENT Brick

If you serve agent-facing endpoints with the same Cache-Control profile you'd use for human HTML, then a single stale tool-call response will poison every downstream inference in a chained agent task, because LLMs cannot distinguish "this data is 60 seconds old" from "this data is wrong" — the only defenses are short max-age paired with stale-while-revalidate for edge offload Source 2, ETag-driven 304s for hot loops, tag-keyed revalidateTag purges at write time Source 4, and Vary partitioning so the agent JSON variant and the human HTML variant invalidate independently without colliding Source 4.

Sources

Source 1: Concepts. Engineering Docs.
Source 2: Distributed System Design Fundamentals: Caching, Sharding, Consistency, and Resilience. Engineering Docs.
Source 3: Service. Engineering Docs.
Source 4: How revalidation works in Next.js. Next.js Docs · https://nextjs.org/docs/app/guides/how-revalidation-works

Built, then written

Tested on my own homelab before publishing — a four-architecture cluster (ARM · AMD ROCm · NVIDIA CUDA · Apple Silicon) running this blog, the RAG pipeline, and a sovereign research copilot. Built and tested before it's written — refined as I learn.

Rafael Lopes

Production AI Engineer in Vancouver, BC. Brazilian. Builds and ships production AI on a self-hosted homelab — RAG pipelines, distributed LLM inference, web performance, and platform engineering.

Machine-readable brief — Rafael Lopes

Cache Invalidation for AI Consumers: Keeping Agent-Facing Endpoints Fresh Without Busting the CDN Edge

The Problem

The Shape

How It Works

When It Breaks

CEMENT Brick

Sources
