Token Budgets Are the New Byte Budgets

web-perf ai-agents payload-optimization api-design

Research & exploration — not a production case study. The measurements and figures below are an illustrative model of how agent-mediated traffic would behave, used to reason about the pattern. They are not benchmarks I ran on my own production systems. External facts are cited and linked; the numbers are the hypothesis, not the receipt.

The Problem

When a web performance engineer optimizes payload size, they think in kilobytes: tree-shake the bundle, compress with Brotli, lazy-load below the fold. When an AI agent consumes your API, the unit changes. The agent's constraint isn't bandwidth; it's the context window. A product API returning 2,000 tokens of nested JSON uses up context that the agent needs for reasoning, comparison, and response generation. At $0.50-$15 per million input tokens (depending on model), every unnecessary field has a literal dollar cost. Netflix described a related problem with tokenizer alignment: "tiny differences in normalization, special token handling, or chat templating can yield different token boundaries — exactly the kind of mismatch that shows up later as inexplicable quality regressions." The same principle applies to your API: what you send determines what the agent has to tokenize, and excess fields add noise that can degrade answer quality.
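To make the dollar cost concrete, here is a minimal sketch of the arithmetic, using the 2,000-token payload and the $0.50-$15 per million input-token range quoted above. The helper name `inputCostUSD` and the volume of one million lookups are illustrative assumptions, not measurements.

```javascript
// Hypothetical helper: input-token cost of placing a payload in context once.
// pricePerMillionUSD is the model's input price per 1,000,000 tokens.
function inputCostUSD(tokens, pricePerMillionUSD) {
  return (tokens / 1e6) * pricePerMillionUSD;
}

const payloadTokens = 2000; // the nested-JSON example from the text
const lookups = 1e6;        // assumed volume, for illustration only

for (const price of [0.5, 15]) { // low and high end of the quoted range
  const perLookup = inputCostUSD(payloadTokens, price);
  console.log(
    `$${price}/M tokens: $${perLookup.toFixed(4)} per lookup, ` +
    `$${(perLookup * lookups).toFixed(0)} per million lookups`
  );
}
```

Under these assumptions the same payload costs $0.001 per lookup at the low end of the range and $0.03 at the high end, so the model choice moves the bill by a factor of 30 before any field is trimmed.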

The Shape

// token-lean-transform.js
// Transforms a full product record into an agent-optimized payload

const AGENT_FIELDS = new Set([
  'sku', 'name', 'price', 'currency', 'availability',
  'description_short', 'category', 'image_url', 'last_updated',
  'rating_avg', 'rating_count',
]);

function toAgentPayload(product) {
  const lean = {};

  for (const key of AGENT_FIELDS) {
    const val = product[key];
    // Strip nulls, undefined, empty strings, empty arrays
    if (val === null || val === undefined || val === '' ||
        (Array.isArray(val) && val.length === 0)) {
      continue;
    }
    lean[key] = val;
  }

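  // Flatten nested price objects. Compare against undefined so that a
  // legitimate price of 0 is not mistaken for a missing price.
  if (lean.price === undefined && product.offers?.price != null) {
    lean.price = product.offers.price;
    lean.currency = product.offers.priceCurrency || 'USD';
  }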

  // Cap description to reduce token waste
  if (lean.description_short && lean.description_short.length > 200) {
    lean.description_short = lean.description_short.slice(0, 197) + '...';
  }

  // Availability as boolean, not schema.org URL
  if (typeof lean.availability === 'string') {
    lean.availability = lean.availability.includes('InStock');
  }

  return lean;
}

function estimateTokens(obj) {
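  // Rough heuristic: ~4 characters per token. That rule of thumb comes from
  // English prose; punctuation-heavy JSON often tokenizes less efficiently,
  // so treat the result as a floor and use the model's tokenizer for real budgets.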
  return Math.ceil(JSON.stringify(obj).length / 4);
}

function validateTokenBudget(payload, budget = 500) {
  const tokens = estimateTokens(payload);
  return {
    tokens,
    withinBudget: tokens <= budget,
    utilization: (tokens / budget).toFixed(2),
  };
}

export { toAgentPayload, estimateTokens, validateTokenBudget };

How It Works

The pattern has three layers: field selection, null stripping, and shape flattening.

Field selection is the biggest lever. A typical e-commerce product object has 40-80 fields: internal IDs, audit timestamps, warehouse codes, variant matrices, rich HTML descriptions, multiple image sizes, related product arrays. An agent doing product comparison needs about 10. The AGENT_FIELDS set is the allowlist — everything else is dropped before serialization.

Null stripping matters because an explicit null can read as a slot to fill. When the model sees "children_ages": null in context, it may treat the field as something to complete and fabricate values like [8, 12]. Removing the field entirely removes that completion target. This is the token-budget equivalent of removing unused CSS, with one difference: the leftover field is not just wasted bytes, it can actively cause wrong answers.
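A small standalone sketch of the stripping rule, applied to every key of a record rather than to an allowlist. The names `isEmpty` and `stripEmpty` and the sample record are hypothetical; the emptiness test mirrors the one in toAgentPayload.

```javascript
// Same emptiness rule as toAgentPayload: null, undefined, '', and [] are dropped.
// Note that 0 and false are real values and are kept.
function isEmpty(val) {
  return val === null || val === undefined || val === '' ||
    (Array.isArray(val) && val.length === 0);
}

function stripEmpty(record) {
  return Object.fromEntries(
    Object.entries(record).filter(([, val]) => !isEmpty(val))
  );
}

// Hypothetical record, for illustration only
const raw = { sku: 'A-100', name: 'Family pass', children_ages: null, tags: [], notes: '' };
const stripped = stripEmpty(raw);
console.log(JSON.stringify(stripped));
// {"sku":"A-100","name":"Family pass"}
```

The design choice worth noting is the explicit comparison list instead of a truthiness check: `if (!val)` would also drop a price of 0 or an availability of false, which are answers, not gaps.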

Shape flattening converts nested objects into flat key-value pairs. A nested offers.price.amount.value structure costs more tokens than a flat price: 190.00 because every level of JSON nesting adds a pair of braces, a colon, and another quoted key.
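The overhead of nesting can be seen by serializing both shapes and comparing sizes. This is a sketch only: `roughTokens` is a hypothetical helper using the same characters-divided-by-four heuristic as estimateTokens, and the nested object is an illustrative shape, not a real schema.

```javascript
// Rough estimate only: ~4 characters per token, as in estimateTokens.
function roughTokens(obj) {
  return Math.ceil(JSON.stringify(obj).length / 4);
}

// Illustrative nested shape: four wrapper levels around one number
const nested = { offers: { price: { amount: { value: 190.0, currency: 'USD' } } } };

// The same information as flat key-value pairs
const flat = { price: 190.0, currency: 'USD' };

console.log('nested:', JSON.stringify(nested).length, 'chars,', roughTokens(nested), 'tokens (est.)');
console.log('flat:  ', JSON.stringify(flat).length, 'chars,', roughTokens(flat), 'tokens (est.)');
```

Both objects carry the same price and currency; everything the nested version adds is structure the agent has to read past.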

The middleware that serves this:

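// Express middleware: agent-aware response transform
// Match crawler tokens anywhere in the User-Agent. Real strings usually
// start with a "Mozilla/5.0 ..." prefix, so a ^ anchor would never match.
// Google-Extended is omitted: it is a robots.txt token, not a User-Agent.
const AGENT_UA = /\b(GPTBot|ClaudeBot|PerplexityBot)\b/;

function agentResponseMiddleware(req, res, next) {
  const isAgent = AGENT_UA.test(req.headers['user-agent'] || '')
    || (req.headers['accept'] || '').includes('application/x-ndjson');

  // The body differs by these headers, so shared caches must key on them
  res.vary('User-Agent');
  res.vary('Accept');

  if (!isAgent) return next();

  const originalJson = res.json.bind(res);
  res.json = (data) => {
    // Preserve the caller's shape: arrays stay arrays, objects stay objects
    const isList = Array.isArray(data);
    const lean = isList ? data.map(toAgentPayload) : toAgentPayload(data);
    // 500 tokens per product; at least one unit so an empty list has a budget
    const budget = validateTokenBudget(lean, Math.max(isList ? data.length : 1, 1) * 500);

    res.setHeader('X-Token-Count', String(budget.tokens));
    res.setHeader('X-Token-Utilization', budget.utilization);
    res.setHeader('Cache-Control', 'public, max-age=60, stale-while-revalidate=300');

    return originalJson(lean);
  };

  next();
}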
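The detection step can also be pulled into a pure function so it is testable without Express. This is a sketch under assumptions: `isAgentRequest` and `AGENT_UA_TOKENS` are hypothetical names, the sample User-Agent strings are illustrative rather than copied from any vendor, and the token list should be checked against each vendor's current crawler documentation. It matches tokens anywhere in the header because crawler User-Agent strings commonly begin with a Mozilla/5.0 prefix.

```javascript
// Hypothetical token list; verify against each vendor's crawler docs.
const AGENT_UA_TOKENS = ['GPTBot', 'ClaudeBot', 'PerplexityBot'];
const AGENT_UA_PATTERN = new RegExp(`\\b(${AGENT_UA_TOKENS.join('|')})\\b`);

// headers: a plain object with lowercase keys, as Node exposes on req.headers
function isAgentRequest(headers = {}) {
  const ua = headers['user-agent'] || '';
  const accept = headers['accept'] || '';
  return AGENT_UA_PATTERN.test(ua) || accept.includes('application/x-ndjson');
}

// Illustrative header sets, not real captured traffic
console.log(isAgentRequest({ 'user-agent': 'Mozilla/5.0 (compatible; GPTBot/1.0)' })); // true
console.log(isAgentRequest({ 'user-agent': 'Mozilla/5.0 (Windows NT 10.0) Chrome/120.0' })); // false
console.log(isAgentRequest({ accept: 'application/x-ndjson' })); // true
```

User-Agent strings are trivially spoofed, so this kind of check is suitable for choosing a response shape but not for access control.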

When It Breaks

Condition	What happens	Use instead
Agent needs variant data (sizing, color)	Lean payload drops variants → agent can't answer "is this in size 11?"	Add `variants_summary` field: `"sizes_available": [9, 10, 11, 12]`
Agent comparing technical specs	10 fields too few for deep comparison	Expose a `?detail=full` query param that returns 25 fields at ~300 tokens
High-cardinality catalog queries (50+ products)	50 products at up to 500 tokens each can reach 25,000 tokens in a single response	Paginate at 20, add `"total": 342, "page": 1` to response envelope
Product has critical legal disclaimers	Stripping description removes regulatory text	Add `disclaimer` to `AGENT_FIELDS` for regulated categories
Agent caches your response and price changes	Lean response has no version/ETag — agent doesn't know it's stale	Add `ETag` header + `last_updated` field (already included)

CEMENT Brick

If your product API returns 3,200 tokens when the agent needs 85, you are charging the agent roughly 38 times the input tokens it needs on every product lookup, and the agent's orchestrator may optimize that away by switching to a competitor who returns less noise.

Sources

The tokenizer-alignment problem
Netflix Tech Blog · https://netflixtechblog.com/100x-faster-how-we-supercharged-netflix-maestros-workflow-engine-028e9637f041

Built, then written

Tested on my own homelab before publishing: a four-architecture cluster (ARM · AMD ROCm · NVIDIA CUDA · Apple Silicon) running this blog, the RAG pipeline, and a sovereign research copilot. Built and tested before it's written, refined as I learn.

Rafael Lopes

Production AI Engineer in Vancouver, BC. Brazilian. Builds and ships production AI on a self-hosted homelab — RAG pipelines, distributed LLM inference, web performance, and platform engineering.
