Research & exploration — not a production case study. The measurements and figures below are an illustrative model of how agent-mediated traffic would behave, used to reason about the pattern. They are not benchmarks I ran on my own production systems. External facts are cited and linked; the numbers are the hypothesis, not the receipt.
The Problem
When a web performance engineer optimizes payload size, they think in kilobytes: tree-shake the bundle, compress with Brotli, lazy-load below the fold. When an AI agent consumes your API, the unit changes. The agent's constraint isn't bandwidth — it's context window. A product API returning 2,000 tokens of nested JSON wastes context that the agent needs for reasoning, comparison, and response generation. At $0.50-$15 per million input tokens (depending on model), every unnecessary field has a literal dollar cost. Netflix discovered a version of this problem with tokenizer alignment: "tiny differences in normalization, special token handling, or chat templating can yield different token boundaries — exactly the kind of mismatch that shows up later as inexplicable quality regressions." The same principle applies to your API — what you send determines how the agent tokenizes, and excess fields create noise that degrades answer quality.
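
To put a number on "literal dollar cost", here is a back-of-envelope sketch using the price range quoted above. The helper name `inputCostUsd` and the one-million-lookup volume are assumptions for illustration, not measurements.

```javascript
// Back-of-envelope input cost for a payload of a given token size.
// pricePerMillionUsd: model input price in USD per 1,000,000 tokens.
function inputCostUsd(tokens, pricePerMillionUsd) {
  return (tokens * pricePerMillionUsd) / 1_000_000;
}

const PAYLOAD_TOKENS = 2000; // the nested-JSON example above
const LOOKUPS = 1_000_000;   // hypothetical lookup volume (assumption)

for (const price of [0.5, 15]) {
  const perLookup = inputCostUsd(PAYLOAD_TOKENS, price);
  const total = perLookup * LOOKUPS;
  console.log(
    `$${price}/M tokens: $${perLookup.toFixed(4)} per lookup, ` +
    `$${total.toFixed(0)} per ${LOOKUPS.toLocaleString('en-US')} lookups`
  );
}
```

At 2,000 tokens the payload alone costs between $0.001 and $0.03 per lookup, before the agent has reasoned about anything.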
The Shape
// token-lean-transform.js
// Transforms a full product record into an agent-optimized payload

// Allowlist of fields an agent receives; everything else is dropped
const AGENT_FIELDS = new Set([
  'sku', 'name', 'price', 'currency', 'availability',
  'description_short', 'category', 'image_url', 'last_updated',
  'rating_avg', 'rating_count',
]);

function toAgentPayload(product) {
  const lean = {};
  for (const key of AGENT_FIELDS) {
    const val = product[key];
    // Strip nulls, undefined, empty strings, empty arrays
    if (val === null || val === undefined || val === '' ||
        (Array.isArray(val) && val.length === 0)) {
      continue;
    }
    lean[key] = val;
  }

  // Flatten nested price objects. Compare against undefined rather than
  // using !lean.price, so a legitimate price of 0 is not overwritten.
  if (lean.price === undefined && product.offers?.price != null) {
    lean.price = product.offers.price;
    lean.currency = product.offers.priceCurrency || 'USD';
  }

  // Cap description to reduce token waste
  if (lean.description_short && lean.description_short.length > 200) {
    lean.description_short = lean.description_short.slice(0, 197) + '...';
  }

  // Availability as boolean, not schema.org URL
  if (typeof lean.availability === 'string') {
    lean.availability = lean.availability.includes('InStock');
  }

  return lean;
}
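
function estimateTokens(obj) {
  // Rough heuristic: ~4 characters per token for GPT-family models.
  // Punctuation-heavy JSON can tokenize less efficiently than prose,
  // so treat the result as an estimate, not a measurement.
  return Math.ceil(JSON.stringify(obj).length / 4);
}

function validateTokenBudget(payload, budget = 500) {
  const tokens = estimateTokens(payload);
  return {
    tokens,
    withinBudget: tokens <= budget,
    // String such as "0.42", ready to use as a header value
    utilization: (tokens / budget).toFixed(2),
  };
}

export { toAgentPayload, estimateTokens, validateTokenBudget };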
How It Works
The pattern has three layers: field selection, null stripping, and shape flattening.
Field selection is the biggest lever. A typical e-commerce product object has 40-80 fields: internal IDs, audit timestamps, warehouse codes, variant matrices, rich HTML descriptions, multiple image sizes, related product arrays. An agent doing product comparison needs about 10. The AGENT_FIELDS set is the allowlist — everything else is dropped before serialization.
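
A minimal sketch of the allowlist idea on its own, to show how much of a record is internal noise. The sample record, its field values, and the helper name `pickFields` are hypothetical; the size comparison uses serialized character count as a stand-in for tokens.

```javascript
// Hypothetical full record: only a handful of fields matter to an agent.
const fullRecord = {
  sku: 'SHOE-123',
  name: 'Trail Runner',
  price: 190.0,
  currency: 'USD',
  internal_id: 'a8f3c2e1-77b4-4c1e-9d55-0f2b6f0e1c9a',
  warehouse_code: 'WH-EU-07',
  created_by: 'import-job-42',
  audit: { created_at: '2024-01-01T00:00:00Z', updated_at: '2024-06-01T00:00:00Z' },
  description_html: '<div class="pdp"><p>Lightweight trail shoe.</p></div>',
};

const ALLOWLIST = ['sku', 'name', 'price', 'currency'];

// Copy only allowlisted keys that are actually present on the record.
function pickFields(record, allowlist) {
  const out = {};
  for (const key of allowlist) {
    if (Object.prototype.hasOwnProperty.call(record, key)) out[key] = record[key];
  }
  return out;
}

const lean = pickFields(fullRecord, ALLOWLIST);
const before = JSON.stringify(fullRecord).length;
const after = JSON.stringify(lean).length;
console.log({ before, after, saved: before - after });
```

An allowlist fails closed: a new internal field added upstream never reaches the agent unless someone adds it on purpose.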
Null stripping matters because an explicit null can act as a completion target. When the model sees "children_ages": null in context, autoregressive generation may treat the field as a slot to fill and fabricate a value such as [8, 12]. Removing the field entirely removes that target. This is the token-budget equivalent of removing unused CSS, with one difference: a leftover null is not just wasted bytes, it can actively cause wrong answers.
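
The same idea can be applied at every depth rather than only to top-level fields. The sketch below is an assumption about how that might look, not part of the transform above; `stripEmpty` is a hypothetical helper and expects plain JSON data (no Date, Map, or class instances).

```javascript
// Recursively drop null, undefined, empty strings, empty arrays, and
// objects that end up empty after stripping. Returns undefined when
// nothing is left, so the caller can omit the key entirely.
function stripEmpty(value) {
  if (value === null || value === undefined || value === '') return undefined;
  if (Array.isArray(value)) {
    const items = value.map(stripEmpty).filter((v) => v !== undefined);
    return items.length > 0 ? items : undefined;
  }
  if (typeof value === 'object') {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      const cleaned = stripEmpty(child);
      if (cleaned !== undefined) out[key] = cleaned;
    }
    return Object.keys(out).length > 0 ? out : undefined;
  }
  // Numbers (including 0), booleans (including false), non-empty strings
  return value;
}

const cleaned = stripEmpty({
  name: 'Family Suite',
  children_ages: null,
  tags: [],
  in_stock: false,
  price: 0,
  dimensions: { width: null, height: '' },
});
console.log(JSON.stringify(cleaned));
// {"name":"Family Suite","in_stock":false,"price":0}
```

Note that `false` and `0` survive: they are real answers, while null and empty containers are absences.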
Shape flattening converts nested objects into flat key-value pairs. A nested offers.price.amount.value structure costs more tokens than a flat price: 190.00 because every level of nesting adds another key, a colon, and a pair of braces.
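
To make the overhead visible, the sketch below flattens a nested price with an explicit path map and prints both serializations. The nested shape, the `FLAT_MAP` mapping, and the helper names are illustrative assumptions, not a real upstream schema.

```javascript
// Nested shape, as an upstream service might return it (hypothetical).
const nested = { offers: { price: { amount: { value: 190.0, currency: 'USD' } } } };

// Read a value at a dotted path; returns undefined if any segment is missing.
function getPath(obj, path) {
  return path.split('.').reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Flat output key -> dotted source path (an illustrative mapping).
const FLAT_MAP = {
  price: 'offers.price.amount.value',
  currency: 'offers.price.amount.currency',
};

function flatten(record, map) {
  const out = {};
  for (const [flatKey, path] of Object.entries(map)) {
    const value = getPath(record, path);
    if (value !== undefined && value !== null) out[flatKey] = value;
  }
  return out;
}

const flat = flatten(nested, FLAT_MAP);
console.log(JSON.stringify(nested)); // {"offers":{"price":{"amount":{"value":190,"currency":"USD"}}}}
console.log(JSON.stringify(flat));   // {"price":190,"currency":"USD"}
```

An explicit map keeps the flat names stable even if the upstream nesting changes. `JSON.stringify` also drops the trailing zeros, so 190.00 is serialized as 190.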
The middleware that serves this:
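// Express middleware: agent-aware response transform.
// Crawler User-Agent strings usually carry the bot token mid-string
// (after a "Mozilla/5.0 ... compatible;" prefix), so the pattern is unanchored.
// Google-Extended is a robots.txt token, not a request User-Agent, so it is omitted.
const AGENT_UA = /GPTBot|ClaudeBot|PerplexityBot/;

function agentResponseMiddleware(req, res, next) {
  const isAgent = AGENT_UA.test(req.headers['user-agent'] || '')
    || Boolean(req.headers['accept']?.includes('application/x-ndjson'));
  if (!isAgent) return next();

  const originalJson = res.json.bind(res);
  res.json = (data) => {
    const isList = Array.isArray(data);
    const lean = (isList ? data : [data]).map(toAgentPayload);
    // Keep the caller's shape: a one-item array stays an array
    const body = isList ? lean : lean[0];
    // Budget is 500 tokens per product; Math.max avoids a zero budget for empty lists
    const budget = validateTokenBudget(body, Math.max(lean.length, 1) * 500);
    res.setHeader('X-Token-Count', String(budget.tokens));
    res.setHeader('X-Token-Utilization', budget.utilization);
    res.setHeader('Cache-Control', 'public, max-age=60, stale-while-revalidate=300');
    // The body depends on these request headers, so shared caches must key on them
    res.vary('User-Agent');
    res.vary('Accept');
    return originalJson(body);
  };
  next();
}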
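
Agent detection is the most fragile part of this middleware. Many crawlers send a User-Agent that begins with a browser-style "Mozilla/5.0" prefix and carries the bot token later in the string, so a pattern anchored to the start can miss them. The standalone sketch below matches the token anywhere; the token list, the helper name `isAgentRequest`, and the sample strings are illustrative assumptions, so check each vendor's documentation for the real values.

```javascript
// Bot tokens to look for anywhere in the User-Agent string (assumed list).
const AGENT_TOKENS = /GPTBot|ClaudeBot|PerplexityBot/;

// headers: a plain object with lowercase keys, as Node's http module provides.
function isAgentRequest(headers) {
  const ua = headers['user-agent'] || '';
  const accept = headers['accept'] || '';
  return AGENT_TOKENS.test(ua) || accept.includes('application/x-ndjson');
}

// Illustrative header sets only, not captured traffic.
const samples = [
  { 'user-agent': 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0)' },
  { 'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15' },
  { 'user-agent': 'curl/8.0', accept: 'application/x-ndjson' },
  {},
];

for (const headers of samples) {
  console.log(isAgentRequest(headers)); // true, false, true, false
}
```

User-Agent and Accept are both client-controlled, so this check is a routing hint for choosing a payload shape, not an authentication mechanism.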
When It Breaks
| Condition | What happens | Use instead |
|---|---|---|
| Agent needs variant data (sizing, color) | Lean payload drops variants → agent can't answer "is this in size 11?" | Add variants_summary field: "sizes_available": [9, 10, 11, 12] |
| Agent comparing technical specs | 10 fields too few for deep comparison | Expose a ?detail=full query param that returns 25 fields at ~300 tokens |
| High-cardinality catalog queries (50+ products) | 50 products at up to 500 tokens each can reach 25,000 tokens in one response | Paginate at 20, add "total": 342, "page": 1 to response envelope |
| Product has critical legal disclaimers | Stripping description removes regulatory text | Add disclaimer to AGENT_FIELDS for regulated categories |
| Agent caches your response and price changes | Lean response has no version/ETag — agent doesn't know it's stale | Add ETag header + last_updated field (already included) |
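
The pagination row can be sketched as a small envelope builder. This is one possible shape under stated assumptions: the `paginate` helper, the `page_size` and `has_more` fields, and the in-memory catalog are hypothetical, and a real service would page in the database query rather than slice an array.

```javascript
// Wrap one page of items in an envelope that tells the agent how much is left.
// pageSize defaults to 20, matching the table's suggestion; page is 1-based
// and clamped to the valid range.
function paginate(items, page = 1, pageSize = 20) {
  const total = items.length;
  const totalPages = Math.max(1, Math.ceil(total / pageSize));
  const current = Math.min(Math.max(1, page), totalPages);
  const start = (current - 1) * pageSize;
  return {
    total,
    page: current,
    page_size: pageSize,
    has_more: current < totalPages,
    items: items.slice(start, start + pageSize),
  };
}

// Hypothetical catalog of 342 SKUs.
const catalog = Array.from({ length: 342 }, (_, i) => ({ sku: `SKU-${i + 1}` }));
const first = paginate(catalog, 1);
const last = paginate(catalog, 18);
console.log(first.total, first.items.length, first.has_more); // 342 20 true
console.log(last.page, last.items.length, last.has_more);     // 18 2 false
```

With `total` and `has_more` in the envelope, the agent can decide whether another page is worth the tokens instead of guessing.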
CEMENT Brick
If your product API returns 3,200 tokens when the agent needs 85, every product lookup costs the agent roughly 38 times the input tokens it should. An orchestrator that tracks cost per call has every reason to optimize that away, for example by switching to a competitor who returns less noise.