2026-06-25 · 8 min read · Rafael Lopes

Attribute LCP Per Phase Before You Touch Code

2026-06-25 (Thu) · Engineering brief

Lede

Today's strongest cross-domain signal is that durable systems are won or lost at the seams between stages, not inside any single component. In both Data Engineering streaming architectures and AI / ML production pipelines, value decays the moment data goes stale or a model drifts, so the discipline is instrumenting each hop — ingest, enrich, analyze, infer, render. Web Performance reinforces the same lesson: Largest Contentful Paint is not one number but a sum of sequential phases, each separately optimizable.

7 Domains

AI / ML — Deployment is the halfway mark, not the finish line

Teams routinely celebrate a model that clears the held-out test set and assume the work is done, but live traffic surfaces a second batch of failures — concept drift, data drift, and shifting input distributions. The production discipline is treating the trained model as one component in a longer cycle of monitoring, error analysis, and retraining. The hard problems begin after the first deploy, not before it.

"when you deploy a system for the first time you're maybe about halfway to the finish line because it's often only after you turn on live traffic that you then learn the second half of the important lessons needed in order to get the system to perform well" — Source 3 — ML project life cycle

For a staff-plus engineer shipping inference on a checkout-driven stack, budget at least as much capacity for post-launch monitoring and retraining as for the initial model build.
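One way to make the monitoring half concrete is a data-drift check on a single input feature. The sketch below uses the Population Stability Index, a common choice but not one the source names; the function name, the bin count, and the clamping of out-of-range values are illustrative assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float], live: Sequence[float], bins: int = 10
) -> float:
    """PSI between a training-time sample and a live-traffic sample.

    Bin edges come from the baseline range; live values outside that
    range are clamped into the first or last bin (an assumption of
    this sketch, not a standard).
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant features

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps log() defined when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    expected, actual = shares(baseline), shares(live)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))
```

A PSI near zero means the live sample still looks like the baseline. Values above roughly 0.2 to 0.25 are commonly treated as a significant shift, though that threshold is a convention rather than a rule, and it says nothing about concept drift, which needs labeled outcomes to detect.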

Web Performance — LCP is a sum of phases, not a single timer

On the 2025 web, images are still the largest contentful element on most pages (85.3% on desktop), and the metric decomposes into four sequential phases: TTFB, resource load delay, resource load duration, and element render delay. The mobile gap persists: slower networks and weaker devices nearly double the rate of "poor" experiences versus desktop. Knowing which phase dominates is what makes optimization tractable rather than guesswork.

"Currently, 74% of desktop pages achieve a “good” LCP score compared to 62% on mobile" — Source 20 — Web Almanac 2025 LCP

For a staff-plus engineer owning RUM on a checkout funnel, attribute LCP per phase before touching code, because shaving TTFB and shaving render delay are entirely different fixes.
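The arithmetic behind per-phase attribution is small enough to sketch. The example assumes a RUM beacon has already captured four timestamps in the browser, all in milliseconds from navigation start; the field and function names are hypothetical. Clamping each boundary to the previous one keeps every phase non-negative, including the case where the LCP element is text and has no resource timings.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LcpPhases:
    ttfb: float
    resource_load_delay: float
    resource_load_duration: float
    element_render_delay: float

    @property
    def total(self) -> float:
        return sum(asdict(self).values())

    def dominant(self) -> str:
        """Name of the phase that contributes the most time."""
        phases = asdict(self)
        return max(phases, key=phases.get)


def split_lcp(
    ttfb: float,
    resource_request_start: float,
    resource_response_end: float,
    lcp_time: float,
) -> LcpPhases:
    """Split one LCP measurement into its four sequential phases.

    Pass 0 for both resource timestamps when the LCP element has no
    associated resource (for example a text block).
    """
    request_start = max(ttfb, resource_request_start)
    response_end = max(request_start, resource_response_end)
    render_time = max(response_end, lcp_time)
    return LcpPhases(
        ttfb=ttfb,
        resource_load_delay=request_start - ttfb,
        resource_load_duration=response_end - request_start,
        element_render_delay=render_time - response_end,
    )
```

Calling `split_lcp(800, 1000, 1900, 2500).dominant()` points at the resource load duration, which suggests an image-weight or CDN fix rather than server work. Aggregating the dominant phase across beacons is what tells you where to spend the effort.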

System Design — Pipeline coupling makes one module's change another's regression

Real systems are rarely a single model behind an API; they are chains of components where an upstream change silently degrades a downstream one. A voice-activity-detection module that clips audio differently will alter the input distribution the downstream recognizer was tuned for, with no code change in the recognizer itself. The architectural answer is per-component metrics, not just end-to-end ones.

"when you have two such modules working together changes to the first module may affect the performance of the second module as well" — Source 18 — ML pipeline monitoring

For a staff-plus engineer designing multi-stage services, instrument input and output metrics at every hop so a regression's blast radius is observable at its source.
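A minimal sketch of per-hop instrumentation follows, assuming each stage is a plain callable over a batch and that one scalar summary per batch is enough to notice a shift. The class name, the relative tolerance, and the baseline dictionary are all hypothetical.

```python
from statistics import fmean
from typing import Callable, Dict, List, Optional, Sequence

Batch = Sequence[float]


class InstrumentedStage:
    """Wraps one pipeline stage and records a summary of what goes in and out."""

    def __init__(
        self,
        name: str,
        fn: Callable[[Batch], Batch],
        summarize: Callable[[Batch], float] = fmean,
    ) -> None:
        self.name = name
        self.fn = fn
        self.summarize = summarize
        self.input_stats: List[float] = []
        self.output_stats: List[float] = []

    def __call__(self, batch: Batch) -> Batch:
        self.input_stats.append(self.summarize(batch))
        out = self.fn(batch)
        self.output_stats.append(self.summarize(out))
        return out


def first_shifted_stage(
    stages: Sequence[InstrumentedStage],
    baselines: Dict[str, float],
    tolerance: float = 0.1,
) -> Optional[str]:
    """First stage whose mean output summary moved more than `tolerance`
    (relative) away from its recorded baseline, or None if none did."""
    for stage in stages:
        if not stage.output_stats:
            continue  # nothing observed yet for this stage
        base = baselines[stage.name]
        if abs(fmean(stage.output_stats) - base) > tolerance * abs(base):
            return stage.name
    return None
```

Walking the stages in pipeline order matters: when an upstream module shifts, every downstream stage will usually look shifted too, and the first one to exceed its tolerance is the most likely source.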

Cloud & Infrastructure — Quantization calibration is a tunable knob, not a default

Serving large models on shared accelerators increasingly means INT8 / W4A8 weight-and-activation quantization, where calibration data quality directly governs whether accuracy survives compression. The guidance is concrete: start small on calibration samples and grow only if accuracy regresses, and calibrate with the chat or instruction template the model was trained with. These are operational dials, not academic ones.

"Start with 512 samples for calibration data (increase if accuracy drops)" — Source 11 — INT8 W4A8 quantization

For teams shipping inference on shared GPU pools, treat calibration set size and sequence length as first-class deployment parameters and regression-test accuracy after each quantization change.
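The "start at 512, increase if accuracy drops" guidance can be written as a small search loop. In this sketch `evaluate` is a hypothetical callable that quantizes with the given number of calibration samples and returns held-out accuracy; the doubling step, the tolerance, and the upper limit are assumptions, not values from the source.

```python
from typing import Callable


def pick_calibration_size(
    evaluate: Callable[[int], float],
    baseline_accuracy: float,
    max_drop: float = 0.01,
    start: int = 512,
    limit: int = 4096,
) -> int:
    """Smallest tried sample count whose accuracy stays within
    `max_drop` of the unquantized baseline."""
    n = start
    while n <= limit:
        if baseline_accuracy - evaluate(n) <= max_drop:
            return n
        n *= 2  # grow the calibration set only after a measured regression
    raise RuntimeError(
        f"accuracy drop still above {max_drop} at {limit} calibration samples"
    )
```

Running the same function in CI after every quantization change gives the regression test for free: the job fails loudly when no tried calibration size recovers the baseline.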

Data Engineering — Streaming exists to beat staleness

A single 737 generates roughly 20 terabytes per hour, and the value of that data falls sharply with age — the entire point of a streaming architecture is to capture worth before it decays. The processor stage earns its keep by filtering noise, enriching raw readings with context (location, machine, business state), then analyzing for patterns. Origin, processor, destination is the spine every real-time system hangs off.

"the key value point with a streaming architecture is to avoid the stale" — Source 6 — Real-time data streaming

For a staff-plus engineer building event pipelines on a checkout stack, push enrichment and filtering as close to ingest as possible so downstream consumers never pay for stale or context-free records.
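The filter-then-enrich processor stage might look like the generator below. It is a sketch under stated assumptions: the `Reading` shape, the five-second staleness cutoff, the valid value range, and the location lookup table are all invented for illustration.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, Mapping, Optional, Tuple


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    value: float
    ts: float  # event time, seconds since epoch
    location: Optional[str] = None


def process(
    readings: Iterable[Reading],
    locations: Mapping[str, str],
    now: float,
    max_age_s: float = 5.0,
    valid_range: Tuple[float, float] = (0.0, 1000.0),
) -> Iterator[Reading]:
    """Drop stale or out-of-range readings at ingest, then attach context."""
    lo, hi = valid_range
    for r in readings:
        if now - r.ts > max_age_s:
            continue  # stale: too old to be worth passing on
        if not lo <= r.value <= hi:
            continue  # noise: outside the plausible sensor range
        yield replace(r, location=locations.get(r.sensor_id, "unknown"))
```

Because the function is a generator, filtering and enrichment happen record by record as data arrives, and downstream consumers only ever see readings that are fresh and carry their context.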

Security — Validation belongs inside the deployment pipeline

The DevOps evidence base treats security not as a release-gate afterthought but as an automated function of the pipeline itself — "shift left" means every change is validated as it flows toward production. This sits alongside test automation, deployment automation, and trunk-based development as a predictor of high-performing teams. Security that lives outside the pipeline is security that gets skipped under deadline pressure.

"shift left on security is all security testing and validation included as a function of the deployment pipeline" — Source 15 — 5 DevOps best practices

For a staff-plus engineer owning CI/CD on a transaction-heavy stack, wire security scans into the same single-button pipeline as tests, so a failing check blocks a merge the way a failing unit test does.
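The design point is that security results and test results pass through one gate with one rule. A minimal sketch, with hypothetical check names and no particular CI system or scanner assumed:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CheckResult:
    name: str  # e.g. "unit-tests" or "dependency-scan" (illustrative labels)
    passed: bool


def gate(results: Iterable[CheckResult]) -> int:
    """Exit code for the pipeline step: 0 only if every check passed.

    Security scans get no special handling, so a failed scan blocks
    the merge exactly as a failed unit test would.
    """
    failed = [r.name for r in results if not r.passed]
    for name in failed:
        print(f"BLOCKING: {name} failed")
    return 1 if failed else 0
```

A pipeline step would pass the returned value to `sys.exit`, so there is no separate, skippable path for security findings.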

Engineering Career — Project selection outranks model selection

The scarcest senior skill is not training a better network but choosing which problem to spend months on, because candidate projects vary in value by an order of magnitude. Brainstorming a dozen ideas and ranking them by expected impact before committing resources is what separates leverage from busywork. The model is a general-purpose tool; the judgment is in where you point it.

"picking the right project to work on is one of the most rare and valuable skills in ai today" — Source 16 — Scoping ML projects

For a staff-plus engineer setting roadmap, spend real time scoring options on value and resource cost before writing the first line, since a 10x project beats a perfectly executed 1x one.

Cross-Cuts

AI / ML × Data Engineering

The non-obvious bridge is that streaming and ML pipelines share the same failure surface: cascading drift across stages. A streaming processor's enrich-then-analyze flow (Source 6 — Real-time data streaming) is structurally identical to an ML pipeline where an upstream module's output shift silently degrades a downstream model (Source 18 — ML pipeline monitoring). Both demand per-stage metrics rather than a single end-to-end score, because staleness and concept drift each manifest mid-chain. The same logic now reaches the database layer, where an ML-powered query optimizer learns from each execution to refine query paths rather than re-running a static cost model (Source 19 — Data virtualization).

System Design × Web Performance

Both domains converge on phase decomposition as the unit of optimization. LCP is explicitly the sum of TTFB, resource load delay, resource load duration, and element render delay (Source 20 — Web Almanac 2025 LCP), mirroring how an ML pipeline is a sequence of independently monitored modules whose coupling propagates regressions (Source 18 — ML pipeline monitoring). In both cases an aggregate number hides which stage is at fault, so the engineering move is identical: attribute the total to its phases before optimizing. A system designed for stage-level observability is also a system whose user-facing latency you can actually diagnose.

Enterprise System Graph

flowchart LR
 A[Origin/Sensor<br/>MQTT ingest] --> B[Stream Processor<br/>filter+enrich]
 B --> C[ML Inference<br/>concept-drift monitor]
 C --> D[Query Optimizer<br/>Db2 ML path]
 D --> E[Prediction Server<br/>API call]
 E --> F[LCP<br/>TTFB+render delay]

Today's Practitioner Action

Try this: pick one user-facing latency number you own (LCP on a key page, or end-to-end inference latency) and spend 30 minutes decomposing it into its sequential phases. For LCP, wire a PerformanceObserver to attribute TTFB, resource load delay, resource load duration, and element render delay separately; for an inference path, log input and output metrics at each pipeline stage. You are buying the ability to see which hop owns the regression, which is exactly the seam-level observability that today's Lede says decides whether systems are won or lost.
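For the inference-path half of the exercise, a per-phase timer is enough to get started. The sketch below assumes a single-threaded request handler; the class name and the phase labels in the usage note are hypothetical, and the injectable clock exists only to make the timer testable.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class PhaseTimer:
    """Accumulates wall-clock time per named phase of one request."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.durations: Dict[str, float] = {}

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            # Record the time even if the phase raised.
            elapsed = self._clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self) -> str:
        """Name of the phase with the largest accumulated time."""
        return max(self.durations, key=self.durations.__getitem__)
```

Wrapping each hop in `with timer.phase("preprocess"):`, `with timer.phase("infer"):`, and so on, then logging `timer.durations` once per request, turns one end-to-end latency number into a breakdown that shows which hop moved.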

Sources

What Is Real-Time Data Streaming? AI & Machine Learning Applications
IBM Technology · https://www.youtube.com/watch?v=aBIxpJ1_EyY
#2 MLOps Specialization Course 1, Week 1, Lesson 2
DeepLearning.AI · https://www.youtube.com/watch?v=e69ZWbbsGng
INT8 W4A8 — Best Practices
vLLM / llm-compressor docs · https://docs.vllm.ai
5 DevOps Best Practices
Continuous Delivery · https://www.youtube.com/watch?v=rcBFpwaB7Qk
#36 MLOps Specialization Course 1, Week 3, Lesson 12
DeepLearning.AI · https://www.youtube.com/watch?v=UEMMOdFbT94
#8 MLOps Specialization Course 1, Week 1, Lesson 8
DeepLearning.AI · https://www.youtube.com/watch?v=79UqdjnPEZ0
Data Virtualization in Data Fabric
IBM Technology · https://www.youtube.com/watch?v=2XB4UaBIvNI
web_almanac_2025_en.pdf
HTTP Archive Web Almanac 2025 · https://almanac.httparchive.org/en/2025/performance

Built, then written

Tested on my own homelab before publishing: a four-architecture cluster (ARM · AMD ROCm · NVIDIA CUDA · Apple Silicon) running this blog, the RAG pipeline, and a sovereign research copilot. Built and tested before it's written, refined as I learn.

Rafael Lopes

Production AI Engineer in Vancouver, BC. Brazilian. Builds and ships production AI on a self-hosted homelab — RAG pipelines, distributed LLM inference, web performance, and platform engineering.
