2026-06-05 · 3 min read · Rafael Lopes

AI Engineer in Vancouver, BC — Production AI, Built in the Open

AI vancouver production-ai rag homelab consulting

What I Build

I'm Rafael Lopes — "Rafa" — a production AI engineer based in Vancouver, British Columbia. I don't write about AI from the sidelines; I ship it. The systems below all serve live traffic from a self-hosted cluster in one room:

A hybrid-RAG pipeline over 69,000+ curated technical chunks (BM25 + TF-IDF + weighted RRF + cross-encoder rerank), with an automated quality gate that strips fabricated quotes before anything publishes.
Distributed LLM inference across four compute architectures — ARM, AMD ROCm, NVIDIA CUDA, and Apple Silicon — pooling memory over the llama.cpp RPC protocol for models too large for one GPU.
exaflop.ca, a sovereign research copilot for Canadian HPC — every byte of the inference path stays local, with a live ledger proving zero foreign hops per query.
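
To make the fusion step in the first item concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion over two ranked lists. It is an illustration under assumptions, not the pipeline's actual code: the function name `weighted_rrf`, the chunk ids, the weights, and the constant `k=60` (a commonly used default for RRF) are all hypothetical, and the cross-encoder rerank that follows fusion is left out.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of chunk ids with weighted Reciprocal Rank Fusion.

    rankings: dict of retriever name -> list of chunk ids, best first
    weights:  dict of retriever name -> float weight (hypothetical values)
    k:        smoothing constant; 60 is a common default, assumed here
    """
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        w = weights.get(name, 1.0)
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            # Each retriever contributes w / (k + rank) for every chunk it returns
            scores[chunk_id] += w / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical rankings from the two lexical retrievers
rankings = {
    "bm25": ["c1", "c2", "c3"],
    "tfidf": ["c2", "c4", "c1"],
}
weights = {"bm25": 1.0, "tfidf": 0.5}

fused = weighted_rrf(rankings, weights)
fused_ids = [chunk_id for chunk_id, _ in fused]
print(fused_ids)
```

Because RRF works on ranks rather than raw scores, BM25 and TF-IDF outputs can be combined without normalizing their score scales; the weights only shift how much each retriever's ordering counts.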

The Stack

The whole platform is documented, not just described:

How the briefs are made → the retrieval → synthesis → quality-gate → publish pipeline, with the real numbers.
The infrastructure → a four-architecture K3s homelab, GitOps via Argo CD, Cloudflare Tunnel + Zero Trust at the edge — no cloud compute.
A from-scratch RAG build → the actual BM25/TF-IDF/RRF code and measured retrieval quality.

The Daily Brief

Every weekday I publish a cross-domain engineering brief — AI, web performance, system design, security, and the career arc — synthesized from the corpus, cited to source, and shipped through the same quality gate. The archive is the proof of consistency: nobody fakes a dated, cited, cross-domain brief every working day.
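
One way a gate like this could catch fabricated quotes is to keep only quotations that appear verbatim in the retrieved sources. The sketch below is a simplified illustration under that assumption, not the gate that actually runs here: `strip_unverified_quotes`, the `min_words` threshold, the naive sentence splitter, and the sample text are all hypothetical.

```python
import re

# Matches text between straight double quotes only (an assumption of this sketch)
QUOTE_RE = re.compile(r'"([^"]+)"')


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def strip_unverified_quotes(draft, source_chunks, min_words=4):
    """Drop every sentence that contains a quote not found verbatim in any source chunk."""
    sources = [normalize(chunk) for chunk in source_chunks]
    kept = []
    # Naive split: a sentence ends at ., ! or ? followed by whitespace
    for sentence in re.split(r"(?<=[.!?])\s+", draft.strip()):
        # Short quoted fragments (scare quotes, single terms) are not checked
        quotes = [q for q in QUOTE_RE.findall(sentence) if len(q.split()) >= min_words]
        if all(any(normalize(q) in src for src in sources) for q in quotes):
            kept.append(sentence)
    return " ".join(kept)


# Hypothetical source chunk and draft
source_chunks = ["The cache cut median latency by half in our tests."]
draft = (
    'The authors report that "the cache cut median latency by half" in testing. '
    'They add that "this approach always beats every alternative" in practice. '
    "No quotes here."
)

cleaned = strip_unverified_quotes(draft, source_chunks)
print(cleaned)
```

Exact substring matching is deliberately strict: a paraphrase presented as a quote fails the check, which is the behaviour a fabricated-quote gate would want.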

The Infrastructure

No managed Kubernetes, no hosted CI, no hyperscaler in the data path. A Raspberry Pi runs the K3s control plane; an AMD-ROCm workstation does the GPU heavy lifting; an x86 box self-hosts GitLab and the registry; a Mac M3 Max joins as an RPC peer. Every change goes git → CI → Argo CD → live. The platform that runs this blog is the same one that runs the research copilot.
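
To illustrate what pooling memory across RPC peers means for a model too large for one GPU, here is a rough sketch that splits a model's layers across peers in proportion to their available memory. This is not llama.cpp's allocation logic and not the cluster's real configuration: the peer names, memory figures, and layer count are hypothetical, and the function `split_layers` exists only for this example.

```python
def split_layers(n_layers, peer_mem_gib):
    """Assign layers to peers in proportion to memory, using largest-remainder rounding."""
    total = sum(peer_mem_gib.values())
    exact = {name: n_layers * mem / total for name, mem in peer_mem_gib.items()}
    # Start from the integer part of each peer's proportional share
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = n_layers - sum(alloc.values())
    # Give the remaining layers to the peers with the largest fractional parts
    by_fraction = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_fraction[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical peers and usable memory in GiB
peers = {"rocm-workstation": 24, "cuda-box": 12, "mac-m3-max": 48}

plan = split_layers(80, peers)
print(plan)
```

No single peer in this made-up example could hold all 80 layers, but together they can, which is the point of joining heterogeneous machines into one inference pool.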

Available For

Vancouver-based and remote-friendly. Open to:

Consulting on production RAG, LLM inference, and AI platform/SRE work.
Speaking on sovereign/local-first AI, web performance for AI consumers, and homelab-scale inference.
Collaboration with teams shipping real AI infrastructure who want the receipts, not the hype.

Teaching by doing — production AI, not commentary. The system is the proof.

FAQ

Who is the AI engineer in Vancouver behind this site? Rafael Lopes ("Rafa") — a production AI engineer based in Vancouver, British Columbia. He builds and ships RAG pipelines, distributed LLM inference, and a sovereign research copilot on a self-hosted homelab, and documents the results in the open.

What does a production AI engineer do? Builds AI systems that serve real traffic — retrieval pipelines, LLM inference, quality gates, and the platform/SRE work to run them — rather than writing about AI from the sidelines. Here, every claim links to a live system or a measured number.

What AI does Rafael Lopes build? Hybrid retrieval (BM25 + TF-IDF + weighted RRF + cross-encoder rerank), distributed LLM inference across four compute architectures over the llama.cpp RPC protocol, and exaflop.ca — a sovereign, local-first research copilot for Canadian HPC.

Where can I read more? The daily cross-domain engineering brief, the how-it-works pipeline, and the infrastructure write-up — all linked below and at blog.r-lopes.com.

Sources

How the briefs are made
the RAG + quality-gate pipeline · https://blog.r-lopes.com/how-it-works
The platform
the four-architecture homelab · https://blog.r-lopes.com/infra
Exaflop — a sovereign research copilot
zero-foreign-hop AI, built in Vancouver · https://exaflop.ca

Built, then written

Built and tested on my own homelab before it's written: a four-architecture cluster (ARM · AMD ROCm · NVIDIA CUDA · Apple Silicon) running this blog, the RAG pipeline, and a sovereign research copilot. Refined as I learn. See the platform →

Rafael Lopes

Production AI Engineer in Vancouver, BC. Brazilian. Builds and ships production AI on a self-hosted homelab — RAG pipelines, distributed LLM inference, web performance, and platform engineering.

Related posts

Building a RAG Pipeline From Scratch

WebMCP: Making Your Website Callable, Not Just Crawlable

Sitemaps for Agent Discovery