Edge routing proxy

A multi-layer reverse proxy for upstream-resilient routing — OpenResty, Couchbase, and a TypeScript control plane. Serves ~10M requests/day in production from a 2 vCPU / 2 GB node.

status: shipped
started: 2024-10
updated: 2026-04
tags: Backend · Systems

What

A reverse proxy that rewrites traffic in real time to keep upstream connections healthy across fast-changing endpoint state. Built for integration layers where upstream availability is unreliable — rate limits, regional routing, frequent endpoint churn — and the system above can’t tolerate that churn leaking through.

Why

The simple version of the problem: an upstream endpoint degrades or goes away, and the rest of the system can’t talk to it. The simple solution, pointing at a backup, only holds for hours, because whatever stressed the first endpoint tends to stress the backup the same way. So the real engineering problem is: can you route continuously, at request granularity, across a changing pool of upstreams, without breaking session state?

How

Three layers:

Edge (OpenResty / Nginx + Lua). Lua handlers in access_by_lua_block and body_filter_by_lua_block make per-request routing decisions and rewrite response bodies on the fly. Picked OpenResty specifically because the alternative (a sidecar in a “real” language) adds a network hop that kills latency for this workload.
State (Couchbase). Routing tables live in Couchbase with K/V lookups on the hot path — sub-millisecond reads were the requirement, and Couchbase’s memory-first design fit better than a relational store here. The control plane owns writes; the edge is read-only.
Control plane (TypeScript / Express). Health-checks upstreams, rotates active endpoints based on health and load, publishes new routing state to Couchbase. Deliberately kept out of the request path so a control-plane outage degrades gracefully — the edge keeps serving whatever the last known routing state was.
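
The rotation step of the control plane can be sketched roughly as below. This is a minimal illustration, not the production code: the UpstreamStatus and RoutingState shapes, the selectActive function, the maxLoad threshold, and the example hostnames are all assumptions made for this sketch. It keeps healthy upstreams under a load ceiling, prefers the least loaded, and (as an assumed policy) leaves the previous routing state in place when nothing qualifies.

```typescript
// Hypothetical shapes: the real control plane's types are not shown in this write-up.
interface UpstreamStatus {
  url: string;
  healthy: boolean; // result of the latest health check
  load: number;     // 0..1, fraction of capacity in use (assumed metric)
}

interface RoutingState {
  version: number;
  active: string[]; // upstream URLs the edge may route to
}

// Pick up to `poolSize` healthy upstreams, least loaded first.
// If nothing qualifies, return the previous state unchanged (assumed policy).
function selectActive(
  statuses: UpstreamStatus[],
  previous: RoutingState,
  poolSize: number,
  maxLoad: number = 0.8,
): RoutingState {
  const candidates = statuses
    .filter((s) => s.healthy && s.load <= maxLoad)
    .sort((a, b) => a.load - b.load)
    .slice(0, poolSize)
    .map((s) => s.url);

  if (candidates.length === 0) {
    return previous;
  }
  return { version: previous.version + 1, active: candidates };
}

// Usage: one unhealthy upstream, one overloaded, two eligible.
const previousState: RoutingState = { version: 7, active: ["https://a.example"] };
const nextState = selectActive(
  [
    { url: "https://a.example", healthy: false, load: 0.2 },
    { url: "https://b.example", healthy: true, load: 0.6 },
    { url: "https://c.example", healthy: true, load: 0.3 },
    { url: "https://d.example", healthy: true, load: 0.95 },
  ],
  previousState,
  2,
);
```

In the real system the resulting state would then be written to Couchbase for the edge to read; that write is left out here because the document schema is not described.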

The proxy is paired with a rotation service that health-checks and load-balances across ~50 proxy nodes — the proxy is the data plane; the rotation service decides which nodes get traffic.

Caching and fallback

Three-tier read path, designed so a Capella Couchbase outage doesn’t take the edge down:

L1 — in-process memory cache (Lua). 10 MB per worker, 60 s TTL. Lives inside the Lua worker, so there’s no network hop on a cache hit. Absorbs repeat routing lookups and takes hotkey pressure off Couchbase.
L2 — Capella Couchbase. Authoritative routing state. Sub-millisecond K/V reads on cache miss.
L3 — local document fallback. Every doc successfully fetched from Couchbase is also written to local disk. If Capella is unreachable and the L1 entry has expired, the edge falls back to the last-known-good local copy rather than failing the request. The trade-off is freshness — local fallback is as stale as the last successful read — but for this workload “slightly stale” beats “5xx.”
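
The read path above can be modelled in a few lines. The sketch below is in TypeScript for readability; the real edge does this in Lua inside OpenResty. The TieredReader class, the synchronous fetchRemote callback, the in-memory map standing in for the on-disk copy, and the key names are all assumptions for illustration.

```typescript
// All names here are illustrative, not taken from the real codebase.
type Tier = "l1" | "l2" | "l3";

interface Lookup {
  value: string;
  tier: Tier; // which tier answered the read
}

class TieredReader {
  private l1 = new Map<string, { value: string; expiresAt: number }>();
  private l3 = new Map<string, string>(); // stands in for the local disk copy
  private fetchRemote: (key: string) => string; // throws if the store is unreachable
  private ttlMs: number;
  private now: () => number;

  constructor(
    fetchRemote: (key: string) => string,
    ttlMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.fetchRemote = fetchRemote;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): Lookup {
    // L1: in-process cache, valid until its TTL expires.
    const cached = this.l1.get(key);
    if (cached !== undefined && cached.expiresAt > this.now()) {
      return { value: cached.value, tier: "l1" };
    }
    try {
      // L2: authoritative store. On success, refresh both L1 and the L3 copy.
      const value = this.fetchRemote(key);
      this.l1.set(key, { value, expiresAt: this.now() + this.ttlMs });
      this.l3.set(key, value);
      return { value, tier: "l2" };
    } catch (err) {
      // L3: last-known-good copy. Possibly stale, but avoids failing the request.
      const fallback = this.l3.get(key);
      if (fallback === undefined) {
        throw err; // never fetched successfully, so there is nothing to serve
      }
      return { value: fallback, tier: "l3" };
    }
  }
}

// Usage with a fake clock and a store that can be switched off.
let clock = 0;
let storeUp = true;
const reader = new TieredReader(
  (key) => {
    if (!storeUp) throw new Error("store unreachable");
    return `route-for-${key}`;
  },
  60000, // 60 s TTL, as in the L1 description
  () => clock,
);

const first = reader.get("tenant-1");  // cache miss, read from L2
const second = reader.get("tenant-1"); // within TTL, served from L1
clock += 61000;                        // L1 entry expires
storeUp = false;                       // simulated store outage
const third = reader.get("tenant-1");  // falls back to L3
```

Note that a key that was never read successfully has no L3 copy, so an outage still fails those requests in this sketch.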

The useful side-effect I didn’t plan for: the L1 cache also absorbs Capella latency spikes, not just full outages. A request that hits L1 never waits on Couchbase, so when reads slow down beyond a handful of ms, only the cache misses pay for it.

What broke

Strict upstream request validation. Several upstreams reject requests based on shape, header ordering, and payload structure — not semantics. The first deploy got a high failure rate from this class of check. The fix was dynamic request-shape normalization in the Lua layer (normalizing and subtly varying request shape without changing semantics) plus jittered rotation windows to smooth traffic bursts.
WebSocket sessions. Naïve proxy rewriting broke active WebSocket connections on rotation — the client saw mid-session resets. Fixed with transparent WebSocket proxying and connection affinity: rotation only applies to new connections, existing ones stay pinned to their original upstream until they close naturally.
Hotkey contention. Contention on hot Couchbase keys surfaced under burst load. The L1 cache (above) was the fix; I measured hit rates before and after to confirm it was pulling weight rather than just adding a failure mode.
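
The connection-affinity and jitter fixes can be illustrated with a small model. This is a sketch under assumptions, written in TypeScript rather than the Lua the edge actually runs: the AffinityRouter class, the connection ids, the upstream names, and the uniform jitter formula are all hypothetical and only show the behavior described above.

```typescript
// Hypothetical names throughout; this models the behavior, not the Lua implementation.
class AffinityRouter {
  private current: string;
  private pinned = new Map<string, string>(); // connection id -> upstream

  constructor(initialUpstream: string) {
    this.current = initialUpstream;
  }

  // Rotation only changes where NEW connections go.
  rotate(nextUpstream: string): void {
    this.current = nextUpstream;
  }

  // A connection is pinned on first sight and keeps that upstream afterwards.
  upstreamFor(connectionId: string): string {
    const existing = this.pinned.get(connectionId);
    if (existing !== undefined) {
      return existing;
    }
    this.pinned.set(connectionId, this.current);
    return this.current;
  }

  // Called when the connection closes, so the pin does not leak.
  release(connectionId: string): void {
    this.pinned.delete(connectionId);
  }
}

// Spread rotation times over [base*(1-spread), base*(1+spread)) so that
// rotations do not all fire at the same instant (assumed jitter scheme).
function jitteredDelayMs(
  baseMs: number,
  spread: number,
  random: () => number = Math.random,
): number {
  const offset = (random() * 2 - 1) * spread; // uniform in [-spread, +spread)
  return Math.round(baseMs * (1 + offset));
}

const router = new AffinityRouter("upstream-a");
const wsBefore = router.upstreamFor("ws-1"); // pinned to upstream-a
router.rotate("upstream-b");
const wsAfter = router.upstreamFor("ws-1");  // still upstream-a: existing session
const wsNew = router.upstreamFor("ws-2");    // upstream-b: new connection
```

Passing the random source as a parameter keeps the jitter deterministic in tests while defaulting to Math.random in normal use.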

Scale / production numbers

Instance: GCP e2-small, 2 vCPU, 2 GB RAM, Debian. Deliberately small. This workload is I/O-bound, not CPU- or memory-bound, so paying for a bigger box would have bought nothing.
Kernel + nginx tuning: Linux sysctl tuned for high-throughput I/O. Nginx worker_connections (a per-worker limit) and the matching kernel FD / connection ceilings both raised to 65 535, the goal being that the application becomes the bottleneck before the OS does.
Load test: 70 000 concurrent connections, 100% success rate. CPU and memory stayed flat throughout. The only limit the box hit was the socket / file-descriptor ceiling; once that was raised, the node held.
Production: ~10 M requests/day sustained. Runs behind an auto-scaler so bursts add instances horizontally rather than stressing a single node.
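
For a sense of scale, the daily figure can be converted to an average per-second rate. This is plain arithmetic on the number quoted above; the variable names are illustrative, and the peak-to-average ratio is not stated anywhere in this write-up, so only the average is computed.

```typescript
const requestsPerDay = 10000000;    // ~10 M requests/day, from the production figure
const secondsPerDay = 24 * 60 * 60; // 86400
const averageRps = requestsPerDay / secondsPerDay; // roughly 116 requests per second
```

Real traffic is bursty, so instantaneous rates will sit above this average.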

The non-obvious constraint — even at this scale — was latency budget, not throughput. Every request eats the routing lookup, so anything above single-digit-ms at the edge was unacceptable. That shaped most of the architecture decisions: Lua at the edge over a sidecar (no extra network hop), Couchbase over Postgres (memory-first K/V instead of SQL on the hot path), and an in-process cache layer in Lua to absorb hotkey contention.

Honest limitations

Observability is thin. The control plane knows routing state; the edge knows per-request outcomes. Correlating the two is harder than it should be — I’d add proper distributed tracing as the first improvement on a rewrite.
Config is a mix of YAML and runtime Couchbase state — it works, but onboarding a new engineer to the system takes longer than it should because the source of truth isn’t obvious.
Upstream conditions change faster than any static description of the system. Numbers and behaviors from a month ago may not match today.