Update Post 1 with final benchmark data and significance results
- Add OpenShift 4.21, Strimzi 0.51.0 (Kafka 3.9), Vault 2.0.0 to test
environment table
- Replace multi-topic latency tables with final-run E2E data across all
three scenarios (baseline, proxy-no-filters, encryption)
- Add significance narrative for 10-topic results: proxy publish latency
below noise, encryption E2E p99 paradoxically 9 ms lower than baseline
- Add 100-topic tail finding: 99.9th percentile of per-window p99 is 750 ms
for direct Kafka vs ~506 ms via proxy (-32%, p<0.001), interpreted as
proxy serialisation smoothing bursty consumer delivery
- Update CPU sizing coefficient from 10 mc/MB/s to 35 mc/MB/s (conservative,
from single-partition measurement); update worked examples throughout
- Remove FIXME comment; update TL;DR to reflect final numbers
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
_posts/2026-05-26-benchmarking-the-proxy.md: 46 additions & 41 deletions
--- a/_posts/2026-05-26-benchmarking-the-proxy.md
+++ b/_posts/2026-05-26-benchmarking-the-proxy.md
@@ -13,11 +13,10 @@ There's a practical question underneath the hunch too. The most common thing ope
 
 So we stopped saying "it depends", and got off the fence: we built something you can run **yourselves** on your own infrastructure with your own workload, and measured it. Here are some representative numbers from ours.
 
-<!-- FIXME: verify all numbers against final benchmark run before publish -->
 **TL;DR**:
-- A passthrough proxy adds ~0.2 ms to average publish latency with no throughput impact
+- A passthrough proxy adds negligible overhead: publish latency impact is below measurement noise, E2E adds ~2 ms at moderate topic rates, throughput unaffected
 - Add record encryption and expect a ~25% throughput reduction and 0.2–3 ms of additional latency at comfortable rates
-- The throughput ceiling scales linearly with CPU: budget 10 millicores per MB/s of total proxy traffic
+- The throughput ceiling scales linearly with CPU: budget ~35 mc per MB/s of total proxy traffic (conservative; the companion post has the full sizing formula)
 - The full benchmark harness is open source — run it on your own cluster for numbers that reflect your workload
 
 ## What we measured
@@ -37,10 +36,10 @@ No, we didn't run this on a laptop — it's a realistic deployment: an 8-node Op
 | Kroxylicious | 0.20.0, single proxy pod, 1000m CPU limit |
-| KMS | HashiCorp Vault (in-cluster) |
+| KMS | HashiCorp Vault 2.0.0 (in-cluster) |
 
 The primary workload used 1 topic, 1 partition, 1 KB messages. We chose single-partition deliberately: it concentrates all traffic on one broker, so you hit ceilings quickly and any proxy overhead is easy to isolate. We also ran 10-topic and 100-topic workloads to make sure the results hold when load is spread more realistically across brokers.
 
@@ -50,37 +49,43 @@ One important caveat: this Kafka cluster is deliberately untuned. We're not tryi
 
 ## The passthrough proxy: negligible overhead
 
-Good news first. The proxy itself — with no filter chain, just routing traffic — adds almost nothing.
+Good news first. The proxy itself — with no filter chain, just routing traffic — adds almost nothing. The tables below show all three scenarios side by side.
 
 A quick note on percentiles for anyone not steeped in performance benchmarking: p99 latency is the value that 99% of requests complete within — meaning 1 in 100 requests takes longer. Averages flatter; the p99 is what your slowest clients actually experience, and it's usually the number that matters.
 
-**10 topics, 1 KB messages (5,000 msg/s per topic):**
+**10 topics, 1 KB messages (~5,000 msg/s per topic):**
 
-| Metric | Baseline | Proxy | Delta|
-|--------|----------|-------|-------|
-| Publish latency avg |2.62 ms |2.79 ms |+0.17 ms (+7%) |
-| Publish latency p99 |14.09 ms |15.17 ms | +1.08 ms (+8%) |
-| E2E latency avg |94.87 ms |95.34 ms | +0.47 ms (+0.5%) |
-| E2E latency p99 |185.00 ms |186.00 ms | +1.00 ms (+0.5%) |
-|Publish rate | 5,002 msg/s | 5,002 msg/s |0|
+| Metric | Baseline | Proxy (no filters) | Encryption|
-**100 topics, 1 KB messages (500 msg/s per topic):**
+*Negative deltas for proxy-no-filters publish latency are within measurement noise — they indicate the proxy is indistinguishable from baseline, not that it improves latency.*
 
-| Metric | Baseline | Proxy | Delta |
-|--------|----------|-------|-------|
-| Publish latency avg | 2.66 ms | 2.82 ms | +0.16 ms (+6%) |
-| Publish latency p99 | 5.54 ms | 6.07 ms | +0.53 ms (+10%) |
-| E2E latency avg | 253.16 ms | 253.76 ms | +0.60 ms (+0.2%) |
-| E2E latency p99 | 499.00 ms | 499.00 ms | 0 |
-| Publish rate | 500 msg/s | 500 msg/s | 0 |
+The passthrough proxy is not adding measurable per-record overhead at this rate. E2E average overhead is +2.1 ms (p<0.001), but practically negligible for any sizing decision.
 
-**The headline: ~0.2 ms additional average publish latency. Throughput is unaffected.**
+Encryption adds significant publish latency (+10 ms avg, +13.9 ms p99, p<0.001), as you'd expect for per-record AES-256-GCM. The E2E result is counterintuitive: both proxy scenarios have *lower* E2E p99 than direct Kafka (−3 ms and −11 ms respectively, both p<0.001). E2E latency includes consumer behaviour — fetch timeouts, batch accumulation, scheduling jitter. At 5k msg/s per topic, the proxy's processing of each record slightly regularises delivery timing, damping the consumer-side spikes that drive tail latency in direct Kafka.
 
-What did I take away from this entirely unsurprising result? Not much, honestly — without filters the proxy boils the latency-sensitive path down to little more than a couple of hops through the TCP stack. We replaced a hunch with data. The remarkable part: the proxy is doing this at Layer 7. Most proxies operate on Kafka at Layer 4 — they shuffle bytes without ever understanding what those bytes mean. Kroxylicious works at Layer 7, parsing every Kafka message, yet still adds only 0.2 ms. That's the design working.
+**100 topics, 1 KB messages (~500 msg/s per topic):**
 
-The overhead holding across 10 and 100 topics makes sense for the same reason: the proxy doesn't contend between topics. Think of the proxy as independent circuits on a distribution board — switching the breaker for lights doesn't cut power to the fridge. A Kafka broker is more like the mains supply itself — every circuit draws from the same source, so heavy load anywhere reduces what's available everywhere. Topics don't contend for shared resources: throughput scales linearly across them, and the connection sweep validates it.
-The end-to-end p99 figure is likely dominated by Kafka consumer fetch timeouts, as it should be. That said, it is reassuring to have a sub-ms impact on the p99.
+Publish latency overhead is statistically significant at 100 topics (proxy-no-filters p99 +27%, encryption p99 +90%, both p<0.001). But publish latency at 500 msg/s per topic is a small fraction of E2E, and the E2E picture is what operators care about: average and p99 differences are within measurement noise.
+
+**The headline: negligible passthrough overhead — throughput unaffected across all three scenarios.**
+
+What did I take away from this? We replaced a hunch with data. The remarkable part: the proxy is doing this at Layer 7. Most proxies operate on Kafka at Layer 4 — they shuffle bytes without ever understanding what those bytes mean. Kroxylicious works at Layer 7, parsing every Kafka message, yet still adds only a few milliseconds at the E2E average. That's the design working.
+
+The overhead staying flat across 10 and 100 topics makes sense for the same reason: the proxy doesn't contend between topics. Think of the proxy as independent circuits on a distribution board — switching the breaker for lights doesn't cut power to the fridge. A Kafka broker is more like the mains supply itself — every circuit draws from the same source, so heavy load anywhere reduces what's available everywhere. Topics don't contend for shared resources: throughput scales linearly across them, and this data validates it.
 
 ---
 
@@ -92,24 +97,24 @@ Ok, so let's make the proxy smarter — make it do something people actually car
 
 So we know encryption is doing a lot of work, but to find out the real impact we need to compare it to a plain Kafka cluster (and yes, people do run Kroxylicious without filters — TLS termination, stable client endpoints, virtual clusters — but that's a different post). The table below tells us that above a certain inflection point the numbers get really, really noisy — especially in the p99 range.
 
-**1 topic, 1 KB messages — baseline vs encryption:**
+**1 topic, 1 KB messages — baseline vs encryption (selected rates from rate sweep):**
 
 | Rate | Metric | Baseline | Encryption | Delta |
 |------|--------|----------|------------|-------|
-|34,000 msg/s | Publish avg |8.00 ms |8.19 ms | +0.19 ms (+2%) |
-|34,000 msg/s | Publish p99 |48.65 ms |64.01 ms | +15.35 ms (+32%) |
-|36,000 msg/s | Publish avg |9.38 ms |10.46 ms | +1.08 ms (+12%) |
-|36,000 msg/s | Publish p99 |63.92 ms |88.98 ms | +25.06 ms (+39%) |
-|37,200 msg/s | Publish avg |9.12 ms |12.19 ms | +3.07 ms (+34%) |
-|37,200 msg/s | Publish p99 |74.88 ms |113.15 ms | +38.27 ms (+51%) |
+|14,300 msg/s | Publish avg |5.4 ms |7.6 ms | +2.2 ms (+41%) |
+|14,300 msg/s | Publish p99 |16.3 ms |19.2 ms | +2.9 ms (+18%) |
+|17,100 msg/s | Publish avg |6.3 ms |8.9 ms | +2.6 ms (+41%) |
+|17,100 msg/s | Publish p99 |12.5 ms |21.9 ms | +9.4 ms (+75%) |
+|18,500 msg/s | Publish avg |10.5 ms |13.7 ms | +3.2 ms (+30%) |
+|18,500 msg/s | Publish p99 |22.0 ms |106.0 ms | +84.0 ms (+382%) |
 
-So we know that somewhere above 34k we're hitting a limit. Time to hunt out exactly where — enter the rate-sweep.
+The table shows encryption's p99 spiking sharply at 18,500 msg/s — but that ~18k figure is roughly where the forwarding proxy itself saturates (close to the bare Kafka baseline of ~19,400). Encryption gives out earlier. The rate-sweep finds exactly where.
 
 ### Throughput ceiling
 
-A rate-sweep is exactly what it sounds like: pick a starting rate, let OMB run long enough to get a stable measurement, then step up by a fixed percentage and repeat until the system can't keep up. We defined "can't keep up" as the sustained throughput dropping by more than 5% below the target rate — at that point, something has saturated.
+A rate-sweep is exactly what it sounds like: pick a starting rate, let OMB run long enough to get a stable measurement, then step up by a fixed increment and repeat until the system can't keep up. We defined "can't keep up" as the sustained throughput dropping by more than 5% below the target rate — at that point, something has saturated.
 
-We started at 34k (right where the latency table started getting interesting) and stepped up in 5% increments. The results:
+We stepped up from 8k to 22k msg/s in 700 msg/s increments, looking for where throughput drops more than 5% below target. The results:
 
 - **Baseline**: sustained up to ~19,400 msg/s (the ceiling at RF=3 on our test cluster)
 - **Encryption**: sustained up to **~14,600 msg/s**, then started intermittently saturating
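The rate-sweep procedure described in the post (start at 8k msg/s, step by 700 msg/s up to 22k, stop when sustained throughput falls more than 5% below target) can be sketched roughly as follows. This is an illustrative sketch only, not the benchmark harness's code: `rate_sweep`, `measure`, and `fake_system` are hypothetical names, and `fake_system` stands in for a real OMB run by simply capping throughput at a fixed ceiling.

```python
from typing import Callable, Optional


def rate_sweep(
    measure: Callable[[int], float],
    start: int,
    stop: int,
    step: int,
    tolerance: float = 0.05,
) -> Optional[int]:
    """Return the highest target rate that was sustained, or None if none was.

    `measure` is a hypothetical callback: given a target rate in msg/s, it runs
    the workload long enough to stabilise and returns the achieved rate.
    """
    last_sustained = None
    for target in range(start, stop + 1, step):
        achieved = measure(target)
        # "Can't keep up": achieved throughput is more than 5% below target.
        if achieved < target * (1 - tolerance):
            break
        last_sustained = target
    return last_sustained


def fake_system(ceiling: int) -> Callable[[int], float]:
    """Stand-in for a real benchmark run: throughput is capped at `ceiling`."""
    return lambda target: float(min(target, ceiling))


if __name__ == "__main__":
    # Sweep parameters taken from the post; the ceiling is a made-up input.
    print(rate_sweep(fake_system(14_600), start=8_000, stop=22_000, step=700))
```

Note that the value returned is the last target rate that passed the 5% check, which can sit slightly above the true ceiling; the sustained throughput measured at that step is the better estimate of the ceiling itself.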
@@ -145,15 +150,15 @@ Numbers without guidance aren't very useful, so here's how to translate these re
 
 1. **Throughput budget**: encryption imposes a CPU-driven throughput ceiling. As a planning formula:
 
-> **`proxy CPU (millicores) = 10 × total proxy throughput (MB/s)`**
+> **`proxy CPU (millicores) = 35 × total proxy throughput (MB/s)`**
 >
 > where *total* = produce MB/s + (each consumer group's consume MB/s independently)
 
-For a single produce:consume pair this simplifies to `20 × produce MB/s`. Fan-out multiplies: 100 MB/s produce to 3 consumer groups = 100 + 300 = 400 MB/s total → 4,000m. Add ×1.3 headroom for GC pauses and burst. Measured on AMD EPYC-Rome 2 GHz with AES-NI — calibrate on your hardware using the rate sweep.
+This is a conservative estimate derived from single-partition workloads; the companion post has the full derivation and a lower bound for multi-topic workloads. For a single produce:consume pair this simplifies to `70 × produce MB/s`. Fan-out multiplies: 100 MB/s produce to 3 consumer groups = 100 + 300 = 400 MB/s total → 14,000m. Add ×1.3 headroom for GC pauses and burst. Measured on AMD EPYC-Rome 2 GHz with AES-NI — calibrate on your hardware using the rate sweep.
 
-Worked example: 100k msg/s at 1 KB, 1 consumer group = 100 MB/s produce + 100 MB/s consume = 200 MB/s × 10 = 2,000m, plus headroom → ~2,600m (~2.6 cores).
+Worked example: 100k msg/s at 1 KB, 1 consumer group = 100 MB/s produce + 100 MB/s consume = 200 MB/s × 35 = 7,000m, plus headroom → ~9,100m (~9 cores).
 
-2. **Latency budget**: well below saturation, expect 0.2–3 ms additional average publish latency and 15–40 ms additional p99. The overhead scales with how hard you're pushing — give yourself headroom and you'll barely notice it.
+2. **Latency budget**: well below saturation, expect 2–3 ms additional average publish latency and up to ~15 ms additional p99. The overhead scales with how hard you're pushing — give yourself headroom and you'll barely notice it.
 
 3. **Scaling**: set `requests` equal to `limits` in your pod spec — this makes the CPU budget deterministic, which makes the throughput ceiling predictable. To increase throughput, raise the CPU limit. For redundancy, add proxy pods.
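The updated throughput-budget formula and its worked examples can be checked with a few lines of arithmetic. The sketch below is an illustration only: `proxy_cpu_millicores` is a hypothetical helper name, and the 35 mc per MB/s coefficient and ×1.3 headroom are the post's planning values, which it says to recalibrate on your own hardware.

```python
from typing import Sequence, Tuple


def proxy_cpu_millicores(
    produce_mb_s: float,
    consumer_groups_mb_s: Sequence[float],
    coefficient: float = 35.0,  # mc per MB/s, the post's conservative figure
    headroom: float = 1.3,      # the post's allowance for GC pauses and burst
) -> Tuple[float, float]:
    """Return (base, with_headroom) CPU in millicores.

    Total proxy throughput is the produce rate plus the consume rate of each
    consumer group counted independently, as in the planning formula.
    """
    total_mb_s = produce_mb_s + sum(consumer_groups_mb_s)
    base = coefficient * total_mb_s
    return base, base * headroom


if __name__ == "__main__":
    # Worked example: 100 MB/s produce, one consumer group reading all of it.
    print(proxy_cpu_millicores(100, [100]))
    # Fan-out example: 100 MB/s produce, three consumer groups.
    print(proxy_cpu_millicores(100, [100, 100, 100]))
```

Passing each consumer group separately keeps the fan-out multiplication visible: adding a group adds its full consume rate to the total.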