feat(loadgen): implement load test harness for messaging workers #117

Merged
Commits (36)
- c470c55 docs: add load test messaging workers design (claude)
- 70502fd docs: add load test messaging workers implementation plan (claude)
- 182ef25 feat(loadgen): scaffold main.go with subcommand dispatch (claude)
- b6e9ac1 feat(loadgen): add Preset type and four built-in presets (claude)
- 354afa0 test(loadgen): guard preset lookup ok in uniform/realistic shape tests (claude)
- 2ae8310 feat(loadgen): deterministic fixture generation from (preset, seed) (claude)
- 7cd9a80 test(loadgen): drop unused default branch in realistic room-type switch (claude)
- 3df2cd6 fix(loadgen): address gocritic/errcheck findings in preset.go (claude)
- 4641d98 refactor(loadgen): pass Preset by pointer; revert lint config bump (claude)
- b53be6b test(loadgen): cover pickMembers padding and sampleWithoutReplacement… (claude)
- 9a9b6bc feat(loadgen): Seed and Teardown mongo collections from fixtures (claude)
- 3787437 feat(loadgen): Prometheus registry with loadgen collectors (claude)
- 68e48b3 feat(loadgen): collector correlates publishes with replies and broadc… (claude)
- 3d483b8 fix(loadgen): close race in Collector samples; add coverage tests (claude)
- c10f31b feat(loadgen): percentiles, summary printer, CSV export, exit code (claude)
- 9ef41ef test(loadgen): drop redundant nolint; _test.go is already excluded fr… (claude)
- a5d86e5 feat(loadgen): open-loop generator with injected publisher (claude)
- 7e79a79 fix(loadgen): clear Collector orphans on publish failure; tighten tests (claude)
- 2bad977 feat(loadgen): JetStream consumer-lag sampler (claude)
- 9c8d962 fix(loadgen): warn (not debug) on consumer poll errors; document Snap… (claude)
- 021a409 feat(loadgen): wire seed/run/teardown subcommands in main.go (claude)
- eac94f2 fix(loadgen): skip byReqID in canonical mode to avoid false missing-r… (claude)
- b4ea921 feat(loadgen): docker-compose harness, Dockerfile, grafana dashboard (claude)
- feb4c19 fix(loadgen): drop NATS scrape job (port 8222 serves JSON, not Promet… (claude)
- d3b1e54 feat(loadgen): scoped Makefile for harness (claude)
- 6084ba7 test(loadgen): integration test for end-to-end wiring (claude)
- dd19404 docs(loadgen): add operator README (claude)
- 69c0eab test(loadgen): add unit tests for main helpers and sampler Snapshot (claude)
- 57d9f93 fix(loadgen): address final review — indexes, canonical rate, DM broa… (claude)
- 1905810 refactor(loadgen): simplify pass — pre-compute content, unify handler… (claude)
- eb8eea8 fix(loadgen): split sent counter into warmup/measured phases for clea… (claude)
- fdde0d0 fix(loadgen): index users.account so broadcast-worker enrichment isn'… (claude)
- 54acee8 perf(loadgen): dispatch publishes to worker pool; add opt-in pprof (claude)
- 45ff2ad Merge branch 'main' into claude/load-test-messaging-workers-tDKZn (hmchangw)
- 8a9e64d fix: group to channel (hmchangw)
- 6ef91ee fix linting (hmchangw)
docs/superpowers/plans/2026-04-21-load-test-messaging-workers.md — 2,780 additions, 0 deletions (large diff not rendered)
docs/superpowers/specs/2026-04-21-load-test-messaging-workers-design.md — 620 additions, 0 deletions (large diff not rendered)
docs/superpowers/specs/2026-04-24-loadgen-worker-pool-design.md — 203 additions, 0 deletions
# Loadgen Worker-Pool Dispatch + pprof — Design

## Purpose

The loadgen's actual publish rate falls materially below the target rate at
moderate throughput. At `--rate=1000` the observed rate is ~775 msg/s
(~77% delivery). Root cause: the publisher runs serially on the
`time.Ticker`'s goroutine, and `time.Ticker` drops ticks that fire while a
publish is still in progress. Any per-publish stall (NATS write-lock
contention, GC pause, scheduler hiccup) above the 1 ms/tick budget silently
loses a tick.

This spec fixes that by dispatching publishes to a small worker pool and
adding opt-in pprof so future bottlenecks are diagnosable.
## Scope

### In scope

- `Generator.Run` dispatches each tick's publish to a bounded pool of
  goroutines. The ticker itself stays punctual.
- New env var `MAX_IN_FLIGHT` (default `200`) caps concurrent publishes.
  Saturation (pool full when a tick fires) is an explicit signal, not a
  silent drop: the ticker records
  `loadgen_publish_errors_total{reason="saturated"}` and moves on.
- `MAX_IN_FLIGHT=0` falls back to the current serial behavior — useful as
  a bisection tool and a conservative option for anyone who wants
  reproducible comparisons.
- On graceful shutdown / `ctx.Done()`, `Run` returns only after all
  in-flight publishes drain (bounded by a small timeout).
- New env var `PPROF_ADDR` (default `""`, meaning disabled). When set
  (e.g. `:6060`), loadgen exposes `net/http/pprof` handlers on a
  separate HTTP server. Never on by default — pprof isn't exposed in
  production-ish deployments unless the operator opts in.
- The docker-compose loadgen service documents both new env vars.

### Out of scope

- Changes to the Collector, ConsumerSampler, Report, Preset, Seed, or
  integration test — none are on the publish hot path.
- `golang.org/x/time/rate.Limiter` — the worker-pool fix addresses the
  real structural cause (ticker/publish coupling). If worker-pool
  saturation becomes the new bottleneck, re-evaluate then.
- `sync.Pool` allocation-reuse tuning — defer until pprof identifies GC
  as the next-order concern.
- A dedicated NATS connection for publishes vs. subscriptions — only
  justified if pprof identifies the NATS write lock as the bottleneck
  after the worker pool lands.
- A default-rate bump — reasoned about separately.
## Architecture

Before:

```text
ticker goroutine: [wait tick] → publishOne (JSON + NATS write + metrics) → [wait tick] → …
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                one slow call here silently loses a tick
```

After:

```text
ticker goroutine:  [wait tick] → reserve sem slot → spawn publish goroutine → [wait tick] → …

publish goroutine: [publishOne] → release sem slot
publish goroutine: [publishOne] → release sem slot
publish goroutine: [publishOne] → release sem slot   (up to MAX_IN_FLIGHT concurrently)
```

The ticker goroutine's per-tick work shrinks to a semaphore send plus a
goroutine spawn — tens of nanoseconds. It cannot overshoot the ticker
interval at any realistic rate.
## Components

### `Generator.Run` (modified)

- Read `g.cfg.MaxInFlight` from `GeneratorConfig`.
- If `MaxInFlight <= 0`: run serially as today (preserves legacy behavior
  and gives a bisection switch).
- Else: create `sem := make(chan struct{}, MaxInFlight)` and
  `var wg sync.WaitGroup`. On each tick, do a non-blocking `select`:
  - Slot available: take it, `wg.Add(1)`, then
    `go func() { defer wg.Done(); defer func() { <-sem }(); g.publishOne(ctx) }()`.
  - No slot: increment
    `loadgen_publish_errors_total{reason="saturated"}` and continue —
    the tick is dropped, but at least it is observable.
- On `ctx.Done()`: stop the ticker, then `wg.Wait()` with a bounded grace
  period (5 s). If the grace period expires, log and return — in-flight
  goroutines complete on their own after the NATS drain in main.
### `GeneratorConfig` (modified)

Add one field:

```go
type GeneratorConfig struct {
    … existing fields …
    MaxInFlight int
}
```

### `main.go` (modified)

Add to `config`:

```go
type config struct {
    … existing fields …
    MaxInFlight int    `env:"MAX_IN_FLIGHT" envDefault:"200"`
    PProfAddr   string `env:"PPROF_ADDR" envDefault:""`
}
```

Pass `cfg.MaxInFlight` into `GeneratorConfig` when constructing the generator.

On startup, if `PProfAddr != ""`: register `net/http/pprof` handlers on a
new `http.ServeMux` and start a separate `http.Server` listening on that
addr, logging the resulting URL. The server doesn't share the metrics mux —
pprof is genuinely separate, opt-in infrastructure, and keeping it off the
metrics port avoids accidental exposure when the metrics mux is scraped by
Prometheus.

On `ctx.Done()`: gracefully shut down the pprof server with a 2 s timeout.
### Metrics

No new metrics. The existing `loadgen_publish_errors_total` counter gains
`reason="saturated"` as its single new label value for pool saturation.
This keeps the Grafana dashboard's "Publish errors/sec by reason" panel
working out of the box.

## Error handling

- `sem <- struct{}{}` never blocks because it sits inside a non-blocking
  `select` — if the pool is full, we record saturation and move on. There
  is no unbounded goroutine growth under sustained overload.
- Inside each publish goroutine, `publishOne` already handles its own
  errors (counters for marshal/publish failures, `RecordPublishFailed`
  on the Collector).
- Graceful shutdown: `Run` returns only after in-flight publishes drain
  or the bounded grace period elapses. The caller (`runRun` in `main.go`)
  already calls `collector.DiscardBefore` and `collector.Finalize` after
  `Run` returns, so late-arriving publishes correctly integrate with the
  summary.
## Testing

### New unit test

`TestGenerator_MaxInFlightZeroRunsSerially` — with `MaxInFlight=0`, the
generator's behavior is unchanged from today. Reuses the existing
`TestGenerator_SendsExpectedCount` assertion style.

### Adjusted unit test

`TestGenerator_SendsExpectedCount` — still valid with `MaxInFlight > 0`,
but the count may be closer to the theoretical target since the ticker
is no longer blocked.

### New unit test

`TestGenerator_PoolSaturationCountedAsError` — artificially slow the
publisher via an injected blocking `Publisher`. Run at a rate that
exceeds the pool's capacity. Assert that the `saturated` counter
increments.
### Integration test

No change. The existing `tools/loadgen/integration_test.go` exercises
`Generator.Run` with a fake gatekeeper + broadcast-worker and makes no
assumptions about ticker coupling.

### Coverage target

`generator.go` stays at ≥ 90% coverage for `Run`, `publishOne`, and
`content`, per the existing plan.

## Dependencies

No new third-party dependencies. All new code uses the stdlib: `net/http`,
`net/http/pprof`, `sync`.

## Rollout

- Both env vars have safe defaults (`MAX_IN_FLIGHT=200`, `PPROF_ADDR=""`).
- Existing deployments pick up the worker pool automatically, with
  improved actual-rate fidelity at moderate throughput. Operators
  concerned about the behavior change can set `MAX_IN_FLIGHT=0` to
  get the legacy serial path.
- pprof stays off unless explicitly enabled via `PPROF_ADDR`.
- Internal-only to the loadgen service; no cross-service contract
  change.

## Future work (deferred)

- Dedicated publish-side `*nats.Conn` — only if profiling identifies the
  NATS connection write lock as the remaining bottleneck.
- `sync.Pool` for `SendMessageRequest` / `MessageEvent` / byte buffers
  to reduce per-publish GC pressure — only if GC shows up in a
  profile.
- Background UUID generation — only if `crypto/rand` shows up
  prominently.
# loadgen

Capacity-baseline load generator for the single-site messaging pipeline
(`message-gatekeeper` → `MESSAGES_CANONICAL` → `message-worker` +
`broadcast-worker`). A single Go binary with three subcommands.

## Quick start

```
make -C tools/loadgen/deploy up
make -C tools/loadgen/deploy seed PRESET=medium
make -C tools/loadgen/deploy run PRESET=medium RATE=500 DURATION=60s
```

For live dashboards:

```
make -C tools/loadgen/deploy run-dashboards PRESET=medium
# Grafana at http://localhost:3000 (anonymous admin)
```

Tear down:

```
make -C tools/loadgen/deploy down
```

## Presets

| preset      | users  | rooms | notes                                                   |
|-------------|--------|-------|---------------------------------------------------------|
| `small`     | 10     | 5     | uniform, 200-byte content                               |
| `medium`    | 1 000  | 100   | uniform, 200-byte content                               |
| `large`     | 10 000 | 1 000 | uniform, 200-byte content                               |
| `realistic` | 1 000  | 100   | Zipf senders, mixed room sizes, 50–2000 bytes, mentions |

## Subcommands

- `loadgen seed --preset=<name> [--seed=42]` — idempotently populate
  MongoDB with deterministic fixtures.
- `loadgen run --preset=<name> [flags]` — open-loop publish at `--rate`
  msgs/sec for `--duration`, printing a summary at the end. Flags:
  `--seed`, `--warmup`, `--inject=frontdoor|canonical`, `--csv=<path>`.
- `loadgen teardown` — drop the three seeded collections.

## Reading the summary

- `final_pending == 0` on both durables and zero errors → the pipeline is
  sustaining your target rate.
- `final_pending` climbing, or error counts > 0 → over capacity, or a
  regression upstream of the worker.

## Non-goals

- Not a CI regression gate. Invoked manually.
- Not an auth benchmark. Uses shared `backend.creds`.
- Not a cross-site benchmark. Single-site only.
- Not an absolute-number tool. Numbers vary by host — compare runs on one
  machine across changes; don't compare across machines.
🛠️ Refactor suggestion | 🟠 Major

Add unit tests for the new exported helpers.

As per coding guidelines: "Every exported function in `pkg/` must have
corresponding test cases." Please add tests for `UserResponseWildcard`,
`RoomEventWildcard`, and `UserRoomEventWildcard` in
`pkg/subject/subject_test.go`, asserting the exact returned strings.