loadgen

Capacity-baseline load generator for the single-site messaging pipeline (message-gatekeeper → MESSAGES_CANONICAL → message-worker + broadcast-worker). Single Go binary with three subcommands.

Quick start

make -C tools/loadgen/deploy up
make -C tools/loadgen/deploy seed PRESET=medium
make -C tools/loadgen/deploy run  PRESET=medium RATE=500 DURATION=60s

make up brings up the shared docker-local stack (NATS, MongoDB, Cassandra, Valkey, Elasticsearch, every microservice) and then the load-test-only overlay (loadgen, Prometheus, Grafana). The overlay joins the chat-local network so it can reach the same services any developer sees with make up at the repo root.

For live dashboards:

make -C tools/loadgen/deploy run-dashboards PRESET=medium
# Grafana at http://localhost:3000 (anonymous admin)

Tear down:

make -C tools/loadgen/deploy teardown PRESET=medium  # drop Mongo fixtures
make -C tools/loadgen/deploy down                     # stop containers

Encryption

broadcast-worker runs with ENCRYPTION_ENABLED=true by default in this stack. loadgen seed provisions one AES-256-GCM key per fixture room into the room's document in the MongoDB rooms collection (the same place broadcast-worker reads from), derived from the RNG seed so runs stay reproducible. To run an apples-to-apples plaintext comparison:

ENCRYPTION_ENABLED=false make -C tools/loadgen/deploy up

Loadgen's end-to-end broadcast correlation reads RoomEvent.LastMsgID, which sits in the cleartext envelope regardless of encryption mode, so the run binary itself never touches ciphertext.

Presets

preset	users	rooms	notes
`small`	10	5	uniform, 200-byte content
`medium`	1 000	100	uniform, 200-byte content
`large`	10 000	1 000	uniform, 200-byte content
`realistic`	1 000	100	Zipf senders, mixed room sizes, 50–2000 bytes, mentions

Subcommands

loadgen seed --preset=<name> [--seed=42] — idempotently populate MongoDB with fixtures, including a per-room key in each room document.
loadgen run --preset=<name> [flags] — open-loop publish at --rate msgs/sec for --duration, print a summary at the end. Flags: --seed, --warmup, --inject=frontdoor|canonical, --csv=<path>.
loadgen teardown --preset=<name> [--seed=42] — drop the seeded Mongo collections (the per-room keys go with the room documents).

Reading the summary

final_pending == 0 on both durables, zero errors → the pipeline is sustaining your target rate.
final_pending climbing, or error counts > 0 → over capacity or a regression upstream of the worker.

Non-goals

Not a CI regression gate. Invoked manually.
Not an auth benchmark. Uses shared backend.creds.
Not a cross-site benchmark. Single-site only.
Not an absolute-number tool. Numbers vary by host — compare within one machine across changes, don't compare across machines.

Members workload (add-member benchmark)

Benchmarks the add-member pipeline: room-service.handleAddMembers → chat.room.canonical.{siteID}.member.add (ROOMS stream) → room-worker → chat.room.{roomID}.event.member broadcast.

Quick start

make -C tools/loadgen/deploy up
make -C tools/loadgen/deploy seed-members PRESET=members-medium
make -C tools/loadgen/deploy run-sustained PRESET=members-medium RATE=100 DURATION=60s

For capacity-mode growth curves:

make -C tools/loadgen/deploy seed-members PRESET=members-capacity
make -C tools/loadgen/deploy run-capacity  PRESET=members-capacity TARGET_SIZE=500

Between sustained runs, reset state so candidate pools refill:

make -C tools/loadgen/deploy reset-members PRESET=members-medium

Presets

preset	rooms	baseline	candidate pool	use case
`members-small`	5	10	50	smoke / dev
`members-medium`	100	100	900	sustained-throughput default
`members-heavy`	700	10	990	high-rate sustained (≈1000 req/s)
`members-capacity`	5	1	990	capacity-growth, fills up to ~MAX_ROOM_SIZE

A candidate is single-use — once added it's a room member and can't be re-added, and baseline + candidate pool is capped at MAX_ROOM_SIZE (1000). So a sustained run can make at most rooms × ⌊candidate pool ÷ users-per-add⌋ add-member publishes total. members-medium (100 × ⌊900÷10⌋ = 9000 ops) sustains the default RATE=100 DURATION=60s (6000 ops) with margin; members-small is a smoke preset and cannot sustain that load.

For higher rates, add rooms rather than pool (pool is capped per room). To sustain 1000 req/s for 60s (60,000 ops) at the default users-per-add=10, use members-heavy (700 × ⌊990÷10⌋ = 69,300 ops, ≈69s of headroom):

make -C tools/loadgen/deploy seed-members  PRESET=members-heavy
make -C tools/loadgen/deploy run-sustained PRESET=members-heavy RATE=1000 DURATION=60s

If instead each request need only add one member, members-medium at USERS_PER_ADD=1 already supplies 90,000 ops — no heavy preset required.

Subcommands

loadgen seed --workload=members --preset=<name> — populate Mongo for the members workload (including per-room keys in the room documents).
loadgen teardown --workload=members --preset=<name> — drop the seeded data.
loadgen members-sustained --preset=<name> [flags] — open-loop publish at --rate req/sec for --duration. Flags: --users-per-add (default 10), --inject=frontdoor|canonical (default frontdoor), --shape=users (v1; orgs/channels/mixed reserved for v2), --warmup, --csv.
loadgen members-capacity --preset=<name> --target-size=N [flags] — per-room sequential growth until rooms reach --target-size. Flags: --users-per-add, --inject, --shape, --max-rate (per-room rate cap, default 0 = sequential pacing only), --e2-timeout, --csv.

v1 scope

Only --shape=users is implemented. The flag accepts orgs, channels, mixed for forward compat but rejects them at parse time. See docs/superpowers/specs/2026-05-19-load-test-room-members-design.md for the rationale and the v2 plan.

Reading the summary

Sustained mode: final_pending == 0 on room-worker + zero errors → pipeline is sustaining the target rate. Climbing final_pending or non-zero errors → over capacity. If rate × duration would exceed the preset's pool budget (see the preset table above), the command now refuses to start and prints the achievable max --rate/--duration for the preset — lower one of them or pick a bigger preset. (The old behaviour ran for ~50s and then logged aborted early — pools exhausted.)
Capacity mode: the size-bucket table shows latency at four size ranges; the final sizes block confirms each room hit --target-size. A row with count > 0 whose e2_p99 is much larger than smaller-size buckets indicates a per-room-size degradation. Like sustained mode, capacity mode refuses to start if --target-size is unreachable from the preset's per-room pool (baseline + ⌊pool ÷ users-per-add⌋ × users-per-add); it prints the reachable ceiling — lower --target-size or pick a larger preset.

Room-read workload (mark-as-read benchmark)

Finds the maximum sustainable RPS for marking a room as read (room-service.handleMessageRead, the message.read request/reply RPC). The workload reuses the messages presets but seeds read-state so the room read-floor recompute path stays exercised: every room's lastMsgAt is stamped ahead of the run window and members' lastSeenAt are spread behind it, so each read is "a user opening a room with unread content" — the floor scan fires on every request and the floor write fires at a rate set by room size and the read distribution.

Quick start

make -C tools/loadgen/deploy up
make -C tools/loadgen/deploy seed-roomread PRESET=medium
make -C tools/loadgen/deploy run-max-rps WORKLOAD=room-read PRESET=medium

Override the ramp with STEPS (default 200,500,1000,2000,5000):

make -C tools/loadgen/deploy run-max-rps WORKLOAD=room-read PRESET=medium STEPS=500,1k,2k,5k

Tear down the fixtures:

make -C tools/loadgen/deploy teardown-roomread PRESET=medium

Notes

Synchronous request/reply: gated on p95/p99 latency and error rate only (no consumer-pending signal). Defaults: --slo-p95=100ms, --slo-p99=250ms, --slo-error-rate=0.001; override via the shared max-rps flags.
Single-site only: all seeded users are local, so no cross-site inbox event is published on the read path.
Presets are the messages presets (small/medium/large/realistic); room size distribution drives floor-write contention.

History workload (LoadHistory / GetThreadMessages benchmark)

Benchmarks the synchronous read path: history-service.LoadHistory (Cassandra bucket walk on messages_by_room) and history-service.GetThreadMessages (single-partition slice on thread_messages_by_thread).

Quick start

make -C tools/loadgen/deploy up
loadgen seed --workload=history --preset=history-medium
loadgen history-sustained --preset=history-medium --rate=200 --duration=60s

The history workload requires CASSANDRA_HOSTS (e.g. cassandra:9042) in addition to the standard Mongo/NATS env. MESSAGE_BUCKET_HOURS (default 72) must match what history-service is configured with so seed-time and read-time bucket math agree.

Presets

preset	rooms	msgs/room	span	thread rate	use case
`history-small`	5	100	1 day	0	smoke / dev
`history-medium`	100	5 000	7 days	5%	sustained-throughput
`history-large`	1 000	50 000	30 days	10%	partition fan-out

Top-level messages are placed uniformly across the span with ±50% jitter on the gap so they don't align to bucket boundaries. Thread replies land 1–10 min after their parent and share a bucket with it. Rooms are picked via rand.Zipf(s=1.1, v=1.0) over the room list — a few hot rooms absorb most reads.

Subcommands

loadgen seed --workload=history --preset=<name> — populate Mongo (users/rooms/subscriptions/thread_rooms, plus per-room keys in the room documents — harmless for read workload), and Cassandra (messages_by_room, messages_by_id, thread_messages_by_room).
loadgen teardown --workload=history --preset=<name> — drop the seeded data.
loadgen history-sustained --preset=<name> [flags] — open-loop request at --rate req/sec for --duration. Flags: --mix=history:80,thread:20 (endpoint weighting), --before-mode=open:70,scrollback:30 (cursor strategy), --scrollback-pages=5 (pages per chain before reset), --page-limit=20, --request-timeout=5s, --warmup, --csv.

Reading the summary

Per-endpoint p50/p95/p99 + payload sizes split LoadHistory vs GetThreadMessages so a slow thread path doesn't get hidden by faster history reads. The bucket-walk depth block reports how many LoadHistory replies stayed within a single Cassandra bucket vs spanned multiple — climbing multi-bucket counts under --before-mode=scrollback indicate the walker is paying coordinator round-trips per page.
Errors broken out by class (timeout, reply, bad); the no-thread-parents counter is informational (thread requests that landed on a room with no seeded parents and fell back to history).

max-rps — auto-find Max RPS under SLO

Automatically finds the maximum RPS each workload can sustain while all SLO signals hold. The subcommand ramps the target rate through an ordered list of steps, holds at each step for a measurement window, evaluates SLO signals, and reports the largest step at which every signal passed.

loadgen max-rps --workload=messages|history|read-receipt --preset=<name> [flags]

Quick start

# messages: ramp 500..10k rps, stop at first SLO breach
loadgen max-rps --workload=messages --preset=medium --steps=500,1k,2k,5k,10k

# history: per-endpoint SLO, custom p95
loadgen max-rps --workload=history --preset=history-medium --steps=200,500,1k,2k --slo-p95=80ms

# read-receipt: seed reader state first, then ramp
loadgen seed --workload=read-receipt --preset=history-medium --read-ratio=0.7
loadgen max-rps --workload=read-receipt --preset=history-medium --steps=200,500,1k,2k

Via the deploy Makefile:

make -C tools/loadgen/deploy run-max-rps PRESET=medium
make -C tools/loadgen/deploy run-max-rps WORKLOAD=history PRESET=history-medium STEPS=200,500,1k,2k

Flags

Flag	Default	Notes
`--workload`	`messages`	`messages`, `history`, or `read-receipt`
`--preset`	(required)	an existing preset for the chosen workload (`read-receipt` reuses the history presets)
`--steps`	messages `500,1k,2k,5k,10k` / history+read-receipt `200,500,1k,2k,5k`	explicit ordered RPS list; `k` suffix = ×1000
`--request-timeout`	`5s`	history / read-receipt: per-request reply timeout
`--warmup`	`10s`	per-step warmup (samples discarded)
`--hold`	`30s`	per-step measurement window
`--cooldown`	`5s`	per-step settle gap before next step
`--slo-p95`	`100ms`	applied to every gated latency series
`--slo-p99`	`250ms`	applied to every gated latency series
`--slo-error-rate`	`0.001`	`failed / attempted` (0.1%)
`--slo-pending-growth`	`1000`	messages only: per-durable end−start `NumPending` delta
`--rate-tolerance`	`0.05`	achieved-vs-target shortfall band for the INCONCLUSIVE guard
`--stop-on-trip`	`true`	stop the ramp at the first TRIP (does not stop on INCONCLUSIVE)
`--seed`	`42`	RNG seed (parity with existing subcommands)
`--csv`	`""`	optional CSV output path

Reading the output

At the end of the run the tool prints a per-step table and a final verdict line:

ANSWER: max RPS = 2000 (workload=messages, preset=medium)
        Next limit: E2 p95=143ms > 100ms

This is the largest step at which all SLO signals passed; the Next limit: line names why the first failing step tripped. If no step passed, the output is ANSWER: no step passed (workload=…, preset=…).

INCONCLUSIVE rows appear when the achieved throughput fell more than --rate-tolerance below the target while the SLO signals still looked healthy — i.e. the load generator itself, not the service under test, was the limiting factor, so the step's result can't be trusted. An INCONCLUSIVE step does not count as a pass and does not stop the ramp, even with --stop-on-trip; only a hard TRIP stops the ramp.

The reasons: line names which load-box limit dominated so you know which knob to turn — the two are distinct columns (saturation, emit_underrun) in the CSV:

emit underrun — the generator could not even release the load on schedule (its dispatch loop fell behind the target cadence). The load box is CPU/scheduler starved: give it more CPU, lower the per-box rate, or shard the load across more generator processes.
saturation — the load was released on schedule but the in-flight pool was full when an event came due. The pool is too small for the rate×latency product: raise MAX_IN_FLIGHT (and/or reduce backend latency).

Rate pacing. The generator paces an open-loop arrival rate with a batched emitter: it ticks on a coarse, reliably-schedulable interval and releases rate × interval events per tick. This replaces the old one-event-per-tick ticker, whose sub-millisecond intervals the Go runtime can't honor (it silently coalesces ticks), which capped achievable RPS at a few thousand regardless of --steps. Setting MAX_IN_FLIGHT=0 selects the legacy serial-on-ticker path for bisection only — it will not ramp.

Read-receipt workload (`--workload=read-receipt`)

Drives the room-service read-receipt RPC (chat.user.{account}.request.room.{roomID}.{siteID}.message.read-receipt) — a synchronous request/reply read ("who has read message X") — to find the maximum sustainable RPS under the latency/error SLOs. Like history, it is a read with no JetStream consumer, so --slo-pending-growth is ignored and the per-request timeout is set with --request-timeout.

Read receipts reuse the history presets and seed: the requester for each target is the message's sender (the RPC requires msgSender == requesterAccount), and only top-level messages are used as targets. Reader state must be seeded so the ListReadReceipts Mongo query exercises its real $match/$lookup/$unwind path instead of short-circuiting on an empty lastSeenAt match.

Seed (stamps lastSeenAt on a --read-ratio fraction — default 0.7 — of each room's subscribers; requires CASSANDRA_HOSTS like the history seed):

loadgen seed --workload=read-receipt --preset=history-medium --read-ratio=0.7

Then ramp:

loadgen max-rps --workload=read-receipt --preset=history-medium --steps=200,500,1k,2k,5k

The gated latency series is named read-receipt; the verdict, INCONCLUSIVE guard, and CSV output behave exactly as for the other workloads.

To tear down, use the history teardown — read-receipt seeds the identical history fixtures, so loadgen teardown --workload=history --preset=<name> drops everything (dropping subscriptions removes the stamped lastSeenAt too):

loadgen teardown --workload=history --preset=history-medium

Bottleneck attribution

When a max-rps --workload=messages ramp trips, loadgen appends a BOTTLENECK: block naming the culprit component, the saturated resource, and a confidence:

ANSWER: max RPS = 2000 (workload=messages, preset=medium)
        Next limit: E2 p95=143ms > 100ms
BOTTLENECK: message-worker (Cassandra-bound)
        message-worker consumer backlog grew (first stage to back up)
        cassandra CPU plateaued between 1000 and 2000 rps while load rose
        confidence: high

It fuses loadgen's per-stage signals (E1/E2 latency, per-durable backlog) with cAdvisor container CPU trends from Prometheus. make run-max-rps starts cAdvisor + Prometheus for you (no need to run make run-dashboards first). Tunables (env, BOTTLENECK_ prefix):

Var	Default	Notes
`BOTTLENECK_ENABLED`	`true`	Set `false` to disable; run behaves as before.
`BOTTLENECK_PROM_URL`	(set in compose)	Prometheus that scrapes cAdvisor. Empty = disabled.
`BOTTLENECK_KNEE_TOLERANCE`	`0.10`	Max relative CPU rise still counted as a plateau.
`BOTTLENECK_QUERY_STEP`	`5s`	PromQL step; match the scrape interval.
`BOTTLENECK_CONTAINER_MAP`	(empty)	`shortid:name,…` fallback when cAdvisor omits the compose-service label.

The verdict is best-effort: if Prometheus is unreachable or the data is too thin (e.g. the breach was on the first step), the line reads BOTTLENECK: undetermined (<reason>) and the run still reports normally.

Daily-IM scenario (find N) — Operator Guide

Simulates N users using the chat system as their primary IM throughout a workday, ramps N geometrically through a configured step list, holds steady at each step while watching SLO signals, and reports the largest N at which everything held. The output answers:

How many concurrent daily-IM users can a single-site deployment sustain before a real signal breaks, and what breaks first?

Single-site only. Not a CI gate — invoked manually for capacity work.

Quick start
Prerequisites
Presets
CLI flags
Environment variables
SLO signals and verdicts
Reading the output
Troubleshooting
Known limitations
Design references

Quick start

# 1. Bring up the docker-local stack (NATS, Mongo, Valkey, Cassandra, all services).
make -C tools/loadgen/deploy up

# 2. Seed Mongo with users/rooms/subscriptions (room keys live in the room docs) for your preset.
#    Must be re-run when you change preset (the fixture IDs differ per preset).
make -C tools/loadgen/deploy seed PRESET=daily-heavy

# 3. Ramp.
make -C tools/loadgen/deploy run-daily PRESET=daily-heavy

Prerequisites

Before loadgen daily will produce a meaningful verdict, you need:

Requirement	Why	How to get it
Docker-local stack running	Daily talks to message-gatekeeper, room-service, broadcast-worker, etc.	`make -C tools/loadgen/deploy up`
Mongo `users`/`rooms`/`subscriptions` seeded for the preset	Gatekeeper rejects every send with "user not subscribed" otherwise	`loadgen seed --workload=messages --preset=<your daily preset>`
Per-room AES-256-GCM keys (in the room documents)	broadcast-worker decrypts with these when `ENCRYPTION_ENABLED=true` (default)	Written by the same `loadgen seed` step
JetStream streams (`MESSAGES`, `MESSAGES_CANONICAL`, `ROOMS`, `INBOX`)	The whole pipeline	Auto-created by services at startup when `BOOTSTRAP_STREAMS=true` (docker-local default)
Cassandra tables	message-worker writes here; history-service reads here	Created by `docker-local/cassandra/init/*.cql` at first stack boot
`NATS_CREDS_FILE` pointing at credentials with `pub/sub` on `chat.>`	Loadgen otherwise dials anonymously and gets permission violations	docker-local writes `backend.creds` with full perms via `docker-local/setup.sh`

A preflight runs at runDaily startup: it opens a short Mongo connection, counts subscriptions for cfg.SiteID, and bails with an actionable error if zero. So forgetting step 2 fails fast in seconds rather than burning the whole ramp.

Presets

All three daily presets seed 10000 users. They differ in the rooms-per-user distribution (the "what a typical IM user's room list looks like" shape).

preset	DMs	small (5–20)	medium (50–200)	large (500–2000)	rooms/user	use case
daily-light	15	10	5	2	~32	light daily-IM user
daily-heavy	25	20	8	3	~56	heavy daily-IM user (default)
daily-power	40	30	10	3	~83	power user (eng / manager)

Room sizes within each band are drawn via Zipf-like sampling so the long tail is realistic. Subscriptions are generated via stub-pairing for the DM band and a slot-bag picker for the others — both O(N × perUser), so fixture build at N=10000 finishes in ~1s.

CLI flags

loadgen daily -h prints the same:

Flag	Default	Notes
`--preset`	`daily-heavy`	`daily-light` \| `daily-heavy` \| `daily-power`
`--steps`	`1000,2000,5000,10000,20000,50000,100000`	Comma-separated N values per ramp step. `k` suffix = ×1000. Max cannot exceed the preset's `Users` (10000); excess is capped and the step INCONCLUSIVEs with `only X/Y users activated`.
`--warmup`	`60s`	Per-step warm-up before SLO measurement begins. Latency samples from this window are discarded by `Collector.Reset` at the start of hold.
`--hold`	`180s`	Steady-state window where SLO signals are evaluated.
`--cooldown`	`30s`	Drain time between steps to let consumers catch up.
`--stop-on-trip`	`true`	Stop the ramp on the first TRIP. Set `false` to keep ramping past the first failure (useful for understanding the slope of degradation).
`--max-direct-users`	`20000`	Cap on the direct-pool size (one `nats.Conn` per user). Above this, additional users are placed in the multiplex pool.
`--multiplex-pool-size`	`200`	Number of shared `nats.Conn` instances in the multiplex pool. Set `0` to disable multiplex (any user past `--max-direct-users` is then silently skipped).
`--max-conns-per-process`	`25000`	Safety ceiling on the total nats.Conn count to this process. Combined `direct + multiplex` must not exceed this.
`--csv`	`""`	Optional CSV output path (one row per step).

Example:

loadgen daily \
  --preset=daily-heavy \
  --steps=1k,2k,5k,10k \
  --warmup=15s --hold=45s --cooldown=10s \
  --max-direct-users=2000 --multiplex-pool-size=200 \
  --csv=results.csv

Environment variables

Read by the base loadgen config struct (env vars, not flags):

Var	Default	Notes
`NATS_URL`	(required)	`nats://...`
`NATS_CREDS_FILE`	`""`	Path to NATS creds (mandatory against operator-mode NATS — otherwise loadgen dials anonymous and gets "permissions violation").
`NATS_MONITORING_URL`	`http://nats:8222/jsz`	Where the JetStream-pending poller queries. Override to `http://127.0.0.1:8222/jsz` if you're running loadgen on the host instead of inside the compose network.
`MONGO_URI`, `MONGO_DB`, `MONGO_USERNAME`, `MONGO_PASSWORD`	(uri required; db default `chat`)	Used by the seed step (including per-room keys, now stored in the room documents) and the daily preflight.
`SITE_ID`	`site-local`	Must match the gatekeeper's configured site or every send is rejected with `siteID mismatch`. Also used as the partition key for seeded fixtures.

SLO signals and verdicts

A step's verdict is one of PASS, TRIP, or INCONCLUSIVE.

TRIP if any of:

p95_latency_ms > 500 — publish→broadcast latency, measured by correlating RoomEvent.LastMsgID with RecordPublish timestamps
p99_latency_ms > 1000 — same source
error_rate > 0.001 (0.1%) — failed publishes, request timeouts, gatekeeper 4xx/5xx; counted by the action emitter
any JetStream consumer's num_pending grew by more than 1000 over the hold — polled via /jsz?consumers=true at hold start and end. The notification-worker durable is exempt: push-notification delivery delay is tolerated by design, so its backlog never fails the run (still shown in worst-pending-delta for observability)
any service's slog_errors_total counter increased over the hold — currently a no-op since backend services don't expose /metrics HTTP endpoints; see known limitations
any durable that existed at hold-start was missing at hold-end (consumer crashed or was deleted) — applies to notification-worker too, since a vanished consumer is an availability failure, not a tolerated delay

INCONCLUSIVE (overrides PASS/TRIP — means "verdict signals can't be trusted") when:

Loadgen GC pause p99 > 50ms — the load box is under pressure, latency measurements may reflect loadgen-side GC rather than the system under test
AttemptedOps == 0 — publisher conn failed at startup, or no users were activated, or hold window was zero; a PASS here would be a silent lie
EffectiveN < 95% of N — fewer than 95% of the nominal N users actually came online (pool caps too low, or --steps exceeded preset.Users)
pollPending poll failed at start or end of hold even after retries — only when caused by ctx cancel; transient flakes are tolerated by dropping the pending-growth signal for that step alone
ctx.Done() fires during warmup or hold — the run was interrupted

PASS otherwise.

The final ANSWER is the largest N where the verdict is PASS. If a step TRIPped before any PASS, the answer is no step passed. INCONCLUSIVE steps don't count as PASS and don't stop the ramp.

Reading the output

Console table at end of run:

N        p50    p95    p99    err%    worst-pending-delta             verdict
1000     12     45     89     0.00%   broadcast-worker +12             PASS
2000     14     58     112    0.00%   broadcast-worker +34             PASS
5000     22     94     180    0.01%   broadcast-worker +180            PASS
10000    38     210    430    0.02%   broadcast-worker +890            PASS
20000(10000) 71  480  980    0.04%   broadcast-worker +1240           INCONCLUSIVE
    reasons: inconclusive: only 10000/20000 users activated (pool caps too low)

ANSWER: N = 10000 (last passing step)
        Next limit: broadcast-worker pending +1240 > +1000

The N column shows N(EffectiveN) when they differ — at N=20000 above only 10000 users came online (preset cap), so the step is marked INCONCLUSIVE rather than overstating capacity. The reasons: line below a TRIP/INCONCLUSIVE row says which signal fired.

CSV columns (--csv=results.csv):

n,effective_n,started_at,p50_ms,p95_ms,p99_ms,error_rate,attempted_ops,failed_ops,
worst_durable,worst_pending_delta,tripped,inconclusive,tripped_reasons

One row per step, sorted ascending by N. Use this for post-hoc plotting or regression comparison across runs.

Troubleshooting

Symptom → fix matrix for the failure modes that actually happen in real runs:

Symptom	Cause	Fix
Preflight errors with `no subscriptions found in mongo for siteID=...`	Mongo isn't seeded for the preset you're running, or `SITE_ID` differs between seed time and run time.	Run `loadgen seed --workload=messages --preset=<your preset>`. If `SITE_ID` changed, also re-seed (it's a per-site fixture).
Gatekeeper logs `user X is not subscribed to room Y` for every send	Preset mismatch between seed and run (fixture IDs differ per preset).	Teardown old preset + seed the new one: `loadgen teardown --workload=messages --preset=<old>` then seed the new one.
Gatekeeper logs `siteID mismatch: got X, want Y`	`SITE_ID` env differs between loadgen and gatekeeper.	Set both to the same value. Default is `site-local`.
Gatekeeper logs `posting is restricted to owners and admins`	Daily-band rooms have `UserCount` in [500, 2000]; gatekeeper rejects non-thread sends from member-role users when `UserCount > LargeRoomThreshold` (default 500). Documented known limitation.	Either raise `LARGE_ROOM_THRESHOLD` on the gatekeeper (operator-side, no re-seed), or wait for the planned admin-role fixture fix (loadgen-side, needs re-seed).
`nats: message does not have a reply` in room-service	Loadgen action handler used `Publish` instead of `Request` for a subject room-service responds on.	Use the latest loadgen — `markRead` was fixed in commit `0bde680` to use `Request`.
NATS `permissions violation` on subscribe	Loadgen's `NATS_CREDS_FILE` lacks subscribe rights on `chat.room.>` / `chat.user.>`.	Local dev: `./docker-local/setup.sh` regenerates `backend.creds` with full perms. Production-shaped: extend the chatapp account's `backend` user perms (`nsc edit user --account chatapp --name backend --allow-sub 'chat.room.>' --allow-sub 'chat.user.>'`).
All latency columns are 0 even though publishes succeed	No receivers configured (`--max-direct-users=0 --multiplex-pool-size=0`), or the broadcast subscriptions didn't survive the server registration race, or `RoomEvent.LastMsgID` isn't matching.	Set at least one of `--max-direct-users` or `--multiplex-pool-size` > 0. If still empty, check for `broadcast decode failed` warnings in the loadgen log — model drift between loadgen and broadcast-worker can break unmarshaling.
Step says `INCONCLUSIVE: only 10000/20000 users activated (pool caps too low)`	`max(--steps)` exceeded `preset.Users` (10000).	Trim `--steps` so its max is ≤ 10000, or change `preset.Users` in `preset.go` for that preset (and re-seed).
Loadgen process sits at 100% CPU for many minutes after startup, no output	Fixture build for very large `preset.Users`. Look for `INFO building fixtures preset=X users=Y` followed by `INFO fixtures built ... elapsed=Zs`.	At the default `preset.Users=10000` this is ~1s. If you've bumped it much higher, expect proportional time.
`start-of-hold pending poll failed` logged but the run continues	NATS `/jsz` endpoint is flaky. The step proceeds without the pending-growth signal; the other four signals still produce a verdict.	If persistent, set `NATS_MONITORING_URL` to a stable URL.

Known limitations

These are documented intentional shortcomings, not bugs to fix in a normal run:

Large-band rooms are gatekeeper-blocked. Daily fixtures have ~3 large rooms per user with UserCount in [500, 2000]; the gatekeeper rejects non-thread sends from member-role users to these. Roughly 3/56 = 5% of sendMessage calls land on a large room and fail. Workarounds: raise LARGE_ROOM_THRESHOLD (operator side) or change fixtures to seed users as RoleAdmin in large rooms (loadgen side, requires re-seed).
Auth-service JWT minting is a no-op stub. mintJWT exists in prodEnvFactory.Build but doesn't call auth-service. All loadgen connections use the shared backend.creds. To exercise per-user auth, implement mintJWT and have directPool.Add open the user's conn with the minted JWT.
Service-error signal is dormant. The verdict's service_errors > 0 → trip arm is wired but the URL map is empty because backend services don't expose /metrics. To enable: add a Prometheus endpoint per service and populate svcURLs in prodEnvFactory.Build.
CPU% in self-metrics is disabled. The earlier goroutine-count-as-CPU proxy made the tool unusable at scale (every step INCONCLUSIVE above ~4000 users). Real CPU measurement (gopsutil) is a follow-up. The GC pause p99 signal still fires the loadgen-saturation INCONCLUSIVE branch.
Reconnect / presence storms are out of scope. That's a separate scenario PR.
Cross-site federation (INBOX) is out of scope. Single-site only.
Not a CI gate. Invoked manually for capacity work; the deploy harness produces a CSV the operator interprets.

Design references

docs/superpowers/specs/2026-05-27-daily-im-load-scenario-design.md — full spec (goal, scope, behavior model, fixture topology, receiver architecture, ramp protocol, SLO definitions, risks).
docs/superpowers/plans/2026-05-27-daily-im-load-scenario.md — implementation plan (file structure, task decomposition).
tools/loadgen/daily.go, daily_pool.go, daily_actions.go, daily_verdict.go, daily_report.go, preset.go — implementation.

Large-room bot scenario (max-room-size)

Finds the largest room a bot can blast at a fixed send rate before an SLO signal breaks — gating on notification-worker consumer backlog as the headline O(N)-per-message signal (notification-worker is NOT exempt here, unlike the daily scenario).

Quick start

make -C tools/loadgen/deploy up
make -C tools/loadgen/deploy seed-botroom PRESET=botroom-medium
make -C tools/loadgen/deploy run-max-room-size PRESET=botroom-medium RATE=200

Presets

preset	sizes	rooms/size	users	use case
`botroom-small`	50, 100, 200	4	300	smoke / dev
`botroom-medium`	100, 500, 1000, 2000, 5000	4	5500	default capacity

Flags

--rate (required, bot msgs/sec split across the step's rooms), --sizes (default 100,500,1000,2000,5000), --rooms-per-size (default 4), --reads (room-service read rate, default 0 = off), --warmup/--hold/--cooldown, --stop-on-trip, --slo-p95/--slo-p99/--slo-error-rate/--slo-pending-growth, --rate-tolerance, --seed, --csv.

Reading the output

A per-step table (size, rooms, rate, e2 p50/p95/p99, err%, worst-pending, verdict) followed by ANSWER: max room size = N — the largest size where every SLO signal held — and a Next limit: line naming the first signal that tripped.

One room vs many

--rooms-per-size=1 concentrates the rate on a single room — probes the Cassandra hot-partition (messages_by_room key (room_id, bucket)) and the Mongo room-doc write contention (UpdateRoomLastMessage). The default 4 spreads the rate to measure aggregate fan-out plus member-list cache churn.

Add-path past 1000 members

To test adding members to rooms larger than the old 1000 cap, the loadgen deploy sets room-service MAX_ROOM_SIZE=6000 and ships a members-capacity-xl preset; run e.g. make -C tools/loadgen/deploy run-capacity PRESET=members-capacity-xl TARGET_SIZE=5000.

v2 follow-ups (not yet built)

Create-and-blast: bots create a ~100-member room and immediately send (cold-cache penalty).
Live N-connection pool to measure NATS core delivery fan-out to real member connections.

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

loadgen

Quick start

Encryption

Presets

Subcommands

Reading the summary

Non-goals

Members workload (add-member benchmark)

Quick start

Presets

Subcommands

v1 scope

Reading the summary

Room-read workload (mark-as-read benchmark)

Quick start

Notes

History workload (LoadHistory / GetThreadMessages benchmark)

Quick start

Presets

Subcommands

Reading the summary

max-rps — auto-find Max RPS under SLO

Quick start

Flags

Reading the output

Read-receipt workload (--workload=read-receipt)

Bottleneck attribution

Daily-IM scenario (find N) — Operator Guide

Table of contents

Quick start

Prerequisites

Presets

CLI flags

Environment variables

SLO signals and verdicts

Reading the output

Troubleshooting

Known limitations

Design references

Large-room bot scenario (max-room-size)

Quick start

Presets

Flags

Reading the output

One room vs many

Add-path past 1000 members

v2 follow-ups (not yet built)

Read-receipt workload (`--workload=read-receipt`)