bench-rollup: shared harness + comparison adapters + interactive charts#19
Conversation
Code Review
This pull request overhauls the benchmarking infrastructure, replacing the legacy aggregator and regex-based parser with native Bencher Metric Format (BMF) emission in bench_throughput.nim. It introduces a shared harness module (bench_common.nim), a comprehensive set of lock-free queue adapters, and a merge_bmf.py utility for combining results. Review feedback focuses on correcting a type hint in the Python merge script and ensuring CLI error messages are directed to stderr instead of stdout to follow best practices.
```python
def _die(msg: str) -> "int":
    """Print to stderr and return exit code 1."""
    print(msg, file=sys.stderr)
    return 1
```
```nim
if p.val.len == 0:
  echo "Missing value for --bmf-out"
  quit 1
```

```nim
echo "Unknown flag: --", p.key
quit 1
```

```nim
echo "Unknown variant group: ", arg
echo "Supported: ", SupportedGroups
quit 1
```
Compile-only stub for benchmarks/nim/bench_common.nim. Types and proc signatures are final per design doc section 2.1; bodies raise AssertionDefect with 'not implemented' so tasks 0.2-0.6 can be written and reviewed in parallel against this surface.

Public surface stubbed:
- Topology enum (tSpsc/tMpsc/tMpmc + unbounded variants)
- PushResult / PopResult adapter primitives
- BMFEmitter + initBMFEmitter / addMeasure / emit
- Stats helpers (mean / stddev / minVal / maxVal / percentile)
- Histogram + initHistogram / record / topK / percentile (HistogramTopK = 1000, HistogramReservoir = 99000)
- LatencyMetrics + runLatencyHarness
- ThroughputMetrics + runThroughputHarness

Test (tests/t_bench_common.nim) is compile-time-only: it references each promised symbol via 'when declared' and 'when compiles', leaving behavior tests to subsequent tasks. Stub file: 170 LOC (acceptance cap: 200).
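A minimal sketch of the stub-plus-compile-check pattern this commit describes. The enum members and proc names come from the list above; the parameter lists and the two-symbol scope are illustrative assumptions, not the shipped 170-LOC surface.

```nim
# Stub pattern: the signature is final, the body defers to later tasks.
type
  Topology* = enum
    tSpsc, tMpsc, tMpmc   # unbounded variants elided in this sketch

proc mean*(xs: openArray[float]): float =
  raise newException(AssertionDefect, "not implemented")

# Compile-time-only test in the style of tests/t_bench_common.nim:
# reference each promised symbol without executing anything.
when declared(mean):
  when compiles(mean([1.0, 2.0])):
    discard
```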
initBMFEmitter / addMeasure / emit per design doc section 2.1.
- addMeasure stores via Table[slug, Table[measure, MeasureValue]];
NaN sentinel for lower/upper bounds maps to none[float](),
finite values map to some(v).
- emit writes JSON with slugs alpha-sorted at the top level and
measures alpha-sorted within each slug. lower_value/upper_value
fields are emitted only when the corresponding bound is some.
- Output format: pretty(root, 2) plus a trailing newline so
diffs in CI / local-dev surface line-level changes cleanly.
Tests (tests/t_bench_common.nim): empty emit produces {}, two
slugs alpha-ordered, measures alpha-ordered within slug, NaN bounds
omitted from output, finite bounds emitted as lower_value/upper_value.
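A hedged usage sketch of the emitter surface described above. Only the proc names (initBMFEmitter, addMeasure, emit) and the NaN-sentinel bound convention come from the commit message; the parameter names and emit taking a path are assumptions.

```nim
import std/math   # NaN

var em = initBMFEmitter()
# Finite bounds: lower_value/upper_value appear in the JSON.
em.addMeasure("lockfreequeues_sipsic/spsc/1p1c", "throughput_ops_ms",
              value = 1234.5, lower = 1200.0, upper = 1270.0)
# NaN sentinel: the corresponding bound is omitted entirely.
em.addMeasure("nim_channels/mpmc/1p1c", "throughput_ops_ms",
              value = 987.6, lower = NaN, upper = NaN)
# Slugs alpha-sorted at top level, measures alpha-sorted per slug,
# pretty-printed with 2-space indent plus a trailing newline.
em.emit("throughput.json")
```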
mean / stddev / minVal / maxVal / percentile bodies. Sort-based
percentile uses linear interpolation between adjacent samples;
matches numpy's default percentile interpolation ('linear').
stddev uses ddof=1 (sample) for consistency with numpy default.
Empty-input behavior matches the legacy benchmarks/nim/stats.nim
that this module replaces (returns 0.0 across the board).
Tests: numpy-baseline parity for stddev on [1,2,3,4]; min/max on
unordered input; percentile(p=0.5) on [0..99] = 49.5; percentile
extremes (p=0.0, p=1.0) match min and max.
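A self-contained sketch of the sort-based percentile with linear interpolation, matching the numpy-parity claims above; the shipped proc in bench_common.nim may differ in name and signature.

```nim
import std/algorithm

proc percentileSketch(samples: openArray[float]; p: float): float =
  if samples.len == 0:
    return 0.0                       # legacy stats.nim empty-input behavior
  var xs = @samples
  xs.sort()
  let rank = p * float(xs.high)      # fractional rank in [0, len-1]
  let lo = int(rank)
  let hi = min(lo + 1, xs.high)
  xs[lo] + (xs[hi] - xs[lo]) * (rank - float(lo))

var v = newSeq[float](100)
for i in 0 ..< 100: v[i] = float(i)
assert percentileSketch(v, 0.5) == 49.5   # the documented [0..99] case
assert percentileSketch(v, 0.0) == 0.0    # p=0.0 matches min
assert percentileSketch(v, 1.0) == 99.0   # p=1.0 matches max
```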
Online percentile tracker per design doc section 2.1.
- topKHeap: min-heap of HistogramTopK = 1000 largest values seen.
record() pushes if heap < K; else replace if value > heap[0]
(evicts the min into the body bucket via reservoirAdmit).
- reservoir: Algorithm R uniform sample of body distribution, size
HistogramReservoir = 99_000.
- debugMode (initHistogram(true)): records every sample to debugAll
and percentile() returns the exact sort answer over debugAll. Used
as the test oracle and as a debug-only fallback per design 2.1.
Lookup rule with stratified-estimator math:
- p == 1.0 -> top-K max
- seenAll*(1-p) <= topKHeap.len -> top-K rank lookup (linear interp)
- otherwise -> reservoir, with p rescaled to p_body = (p*seenAll) / (seenAll - K), since the reservoir samples ONLY the body stratum (values that lost the top-K race). Without rescaling, a reservoir-based p99 underestimates by ~25% on log-normal data.
Tests: p99 within 1% of sort fallback on 100K log-normal samples;
percentile(1.0) == top-K max; p50 within 5% of sort fallback;
debug mode returns exact sort answer; topK() returns ascending-sorted
samples.
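The lookup rule above, restated as runnable Nim. A sketch under assumptions: the heap is snapshotted into an ascending-sorted seq, interpolation in the top-K branch is elided, and the names differ from the real Histogram.

```nim
import std/algorithm

proc sortedLerp(xs: seq[float]; p: float): float =
  # Linear-interpolated percentile over a plain sample.
  if xs.len == 0: return 0.0
  var s = xs
  s.sort()
  let rank = p * float(s.high)
  let lo = int(rank)
  let hi = min(lo + 1, s.high)
  s[lo] + (s[hi] - s[lo]) * (rank - float(lo))

proc stratifiedPercentile(topK: seq[float];      # K largest values, ascending
                          reservoir: seq[float]; # Algorithm R body sample
                          seenAll: int; p: float): float =
  let k = topK.len
  if p >= 1.0:
    return topK[^1]                              # exact max
  if float(seenAll) * (1.0 - p) <= float(k):
    # The requested tail rank lands inside the exact top-K stratum.
    let fromTop = float(seenAll) * (1.0 - p)     # samples above the quantile
    let r = max(1, int(fromTop))                 # rank from the top; 1 = max
    return topK[k - r]                           # (linear interp elided)
  # Body stratum: rescale p because the reservoir never saw the top K.
  let pBody = (p * float(seenAll)) / float(seenAll - k)
  sortedLerp(reservoir, min(1.0, pBody))
```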
Generic N-producer / N-consumer throughput runner factored out of
the legacy benchmarks/nim/bench_throughput.nim. Parameterized over the
queue init proc and the queue type Q (constrained at instantiation
time via 'mixin push' / 'mixin pop' so adapter-specific push/pop are
resolved at the call site).
The remainder-spread deadlock-avoidance rule is preserved verbatim:
if messageCount mod numProducers != 0, the leading producers each get
one extra item so the producer-side total exactly equals messageCount;
same for consumers. Without this, the harness silently deadlocks
(consumers wait forever) or strands items in the queue (consumers
stop early).
Returns ThroughputMetrics{ops_ms_mean, ops_ms_stddev, runs}; warmup
runs are discarded. Smoke test in tests/t_bench_common.nim with a
1024-cap channel SmokeAdapter at 1P/1C × 1000 messages × 1 run.
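The remainder-spread rule in isolation, as a hedged sketch (splitWork is an illustrative name, not a harness proc): the first `messageCount mod n` workers each take one extra item so the totals match exactly.

```nim
proc splitWork(messageCount, n: int): seq[int] =
  let base = messageCount div n
  let rem = messageCount mod n
  result = newSeq[int](n)
  for i in 0 ..< n:
    result[i] = base + (if i < rem: 1 else: 0)

# 1000 messages over 3 producers: no truncation loss, no deadlock.
let counts = splitWork(1000, 3)
assert counts == @[334, 333, 333]
var total = 0
for c in counts: total += c
assert total == 1000
```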
Ping-pong RTT runner. For each run:
- allocates two queues fwd + rev of type Q via queueInit()
- spawns 1 pinger + 1 ponger thread
- pinger pushes a monotonic-ns timestamp via fwd, spins on rev for
the echo, records RTT (ns) into a per-run Histogram
- ponger pops from fwd, pushes the same value to rev
Aggregation per design 2.5 'Run aggregation for latency': reported
percentiles are the MEAN across runs of each percentile, NOT the
percentile of unioned samples. Per-run histograms isolate scheduler
artifacts to the run that contains them.
Returns LatencyMetrics{p50_ns, p95_ns, p99_ns, p999_ns, max_ns,
samples}. samples is the cumulative across-runs sample count.
This task ships the 1P/1C smoke topology only; multi-thread sharding
lands in PR 1 (Task 1.x). 1P/1C is sufficient for the smoke check
because PR 1's first BMF measure uses Sipsic 1P/1C as the smoke shape.
Test: SmokeAdapter (channel-backed), 1000 messages, 1 run, 0 warmup;
asserts samples >= 1000 and p50 < p99 < max.
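The aggregation rule in miniature (meanOfRuns is an illustrative helper, not part of the harness): averaging per-run percentiles keeps one scheduler-polluted run from dominating the way a pooled-sample percentile would.

```nim
proc meanOfRuns(perRunP99: openArray[float]): float =
  var sum = 0.0
  for v in perRunP99: sum += v
  sum / float(perRunP99.len)

# Two clean runs plus one run that caught a scheduler hiccup:
assert meanOfRuns([1000.0, 1100.0, 9000.0]) == 3700.0
```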
Stateless union of per-slug measure dicts across N input BMF JSON
files into one output JSON, per design doc section 4.3. Required
because Bencher.dev creates a separate Report per 'bencher run'
invocation; multiple uploads from sibling CI jobs would NOT co-locate
multiple measures on a single per-slug history.
Contract:
merge_bmf.py <output_path> <input1> [input2 ...]
- Slug regex: ^[a-z][a-z0-9_]*/[a-z][a-z0-9_]*/\d+p\d+c$
- Measure regex: ^[a-z][a-z0-9_]*$
- Value fields ('value', 'lower_value', 'upper_value') must be finite,
non-NaN floats. Any malformed input or non-finite value -> exit 1.
- Collision (same slug + same measure across two inputs) -> exit 1
with stderr 'error: collision on slug=<slug> measure=<measure> in
files <pathA> and <pathB>'.
- Output is alpha-sorted at both top level (slugs) and within each
slug (measures), 2-space indent.
Tests in benchmarks/tests/test_merge_bmf.py cover all 8 design-spec
cases plus an alpha-sort assertion: zero-input usage error, single
empty pass-through, single valid round-trip, two disjoint slugs,
shared slug + disjoint measures, collision, NaN rejection, invalid
measure key (uppercase), invalid slug shape.
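The union-with-collision core, sketched in Nim for consistency with the other sketches here (the shipped tool is Python; regex validation and alpha-sorted serialization are elided).

```nim
import std/[json, tables]

proc mergeBmf(docs: openArray[(string, JsonNode)]): JsonNode =
  ## docs are (path, parsed BMF object) pairs; a collision raises.
  result = newJObject()
  var seen = initTable[string, string]()   # "slug/measure" -> source path
  for (path, doc) in docs:
    for slug, measures in doc.pairs:
      if not result.hasKey(slug):
        result[slug] = newJObject()
      for measure, value in measures.pairs:
        let key = slug & "/" & measure
        if key in seen:
          raise newException(ValueError,
            "error: collision on slug=" & slug & " measure=" & measure &
            " in files " & seen[key] & " and " & path)
        seen[key] = path
        result[slug][measure] = value
```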
Adds the 5 adapters not yet present per design doc 2.2:
- lockfreequeues_sipmuc_adapter.nim (bounded SPMC)
- lockfreequeues_mupsic_adapter.nim (bounded MPSC)
- lockfreequeues_unbounded_sipsic_adapter.nim
- lockfreequeues_unbounded_sipmuc_adapter.nim
- lockfreequeues_unbounded_mupmuc_adapter.nim

Each adapter file exports:
- 'topologiesSupported: set[Topology]' constant (consumed by PR 3 Task 3.11 to restrict shape generation; declared but unused in PR 0)
- 'make<Lib>Adapter' factory taking 'capacity: int'
- 'cleanup' tear-down proc
- 'push(item)' returning bench_common.PushResult
- 'pop()' returning bench_common.PopResult[T]

Smoke test in tests/t_bench_common.nim: each adapter pushes 100 sequential uint64s, pops them, asserts count == 100 AND set-equality with the input range.

Implementation note (Nim 2.2.6 codegen workaround): the unbounded adapters store the queue INLINE rather than via heap pointer. Heap-allocating UnboundedSipsic/UnboundedSipmuc/UnboundedMupmuc and then calling reset() or =destroy() on the referent from a generic adapter cleanup proc triggers 'internal error: getTypeDescAux(tyGenericParam)' OR 'internal error: expr(nkIdent); unknown node kind' in Nim 2.2.6 when bench_common is also imported. Inline storage lets the compiler emit the per-instance =destroy automatically when the adapter goes out of scope, sidestepping the bug. The DEBRA manager is still heap-allocated because the queue constructors take 'ptr DebraManager'. Track upstream Nim repro for follow-up.
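A skeleton of that per-adapter surface, with placeholder internals: the PushResult/PopResult shapes and the seq-backed store stand in for bench_common's real definitions and a real lock-free queue.

```nim
type
  Topology = enum
    tSpsc, tMpsc, tMpmc
  PushResult = enum
    prOk, prFull
  PopResult[T] = object
    ok: bool
    val: T

  ExampleAdapter*[T] = object
    buf: seq[T]                    # placeholder backing store

const topologiesSupported* = {tSpsc}

proc makeExampleAdapter*[T](capacity: int): ExampleAdapter[T] =
  ExampleAdapter[T](buf: newSeqOfCap[T](capacity))

proc push*[T](a: var ExampleAdapter[T]; item: T): PushResult =
  a.buf.add(item)
  prOk

proc pop*[T](a: var ExampleAdapter[T]): PopResult[T] =
  if a.buf.len == 0:
    return PopResult[T](ok: false)
  result = PopResult[T](ok: true, val: a.buf[0])
  a.buf = a.buf[1 .. ^1]

proc cleanup*[T](a: var ExampleAdapter[T]) =
  a.buf.setLen(0)                  # nothing heap-owned in this placeholder
```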
…port `topologiesSupported`

Per design section 2.2 (`<library_slug>_adapter.nim` filename convention):
- Rename via `git mv` (history preserved):
  - lockfreequeues_sipsic.nim -> lockfreequeues_sipsic_adapter.nim
  - lockfreequeues_mupmuc.nim -> lockfreequeues_mupmuc_adapter.nim
  - lockfreequeues_unbounded_mupsic.nim -> lockfreequeues_unbounded_mupsic_adapter.nim
- Update bench_throughput.nim import paths to match.
- Add `topologiesSupported*: set[Topology]` constant to all four existing adapters (3 renamed + channels_adapter). Use selective `from ../bench_common import Topology, <member>` to avoid pulling PushResult into scope, which would conflict with adapter.PushResult (the legacy concept in ../adapter still imported by these adapters).
- `topologiesSupported` is exported in PR 0 but consumed only by PR 3 Task 3.11; the "unused export" hint is expected per the impl plan.
…ughput
Add --bmf-out=<path> flag (parsed via std/parseopt) that emits Bencher
Metric Format JSON for every variant that ran. With the flag absent the
binary is bit-for-bit unchanged; the existing positional CLI
(`bench_throughput sipsic mupmuc unbounded_mupsic channels`) is
preserved verbatim.
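A sketch of how such a flag falls out of std/parseopt while positional variant names pass through untouched. The error text mirrors the hunks quoted earlier, routed to stderr per the review feedback; the loop shape itself is an assumption, not the shipped CLI.

```nim
import std/parseopt

var bmfOut = ""
var variants: seq[string]
var p = initOptParser()
while true:
  p.next()
  case p.kind
  of cmdEnd:
    break
  of cmdLongOption:
    if p.key == "bmf-out":
      if p.val.len == 0:
        stderr.writeLine "Missing value for --bmf-out"
        quit 1
      bmfOut = p.val
    else:
      stderr.writeLine "Unknown flag: --" & p.key
      quit 1
  of cmdShortOption:
    stderr.writeLine "Unknown flag: -" & p.key
    quit 1
  of cmdArgument:
    variants.add(p.key)            # e.g. "sipsic", kept positional
```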
Slug shape per design 2.2:
lockfreequeues_sipsic/spsc/1p1c
lockfreequeues_mupmuc/mpmc/{1,2,4,8}p{1,2,4,8}c
lockfreequeues_unbounded_mupsic/mpsc_unbounded/{1,2,4}p1c
nim_channels/mpmc/{1,2,4}p{1,2,4}c
Each slug carries the `throughput_ops_ms` measure as
{value: mean, lower_value: mean-stddev, upper_value: mean+stddev} —
matching the convention bmf_adapter.py used (which this code replaces
in Task 0.12).
Note on slug for built-in channels: the human-readable adapter name()
returns "nim/channels" (kept for stdout text), but the slug uses
underscore form `nim_channels` so it satisfies the merge_bmf.py
SLUG_RE pattern `^[a-z][a-z0-9_]*/[a-z][a-z0-9_]*/\d+p\d+c$`.
Per-variant {.intdefine.}s for runs/warmup added so the integration
test can build a fast variant via -d:BenchSipsicRuns=2
-d:BenchSipsicWarmup=0. Default values (10 runs, 0 warmup) match the
prior hard-coded behavior.
Selective `from ./bench_common import` is used to pull only
{BMFEmitter, initBMFEmitter, addMeasure, emit} — importing the full
module would shadow `results.ThroughputMetrics` with
`bench_common.ThroughputMetrics` and break the legacy bespoke harness
procs that still drive Mupmuc and UnboundedMupsic.
Test: tests/t_bench_common.nim adds an integration test that compiles
bench_throughput with small overrides, runs it with --bmf-out=<tmp>
sipsic, parses the resulting JSON, and asserts the expected slug and
measure exist with a positive value. Stdout text output is also
asserted (`Sipsic` substring) so the legacy CLI contract is exercised.
Apply the edits from design section 3 (PR 0):
(a) Remove `Run adapter unit tests` step. The Python regex parser it
sanity-checked is gone in Task 0.12; nothing left to test.
(b) Remove `Convert to BMF JSON` step. bench_throughput now emits BMF
natively via `--bmf-out=throughput.json` (Task 0.10).
(c) Update `Run bench_throughput` step to add the `--bmf-out=throughput.json`
flag while preserving the existing positional variant filter and the
20-minute step timeout.
(d) Insert `Merge BMF JSON` step running
`python3 benchmarks/merge_bmf.py merged.json throughput.json`.
Single-input today, but stays in place for PR 2-4 when bench is
split into per-topology binaries each producing its own fragment.
(e) Change all three `bencher run` invocations and the debug-cat step
to use `--file merged.json` instead of `bench_results.json`.
actionlint clean. The bench upload contract is unchanged: same
project, branch, testbed, thresholds, comparison semantics — only the
file source flips from regex-parsed text to native BMF.
…or consumers

Three legacy files removed via `git rm`:
- `benchmarks/bmf_adapter.py` — regex parser for bench_throughput stdout. Replaced by native BMF emission via `--bmf-out=` (Task 0.10).
- `benchmarks/test_bmf_adapter.py` — unit tests for the parser. Replaced by `benchmarks/tests/test_merge_bmf.py` (Task 0.7).
- `benchmarks/nim/bench_main.nim` — aggregator binary that wrapped bench_throughput + bench_latency and emitted a custom JSON shape. bench_throughput is now the canonical entry point and emits BMF directly; bench_latency follows in PR 1.

Live consumers refactored in this same task (per impl plan §0.12):
- `benchmarks/runner.py`: build_nim now compiles bench_throughput.nim; run_nim invokes `bench_throughput --bmf-out=<path>` instead of `bench_main --runs=<n> -o=<path>`. The `runs` argument is preserved in the CLI for back-compat but is now compile-time-only.
- `lockfreequeues.nimble`: `task benchmarks` switched to bench_throughput. Output path corrected to `.tmp/bench_throughput` (matching the project nim.cfg `--outdir:.tmp`).
- `benchmarks/README.md`: rewritten to document the new flow (bench_common module, adapter convention, merge_bmf.py, native BMF emission, expected slug set).
- `.gitignore`: drop the `bench_main` build-artifact ignore line.

`git grep "bmf_adapter\|bench_main"` returns only intentional historical references in code comments and CHANGELOG (prior release notes).

Acceptance: `nimble tasks` parses; `nim c benchmarks/nim/bench_throughput.nim` compiles; tests/t_bench_common.nim passes all 26 tests including the Task 0.10 --bmf-out integration test.
The legacy bench_main aggregator emitted JSON of shape
`{results: [{implementation, thread_config, metrics: {...}}], metadata: {...}}`.
Bench_main was deleted in Task 0.12, replaced by `bench_throughput
--bmf-out` (Task 0.10) + `merge_bmf.py` (Task 0.7) which produce
Bencher Metric Format JSON of shape
`{<lib>/<topology>/<P>p<C>c: {<measure>: {value, lower_value?, upper_value?}}}`.
`benchmarks/render_readme.nim` now walks the BMF shape directly:
- New `parseSlug(slug)` helper splits `<lib>/<topology>/<P>p<C>c` into
  `(impl, "<P>P/<C>C")` for the two table columns (see the sketch after
  this list). Strict on shape: rejects slugs with the wrong segment count
  or non-digit P/C, so malformed entries do not produce noise rows.
- `loadRows` walks every top-level slug, pulls
`<slug>.throughput_ops_ms.value` for the throughput column, and
`<slug>.latency_p50_ns.value` for the optional latency column
(PR 1+ emits this; PR 0 does not, so all PR 0 rows show "—").
- BMF has no `metadata` block, so the meta argument is always nil.
`renderTable` already accepts nil and skips the platform header in
that case — no change needed.
- `when isMainModule` block guarded by `not defined(renderReadmeNoMain)`
so unit tests can include this file without triggering the README
rewrite. CLI invocations (`nim r benchmarks/render_readme.nim`) leave
the define unset and run main as before.
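A strict parseSlug in the spirit described above; a sketch only, since the shipped helper's exact signature and failure mode may differ.

```nim
import std/[options, strutils]

proc parseSlug(slug: string): Option[(string, string)] =
  let parts = slug.split('/')
  if parts.len != 3:
    return none((string, string))        # wrong segment count
  let shape = parts[2]                   # e.g. "2p4c"
  let pIdx = shape.find('p')
  if pIdx <= 0 or not shape.endsWith('c'):
    return none((string, string))
  let pStr = shape[0 ..< pIdx]
  let cStr = shape[pIdx + 1 ..< shape.high]
  if pStr.len == 0 or cStr.len == 0:
    return none((string, string))
  for ch in pStr & cStr:
    if ch notin {'0' .. '9'}:
      return none((string, string))      # non-digit P/C
  some((parts[0], pStr & "P/" & cStr & "C"))

assert parseSlug("lockfreequeues_mupmuc/mpmc/2p4c") ==
       some(("lockfreequeues_mupmuc", "2P/4C"))
assert parseSlug("bad_slug").isNone
```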
Test: tests/t_render_readme.nim (new) writes a fixture BMF JSON with
three slugs (mupmuc/2p2c, sipsic/1p1c, nim_channels/1p1c), runs
`loadRows` and `renderBlock`, and asserts:
- 3 rows produced; impl/threads/throughput parsed correctly per slug
- alpha sort by impl preserved
- meta == nil (BMF carries no metadata)
- rendered Markdown table contains every impl name and value
- latency column is em-dash for every row (no `latency_p50_ns` in fixture)
- missing JSON file falls back to placeholder
- empty `{}` BMF document falls back to placeholder
All 4 new tests + 26 existing t_bench_common tests pass.
Document Track 0 of the bench-rollup feature under [Unreleased]:
- Added: bench_common.nim module, 5 new lockfreequeues adapters, merge_bmf.py CLI, --bmf-out flag, per-variant run-count intdefines.
- Changed: bench_throughput emits BMF natively, 3 adapters renamed to the <library_slug>_adapter.nim convention, render_readme.nim consumes the new BMF shape, runner.py / nimble task / benchmarks/README.md point at bench_throughput.
- Removed: bmf_adapter.py, test_bmf_adapter.py, bench_main.nim.

Each entry is one bullet (one logical change), no test plan checklist, no GitHub issue references.
/gemini review
Code Review
This pull request overhauls the benchmarking infrastructure by introducing a shared harness module (bench_common.nim), native Bencher Metric Format (BMF) JSON emission, and updated adapters for various lock-free queue implementations. It replaces the legacy bench_main aggregator and Python-based regex parser with a robust system including a BMF merging utility and updated documentation. Feedback identifies memory leaks in the latency and throughput harnesses due to missing queue cleanup, and resource leaks in adapters where heap-allocated queues are deallocated without invoking destructors.
```nim
proc runOneLatencyRun[Q](
    queueInit: proc(): Q,
    messageCount: int,
): LatencyMetrics =
  ## One ping-pong RTT run. Allocates fwd + rev queues of type Q,
  ## spawns 1 pinger and 1 ponger, records RTT samples, returns
  ## LatencyMetrics{p50, p95, p99, p999, max, samples}.
  ##
  ## NOTE: This task ships the 1P/1C smoke topology only. PR 1 (Task 1.x)
  ## extends to multi-pinger / multi-ponger by sharding `messageCount`
  ## across additional threads with the remainder-spread rule from
  ## runThroughputHarness. The 1P/1C path keeps RTT semantics clean
  ## (no scheduler interleave between ping and pong) which is what the
  ## p50 / p99 measurements in PR 1 hinge on.
  var fwd = queueInit()
  var rev = queueInit()
  var hist = initHistogram()

  var pingerCtx = PingerCtx[Q](
    fwd: addr fwd, rev: addr rev,
    count: messageCount,
    histogram: addr hist,
  )
  var pongerCtx = PongerCtx[Q](
    fwd: addr fwd, rev: addr rev,
    count: messageCount,
  )

  var pingerThread: Thread[ptr PingerCtx[Q]]
  var pongerThread: Thread[ptr PongerCtx[Q]]

  createThread(pingerThread, pingerThreadBody[Q], addr pingerCtx)
  createThread(pongerThread, pongerThreadBody[Q], addr pongerCtx)
  joinThread(pingerThread)
  joinThread(pongerThread)

  result = LatencyMetrics(
    p50_ns: hist.percentile(0.50),
    p95_ns: hist.percentile(0.95),
    p99_ns: hist.percentile(0.99),
    p999_ns: hist.percentile(0.999),
    max_ns: hist.percentile(1.0),
    samples: hist.seenAll,
  )
```
The runOneLatencyRun procedure leaks memory because it creates two queue instances (fwd and rev) but never calls their cleanup procedures. Since the adapters (e.g., LockfreequeuesMupsicAdapter) allocate memory on the heap, this will lead to significant memory accumulation over multiple benchmark runs.
```nim
proc runOneThroughputRun[Q](
    queueInit: proc(capacity: int): Q,
    capacity: int,
    numProducers: int,
    numConsumers: int,
    messageCount: int,
): float =
  ## One run of the throughput harness; returns ops/ms.
  ##
  ## Distribution invariants (mirrors legacy bench_throughput
  ## `runThroughputBenchmark`):
  ##   1. sum(producer.count) == messageCount
  ##   2. sum(consumer.count) == messageCount
  ## A naive `messageCount div N` truncates and breaks both when
  ## `messageCount` is not divisible by `numProducers` or `numConsumers`,
  ## which deadlocks the consumers (waiting for items the producers never
  ## enqueue) or strands items in the queue (consumers stop early).
  ## Spread the remainder over the first `messageCount mod N` workers so
  ## totals match exactly for any (P, C, messageCount) triple.
  var queue = queueInit(capacity)
  let baseP = messageCount div numProducers
  let remP = messageCount mod numProducers
  let baseC = messageCount div numConsumers
  let remC = messageCount mod numConsumers

  var producerThreads = newSeq[Thread[ptr ProducerCtx[Q]]](numProducers)
  var consumerThreads = newSeq[Thread[ptr ConsumerCtx[Q]]](numConsumers)
  var producerCtxs = newSeq[ProducerCtx[Q]](numProducers)
  var consumerCtxs = newSeq[ConsumerCtx[Q]](numConsumers)

  var nextStart = 0
  for i in 0 ..< numProducers:
    let count = baseP + (if i < remP: 1 else: 0)
    producerCtxs[i] = ProducerCtx[Q](
      queue: addr queue,
      startIdx: nextStart,
      count: count,
    )
    nextStart += count

  for i in 0 ..< numConsumers:
    let count = baseC + (if i < remC: 1 else: 0)
    consumerCtxs[i] = ConsumerCtx[Q](
      queue: addr queue,
      count: count,
    )

  # Monotonic clock — `epochTime` (wall clock) can step backward across
  # NTP adjustments and skew throughput numbers. Nanosecond precision:
  # ms-precision buckets multiple short runs into the same integer ms,
  # producing identical samples and stddev=0 on a fast CI runner. ops/ms
  # is reconstructed as a float at print time.
  let startTime = getMonoTime()

  for i in 0 ..< numProducers:
    createThread(producerThreads[i], producerThreadBody[Q],
      addr producerCtxs[i])
  for i in 0 ..< numConsumers:
    createThread(consumerThreads[i], consumerThreadBody[Q],
      addr consumerCtxs[i])

  for i in 0 ..< numProducers:
    joinThread(producerThreads[i])
  for i in 0 ..< numConsumers:
    joinThread(consumerThreads[i])

  let elapsedNs = float(inNanoseconds(getMonoTime() - startTime))
  if elapsedNs <= 0.0:
    return 0.0
  result = float(messageCount) * 1_000_000.0 / elapsedNs
```

```nim
proc cleanup*[N, P: static int, T](
    a: var LockfreequeuesMupsicAdapter[N, P, T]
) =
  if a.queue != nil:
    dealloc(a.queue)
    a.queue = nil
```
```nim
proc cleanup*[N, C: static int, T](
    a: var LockfreequeuesSipmucAdapter[N, C, T]
) =
  if a.queue != nil:
    dealloc(a.queue)
    a.queue = nil
```
… harness

Per Gemini review on PR #19. Adapters whose Q heap-allocates the backing queue (lockfreequeues bounded mupsic/sipmuc/mupmuc, channels) leaked the queue every time runOneLatencyRun or runOneThroughputRun returned: `var fwd = queueInit()` only frees the adapter struct's stack slot at scope exit, not the create()-d Mupsic/Sipmuc/Mupmuc/Channel pointee. Across N runs * M warmups the leak grew without bound.

Three-part fix:
1. bench_common.nim: harness now drops the queue via a uniform `mixin cleanup; when compiles(cleanup(q)): cleanup(q)` block after joinThread (and before any return path, including the elapsedNs<=0 short-circuit in runOneThroughputRun). `when compiles` keeps value-type adapters (sipsic, unbounded inline storage) usable without forcing them to define a no-op cleanup.
2. lockfreequeues_mupsic / sipmuc cleanup procs: prepend reset(a.queue[]) so Mupsic/Sipmuc's `=destroy` actually runs before the heap slot is freed. Bare dealloc skipped any queue-internal cleanup that an ARC/ORC-generated destructor would otherwise do (managed T fields, future atomic-cleanup hooks). Same fix on the mupmuc adapter inside deinitMupmucAdapter.
3. mupmuc and channels adapters gain a `cleanup` alias around their existing deinit*Adapter procs so the harness's mixin cleanup call resolves uniformly across adapter naming conventions.

Verified by smoke build of `runLatencyHarness` + `runThroughputHarness` against LockfreequeuesMupsicAdapter; no leak-on-return, percentiles and ops/ms reported as expected.
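The uniform hook in part 1, reduced to a standalone sketch (dropQueue, HeapAdapter, and InlineAdapter are illustrative names): adapters that define cleanup get it called; value-type adapters compile without one.

```nim
proc dropQueue[Q](q: var Q) =
  mixin cleanup
  when compiles(cleanup(q)):
    cleanup(q)        # heap-backed adapters free their queue here
  # else: nothing to do; =destroy covers inline-storage adapters

type
  HeapAdapter = object
    queue: ptr int    # stands in for a create()-d Mupsic/Sipmuc/Channel
  InlineAdapter = object
    value: int

proc cleanup(a: var HeapAdapter) =
  if a.queue != nil:
    dealloc(a.queue)
    a.queue = nil

var h = HeapAdapter(queue: create(int))
var inl = InlineAdapter(value: 1)
dropQueue(h)     # resolves and calls cleanup
dropQueue(inl)   # compiles fine; the when-branch is simply absent
```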
/gemini review

Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

/review
Code Review by Qodo
Replaces the PR 5 placeholder with the real uPlot block per design §4.5 schema. Documents source, version (1.6.27), MIT license, vendored path, and a precise upgrade procedure (jsdelivr URL, SHA-256 verification via jsdelivr's package metadata).

`.gitattributes` gains two rules per design §4.6:
- `docs/assets/uplot-*.js` — linguist-vendored + linguist-generated. Confirmed via `git check-attr -a` against the vendored bundle.
- `docs/assets/bench-results/*.json` — linguist-generated only. CI-emitted snapshots are not third-party imports, so the vendored flag would mislead.
Replaces the auto-rendered four-row table with a hand-curated four-row summary covering the four bounded lockfreequeues variants at one representative shape each, followed by a link to the live chart page (design §4.4 / Task 5.8). Initial cells contain `_to be filled at next release_` placeholders; the next release PR fills them in per the "Updating the README summary" procedure documented in benchmarks/README.md.

Why hand-curated:
- The chart page already absorbs run-to-run noise; the README does not need to.
- Four cells, refreshed once per release cycle, is faster to update by hand than to re-introduce an auto-renderer the chart page has superseded.
- The README is the first artefact consumers see; a noisy CI run should not silently regress its headline numbers.

Pre-deletion release-tag check (per impl plan 5.8):
- v3.2.0 and v4.0.0 each ship `benchmarks/render_readme.nim` in their tagged tree. Deleting the file on devel does NOT mutate those tags (git tags point at frozen commits).
- No CI workflow, nimble task, or test runner references the renderer; the test suite (`tests/test.nim`) does not import `t_render_readme`. Confirmed via `git grep` against `*.yml`, `*.nimble`, `*.cfg`, and `tests/test.nim`.

Files removed:
- `benchmarks/render_readme.nim` — auto-rendered the README markers.
- `tests/t_render_readme.nim` — `include`s the renderer; would not compile after deletion.

`benchmarks/README.md` gains an "Updating the README summary" section codifying the hand-curation procedure (which shapes to read, where to read them from, when to commit). The chart-contract test runner is added to the test commands list.

Two remaining `render_readme` mentions are intentional:
- `CHANGELOG.md` PR-0 entry (historical record of the PR-0 rewrite).
- `benchmarks/README.md` explanatory note about the deletion path.
Adds Track 5 entries to [Unreleased]: chart wiring + assets, BMF snapshot publishing pipeline with three-layer loop-prevention, devel-triggered docs deploy + post-deploy asset verification, THIRD_PARTY_LICENSES + .gitattributes uPlot entries, the new chart-contract Python test, and the README BENCHMARKS markers switching from auto-rendered to hand-curated. Documents removal of `benchmarks/render_readme.nim` and its test, with the pre-deletion release-tag check noted. Manual chart performance evidence (LCP/FCP screenshot per design §3 PR 5 acceptance) will be attached to the PR description after push, since it requires a live mike deploy URL.
…y old uPlot

Per Gemini review on PR #24.
1. `docs/benchmarks.md` and `docs/assets/bench-charts.js`: `./assets/` resolves to `/<version>/benchmarks/assets/` under mkdocs `use_directory_urls` (default true), which 404s — assets actually live at `/<version>/assets/`. Switch to `../assets/` so the chart page reaches them via one level up.
2. `.github/workflows/docs.yml` mike-asset verifier: same path bug — was probing `/dev/benchmarks/assets/bench-results/latest.json` which would have returned 404 once the snapshot landed (the verify step only runs on devel push, so the bug wouldn't surface until then). Aligned with the page's `../assets/` path.
3. `bench-charts.js` ResizeObserver leak on log-scale toggle: `attachResizeObserver` was called from inside `rebuild()` which fires on every toggle. Each click left a stranded observer + a stale `plot` reference its setSize closure was still firing on. Refactored to a single observer attached after the first build, which resolves `plot` lazily through a getter — toggle just swaps the underlying instance.
4. `bench-charts.js` orphaned uPlot DOM/listeners: `rebuild()` mounted a new uPlot without destroying the previous one. Added `if (plot) plot.destroy()` at the top of `rebuild()`. Combined with the observer fix above, the toggle is now state-clean across arbitrarily many flips.

Syntax-checked bench-charts.js with `node --check`.
Three code-review fixes on PR 5:
1. Snapshot push race. The bench-upload Snapshot step pushed devel without rebasing, so a concurrent push during a long bench run would reject non-fast-forward and fail the entire job. Add a fetch + rebase + retry loop (3 attempts) and push HEAD:devel. The commit is already [skip ci], so the retry cannot trigger an infinite bench loop on a successful push.
2. Hardcoded GitHub Pages verify URL. The docs.yml verification step curled https://elijahr.github.io/lockfreequeues/... which is wrong on forks or after a repo rename. Derive the URL from github.repository_owner and github.event.repository.name so the verifier follows the repo wherever it lives.
3. Log-scale accepts non-positive. The chart defaults to log Y (distr: 3) but merge_bmf.py only enforces finiteness, so a malformed BMF could deliver a 0 or negative throughput value. Treat <= 0 as missing data in toUplotData. Throughput in ops/ms cannot legitimately be non-positive (a 0 indicates a degenerate run that bench_throughput's elapsedNs guard already rejects), so surfacing as missing is the correct semantics.
The PR4 -> PR5 rebase merged two permissions blocks: PR4's contents:read (needed by actions/checkout + actions/download-artifact when default GITHUB_TOKEN is read-only) and PR5's contents:write (needed by the snapshot-push step). YAML rejects duplicate keys, so GitHub Actions failed the workflow at parse time before any job started — visible as a workflow-level failure with no job rows. Collapse to contents:write, which is a superset of read.
- Vendor `uPlot.min.css` 1.6.27 alongside the existing IIFE JS bundle. Without the companion stylesheet, uPlot's axes, grid lines, and cursor are misaligned (uPlot ships the layout rules in CSS, not inline). Wire it via a `<link>` in benchmarks.md AND add it to mkdocs `extra_css` so it loads on every page (cheap, ~2KB) and the cursor/grid layout works regardless of which page renders the chart.
- Update .gitattributes vendoring rules to cover the .css file too.
- Update THIRD_PARTY_LICENSES upgrade procedure for the dual bundle.
- bench-charts.js: replace per-comparison regex+parseInt with a pre-parse-and-sort pass that computes (p, c, total) once per label. Non-matching labels keep a sentinel that puts them ahead of matched shapes, ordered lexicographically among themselves.
- test_bench_charts_contract: replace the manual `v == v and v not in (inf, -inf)` finite check with `math.isfinite(v)`.
…link

- Series key is now `<library> (<topology>)` instead of bare library name. A library that exposes multiple topologies (e.g. mupmuc/spsc AND mupmuc/mpmc) used to merge into one series and silently overwrite shape values.
- Cache `_shapeMap` on each library's first hover so the setCursor hook does an O(1) Map lookup per visible series instead of an O(N) linear `find` on every mousemove.
- Drop the redundant `<link rel="stylesheet">` for uplot's CSS in docs/benchmarks.md — it's already loaded globally via mkdocs.yml's `extra_css`. The duplicate caused two network requests and risked load-order conflicts.
- Document the contract-test version-string bump in the uPlot upgrade procedure, and clarify that the CSS comes via `extra_css` (not a `<link>`) so future upgraders touch the right config.

The duplicate-network-request risk and the silent-overwrite bug were the user-facing failures; the Map cache is a small ergonomic win on the same code path.
…JS parseSlug
- bench.yml: remove [skip ci] from the snapshot commit. The marker
also suppressed docs.yml, which has no paths-ignore and must run
on snapshot pushes to deploy fresh chart data to Pages. Loop
prevention is unaffected: bench.yml's existing paths-ignore on
docs/assets/bench-results/** already short-circuits the bench job
on the snapshot commit, and the bot-actor guard on bench-upload is
the defense-in-depth layer. Comment block updated to reflect the
two-layer scheme (was three layers).
- test_bench_charts_contract: SLUG_RE accepted only 3-segment slugs;
bench-charts.js parseSlug accepts 3+ via slice(1, -1).join('/').
Loosen the python regex to match so the contract test does not
reject nested-topology slugs the JS chart would render.
Track 6 (PR 6, final track in the bench-rollup feature stack) adds latency-regression gating in Bencher and the supporting BMF measures.

bench_latency.nim now emits the full p50/p95/p99/p999/max latency tuple per bounded variant slug. Each call to runLatencyHarness adds two new measures alongside the existing trio:

  em.addMeasure(slug, "latency_p999_ns", metrics.p999_ns)
  em.addMeasure(slug, "latency_max_ns", metrics.max_ns)

The merged BMF on each bench job thus carries a complete tail-latency profile per (library, topology) cell, available to the Bencher dashboard and any downstream comparison chart.

bench_common.nim raises HistogramTopK from 1000 to 5000. At production sample counts (BenchLatencyMessageCount=100_000 x BenchLatencyRuns=33, seenAll ~ 3.3M per aggregated histogram) the p999 tail rank is ~3300; K=1000 forced the p999 lookup into the rescaled-reservoir stratum, while K=5000 keeps it in the exact top-K stratum with headroom for stochastic boundary variation. Memory cost is 5000 x 8B = 40KB additional per histogram, negligible relative to the 99K-sample reservoir.

bench.yml's base-branch tracking step (push to devel/main) configures per-measure thresholds in a single bencher run invocation: latency_p99_ns with --threshold-upper-boundary 0.99 (regression = latency increase) and throughput_ops_ms with --threshold-lower-boundary 0.99 (regression = throughput drop). Both share --threshold-test t_test --threshold-max-sample-size 64, terminated by --thresholds-reset so only the explicitly-listed thresholds remain active. This also corrects a prior measure-name mismatch: the earlier --threshold-measure throughput never matched any emitted measure (the actual key is throughput_ops_ms), so the previous throughput threshold was a no-op.

Threshold activation requires >= 10 prior runs accumulated in Bencher to calibrate the t-test baseline (Task 6.4 stability soak gate). Bencher does not emit alerts on measures with insufficient sample history, so the configuration is dormant until the soak completes post-merge.

Tests:
- t_bench_common.nim: HistogramTopK >= 5000 assertion + p999 within 5% of sort fallback on 3.3M log-normal samples (27 tests, all passing).
- t_bench_latency.nim: every bounded variant slug carries latency_p999_ns + latency_max_ns with monotone tail ordering p99 <= p999 <= max (7 tests, all passing).
- tests/test.nim main suite unaffected (200 tests, all passing).

Refs design doc /Users/eek/.local/spellbook/docs/Users-eek-Development-lockfreequeues/plans/2026-05-01-bench-rollup-design.md section 3 PR 6 and impl plan /Users/eek/.local/spellbook/docs/Users-eek-Development-lockfreequeues/plans/2026-05-01-bench-rollup-impl.md Track 6 (Tasks 6.1-6.5).
…ntiles

Per Gemini review on PR #25. The Track 6 Task 6.2 commentary in HistogramTopK, bench_latency.nim, t_bench_common.nim, and CHANGELOG repeatedly claimed that runLatencyHarness aggregates samples across runs ("seenAll ≈ 3.3M per aggregated histogram"). It does not — design 2.5 explicitly says "mean across runs of each percentile", so each Histogram only ever sees BenchLatencyMessageCount samples (default 100K), not MessageCount × Runs.

At the default 100K MessageCount the previous K=1000 was already adequate — TopK + Reservoir already captured every sample exactly (1000 + 99,000 ≥ 100,000), so K=1000 was never spilling p999 into the rescaled stratum at default settings.

Reframed all four sites: the K=5000 sizing is **anticipatory** for operators who override BenchLatencyMessageCount upward (~5M-sample overrides), keeping p999 (tail rank = MessageCount × 0.001) inside the exact top-K stratum at that volume. The 3.3M-sample stress test in t_bench_common.nim is reframed as a stress shape that exercises the K=5000 design choice, not a depiction of production volume.

No code logic changed; comments + CHANGELOG only.
Code review on PR 6 noted that the new HistogramTopK assertion in tests/t_bench_common.nim and the latency CLI assertions in tests/t_bench_latency.nim were not imported by tests/test.nim and not run by any CI step, so the validation added by this PR was not actually enforced in CI.

Wire up via nimble's separation-of-concerns pattern (Option B from the review):
- lockfreequeues.nimble gains a 'benchtests' task that compiles and runs t_bench_common, t_bench_latency, and t_bench_adapters under one Nim invocation each. The bench harness tests live outside srcDir; importing them from tests/test.nim would force every entry of the 8-invocation MM/sanitizer test matrix to compile bench_common's threading machinery. A dedicated task keeps the regular nimble test matrix lean while still enforcing the bench-side assertions.
- A second 'benchteststress' task enables the gated 3.3M-sample HistogramTopK p999 stress shape (slow, ~10-15s release-mode), so default benchtests stays under a second.
- bench.yml gains a 'bench-tests' job (ubuntu-latest) that runs 'nimble benchtests' once per workflow trigger, with the same fork-PR / bot-actor guards as the bench matrix jobs.

The 3.3M-sample stress test in t_bench_common is now gated behind -d:BenchCommonStress so it only runs under benchteststress, not the default benchtests run.
The previous wording called `percentile(1.0)` "the top-K head", which is misleading for a min-heap implementation: `topKHeap[0]` is the SMALLEST element in the top-K (the admission threshold for new outliers), not the largest. The implementation snapshots the heap into a sequence, sorts it, and returns `topk[^1]` — the maximum. Comment-only change; behaviour unchanged.
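A quick demonstration of the distinction using std/heapqueue (the real Histogram manages its own heap; this is just the min-heap property in isolation):

```nim
import std/heapqueue

var topK = initHeapQueue[int]()    # min-heap: smallest element sits at [0]
for v in [5, 1, 9, 3]:
  topK.push(v)

assert topK[0] == 1    # the SMALLEST kept value: the admission threshold
# percentile(1.0) needs a sorted snapshot instead:
var snapshot: seq[int]
while topK.len > 0:
  snapshot.add(topK.pop())         # pop() drains in ascending order
assert snapshot == @[1, 3, 5, 9]
assert snapshot[^1] == 9           # the true maximum
```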
Replaces the Qodo PR-Agent reusable workflow with Momus (axiomantic/momus), an open-source from-scratch PR review bot.
…KEY -> LLM_API_KEY, pass MOMUS_APP_* secrets, set trigger_mention
PR 0 of the bench-rollup feature: introduces a shared bench harness module (bench_common.nim), rewires the CI bench workflow to use native BMF JSON emission via --bmf-out, adds 5 new lockfreequeues adapters, renames 3 existing adapters, introduces merge_bmf.py for stateless BMF union, and rewrites render_readme.nim for the new BMF shape. Two substantive issues found: the per-variant warmup defaults silently dropped from 3 to 0 for bounded variants (contradicts the 'pure refactor' claim and affects benchmark numbers), and 14 new tests across t_render_readme.nim and test_merge_bmf.py are not wired into any CI workflow.
Severity tally: 2 Mediums, 1 Low.
Medium (blocking)
- BOT-A1 (benchmarks/nim/bench_throughput.nim:521): Per-variant warmup defaults silently dropped from 3 to 0 for bounded variants
- BOT-A2 (lockfreequeues.nimble:65): 14 new tests not wired into CI; merge_bmf and render_readme tests provide no gating

Low
- BOT-A3 (tests/t_bench_common.nim:256): SmokeAdapter leaks its Channel in throughput and latency smoke tests
Noteworthy
- The BMFEmitter + Histogram implementation is careful about edge cases (NaN/Inf guard in addMeasure, zero-duration guard in elapsed time calculations, deterministic reservoir seed for reproducibility).
- The merge_bmf.py collision detection and schema validation (slug regex, measure regex, NaN rejection, bool rejection) is thorough and well-tested.
- The CI bench.yml now pins sibling deps to tagged releases (v0.6.0, v0.7.0) instead of `main`, improving CI determinism.
Verdict: REQUEST_CHANGES.
Commands
- Comment `/ai-review` or mention @axiomantic-momus[bot] to request a re-review of the latest changes.
- Reply to a finding with `won't fix`, `by design`, or `not a bug` to decline it.
- Reply with `instead, ...` to propose an alternative fix.
Powered by Momus running deepseek/deepseek-v4-pro via openrouter.ai.
```nim
BenchSipsicWarmup {.intdefine.} = 0
BenchMupmucRuns {.intdefine.} = 10
BenchMupmucWarmup {.intdefine.} = 0
BenchChannelsRuns {.intdefine.} = 10
BenchChannelsWarmup {.intdefine.} = 0
```
BOT-A1 — Medium (bug)
Per-variant warmup defaults silently dropped from 3 to 0 for bounded variants
The new per-variant compile-time constants BenchSipsicWarmup, BenchMupmucWarmup, and BenchChannelsWarmup all default to 0 (lines 521-525). The old code called benchmarkThroughput/benchmarkMupmucNP*C without an explicit warmup argument, picking up the default WarmupRuns=3 (line 44). In CI the old behavior was warmup=2 (via -d:WarmupRuns=2). The PR description states 'This is a pure refactor PR. No new variants are benched, no regression vs the prior CI numbers is expected' and claims the binary is 'bit-for-bit unchanged from the prior release' when --bmf-out is absent. The warmup change contradicts both claims: the first timed runs will absorb cold-cache/JIT effects, producing systematically lower throughput numbers across all bounded variants (sipsic, mupmuc, channels) on every local and CI invocation. The unbounded_mupsic variants are unaffected — they still use WarmupRuns=3 (line 465).
Suggested fix: either set the per-variant warmup defaults to match the old behavior (BenchSipsicWarmup=3, BenchMupmucWarmup=3, BenchChannelsWarmup=3), or if the change is intentional, update the PR description and CHANGELOG to document it as a Changed behavior. Also ensure the CI bench.yml -d:WarmupRuns flag actually reaches these variants (currently it doesn't, since the isMainModule calls pass the per-variant constants explicitly).
```nim
task benchtests, "Runs the bench harness test suite":
  # The bench harness lives outside `srcDir`, so its dedicated tests
  # (`tests/t_bench_*.nim`) are NOT imported by `tests/test.nim` to
  # keep the regular `nimble test` matrix free of the bench harness's
```
BOT-A2 — Medium (tests)
14 new tests not wired into CI; merge_bmf and render_readme tests provide no gating
The PR adds 4 tests in tests/t_render_readme.nim and 10 tests in benchmarks/tests/test_merge_bmf.py, but neither file is referenced in any CI workflow or nimble task. The benchtests nimble task (line 65) only runs tests/t_bench_common.nim. The bench.yml bench-tests job only invokes nimble benchtests. The Python tests require python3 -m unittest benchmarks.tests.test_merge_bmf -v and the Nim tests require a separate compile+run of t_render_readme.nim — neither is automated. These tests cover schema validation, collision detection, slug parsing, BMF shape parsing, and edge cases (NaN rejection, missing files). Without CI gating, regressions in merge_bmf.py or render_readme.nim won't be caught pre-merge.
Suggested fix: add steps to the bench.yml `bench-tests` job to run `python3 -m unittest benchmarks.tests.test_merge_bmf -v` and to compile and run `tests/t_render_readme.nim`. Alternatively, extend the `benchtests` nimble task to include these invocations.
```nim
      queueInit = proc(cap: int): SmokeAdapter = initSmokeAdapter(cap),
      capacity = 1024,
      numProducers = 1,
      numConsumers = 1,
      messageCount = 1000,
      runCount = 1,
      warmupCount = 0,
    )
    check metrics.runs == 1
    check metrics.ops_ms_mean > 0.0

# ---------- Task 0.6: runLatencyHarness smoke ----------

suite "bench_common runLatencyHarness":
  test "smoke: 1P/1C, 1000 messages, 1 run, 0 warmup; p50<p99<max":
    let metrics = runLatencyHarness[SmokeAdapter](
      queueInit = proc(): SmokeAdapter = initSmokeAdapter(1024),
```
BOT-A3 — Low (quality)
SmokeAdapter leaks its Channel in throughput and latency smoke tests
The SmokeAdapter type (lines 256-272) allocates a Channel[uint64] via create in initSmokeAdapter but defines no cleanup proc. The bench_common harness (both runOneThroughputRun and runOneLatencyRun) uses when compiles(cleanup(queue)) to conditionally call cleanup, which never fires for SmokeAdapter. Every invocation of the runThroughputHarness and runLatencyHarness smoke tests leaks a Channel. While tests are short-lived and the OS reclaims the memory at process exit, this is a leak that could mask real issues if these smoke adapters are copied into production adapter code.
Suggested fix: add a `cleanup` proc to SmokeAdapter that closes and deallocates the channel, matching the pattern in channels_adapter.nim.
/ai-review
This omnibus PR consolidates 7 stacked PRs for the bench-rollup feature stack (81 commits, 70 files, +11,296/-1,493). It adds a shared bench harness with native BMF emission, topology-split binaries, comparison adapters for 7 external queue libraries, an interactive uPlot chart, CI matrix with per-step timeouts, cache-line-aligned segment allocation, and Bencher threshold gating. The code is thorough, well-tested, and the CI pipeline is carefully designed with soft-skip patterns and loop-prevention guards. Only one minor nit found.
Severity tally: 1 Nit.
Nit
- BOT-B1 (tests/t_aligned_alloc.nim:10): Test file uses raw c_free instead of freeAligned from the module under test.
Noteworthy
- Cache-line-aligned segment allocation (`allocAligned`/`freeAligned` with POSIX `posix_memalign` and Windows `_aligned_malloc` paths) is a genuine fix for false sharing in unbounded queues, verified by 8 dedicated assertions in `t_unbounded_padding.nim`.
- The soft-skip CI pattern (install → smoke → set-flag-on-success → annotate-on-failure) gracefully handles unavailable external libraries without failing the workflow, keeping partial bencher coverage for surviving binaries.
- Deletion-safety guard (`superset_check.py`) run in CI between merge and upload ensures the topology split never silently drops slugs — a robust regression-prevention pattern.
Verdict: APPROVE.
Commands
- Comment `/ai-review` or mention @axiomantic-momus[bot] to request a re-review of the latest changes.
- Reply to a finding with `won't fix`, `by design`, or `not a bug` to decline it.
- Reply with `instead, ...` to propose an alternative fix.
Powered by Momus running deepseek/deepseek-v4-pro via openrouter.ai.
```nim
from system/ansi_c import c_free

type
  Tiny = object
    a: int

  Line = object
    a: int
    b: int
    c: int
    d: int
    e: int
    f: int
    g: int
    h: int

  Big = object
    a: array[256, byte]

suite "internal/aligned_alloc.allocAligned":
  test "tiny payload (sizeof < CacheLineBytes) is 64B-aligned":
    let p = allocAligned[Tiny]()
    check p != nil
    check (cast[uint](p) mod CacheLineBytes.uint) == 0
    check p.a == 0  # zero-initialized
    c_free(p)

  test "line-sized payload (sizeof == CacheLineBytes) is 64B-aligned":
    let p = allocAligned[Line]()
    check p != nil
    check (cast[uint](p) mod CacheLineBytes.uint) == 0
    check p.a == 0
    check p.h == 0
    c_free(p)

  test "big payload (sizeof > CacheLineBytes) is 64B-aligned":
    let p = allocAligned[Big]()
    check p != nil
    check (cast[uint](p) mod CacheLineBytes.uint) == 0
    for i in 0 ..< 256:
      check p.a[i] == 0.byte
    c_free(p)

  test "many allocations are all 64B-aligned":
    var ptrs: array[64, ptr Line]
    for i in 0 ..< 64:
      ptrs[i] = allocAligned[Line]()
      check ptrs[i] != nil
      check (cast[uint](ptrs[i]) mod CacheLineBytes.uint) == 0
    for i in 0 ..< 64:
      c_free(ptrs[i])
```
BOT-B1 — Nit (tests)
Test file uses raw c_free instead of freeAligned from the module under test.
tests/t_aligned_alloc.nim imports lockfreequeues/internal/aligned_alloc (which exports freeAligned), but deallocates test allocations via system/ansi_c.c_free instead of using the module's own freeAligned. On POSIX (the only CI platform) freeAligned delegates to c_free, so tests pass, but the freeAligned code path added in this PR is left untested. On Windows the test would be incorrect (_aligned_free is required).
Suggested fix: replace all `c_free(p)` / `c_free(ptrs[i])` calls with `freeAligned(p)` / `freeAligned(ptrs[i])` and drop the `from system/ansi_c import c_free` import.
Drops c_free in favor of the module's freeAligned helper so the test exercises the public API symmetrically. Same underlying allocator (c_free under POSIX), but freeAligned is the documented inverse of allocAligned and the right call when _aligned_free is needed on Windows.
bench-rollup feature stack ships as 4.2.0. Version bumped from 4.1.0 (already-released; tagged v4.1.0 on 2026-05-01) and the previously [Unreleased] CHANGELOG section moved under the [4.2.0] header dated 2026-05-05. release.yml triggers on push to devel: it parses this version, creates v4.2.0 tag if absent, and publishes the GitHub Release with auto-generated notes from commit history.
This reverts commit eb56a15.
Conflict resolutions:
- src/lockfreequeues/unbounded_{mupmuc,mupsic,sipmuc}.nim: kept HEAD's
allocAligned[DebraManager[MaxThreads]]() (raises OutOfMemDefect
internally per aligned_alloc.nim) and dropped devel's c_calloc + nil
check. The cache-line padding work (PR 19) supersedes c_calloc here.
- mkdocs.yml: took devel's 'Concepts: Safety Model / Slot Ownership
Typestates' nav block. devel deleted docs/api/ops.md, so HEAD's
'Support: Operations' entry would break the docs build.
- CHANGELOG.md: combined both Unreleased sections. Promoted devel's
CASAttempt typestate restructure to a top-level BREAKING entry;
appended devel's Changed/Added/Fixed under '(typestates 0.7 uplift)'
subheaders so the bench-rollup and typestates-uplift sections are
visually separable in the next release notes.
Verified: 210/210 tests pass under nim c --threads:on tests/test.nim
with typestates 0.7.2 + this merge.
The merge from devel surfaced that this branch's build.yml + bench.yml still pinned nim-typestates@v0.7.0 / nim-debra@v0.6.0. lockfreequeues now requires typestates >= 0.7.2 (CASAttempt union, match-macro generic-context fixes) and debra >= 0.7.0 (no published tag yet); v0.7.0 fails compilation with 'match arm head must be a single state identifier' on the new opaque-state typestate match invocations. Match devel's approach: track --branch main until nimble.directory picks up the new tags. Updates 3 clone blocks (build.yml + 2 in bench.yml) plus the now-stale 'pin to release tags' rationale comments. Verified locally with main of nim-typestates (= v0.7.2 sha db7159f) and nim-debra: 210/210 tests pass.
This omnibus PR consolidates 7 stacked PRs (Tracks 0–6) into a single bench-rollup: shared harness with native BMF emission, topology-split binaries, comparison adapters (Crossbeam, Boost, Loony, MoodyCamel, threading.Channels, system.Channel), interactive uPlot charts, latency thresholds, and snapshot publishing. One confirmed bug found: the bench-comparison.yml workflow carries a stale threshold measure name that was already fixed in bench.yml's Track 6, making the Crossbeam Bencher threshold silently inoperative.
Severity tally: 1 Low.
Low
- BOT-A1 (.github/workflows/bench-comparison.yml:196): Bencher threshold measure name mismatch in bench-comparison.yml: `throughput` vs `throughput_ops_ms`
Noteworthy
- Well-structured soft-skip adapter pattern: install → smoke → set-flag-on-success → annotate-on-failure, with workflow_dispatch force-skip overrides. Graceful degradation without workflow failures.
- Three-layer loop-prevention for snapshot publishing (paths-ignore, bot-actor guard, explicit ref checkout) is carefully reasoned and avoids the common [skip ci] / docs-deploy tension.
- The Histogram top-K + Algorithm R reservoir stratified-percentile implementation in bench_common.nim is mathematically sound with proper rescaling for body-stratum lookup.
Verdict: APPROVE.
Commands
- Comment `/ai-review` or mention @axiomantic-momus[bot] to request a re-review of the latest changes.
- Reply to a finding with `won't fix`, `by design`, or `not a bug` to decline it.
- Reply with `instead, ...` to propose an alternative fix.
Powered by Momus running deepseek/deepseek-v4-pro via openrouter.ai.
```sh
--token '${{ secrets.BENCHER_API_TOKEN }}' \
--branch devel \
--testbed ubuntu-latest \
--threshold-measure throughput \
```
BOT-A1 — Low (bug)
Bencher threshold measure name mismatch in bench-comparison.yml: throughput vs throughput_ops_ms
The "Track scheduled benchmarks with Bencher" step in bench-comparison.yml uses --threshold-measure throughput but the actual measure key emitted by every bench binary (bench_spsc, bench_mpsc, bench_mpmc, bench_unbounded) is throughput_ops_ms. This is the same bug that bench.yml's Track 6 explicitly corrected (the Track 6 comment reads: "the prior config used --threshold-measure throughput; the actual measure key ... is throughput_ops_ms"). The bench-comparison.yml was not updated to match, so the Crossbeam threshold will never match any measure — rendering Bencher regression detection on the nightly crossbeam workflow a silent no-op. Note: this workflow runs on a nightly cron schedule and is not on the PR critical path, so this is flagged as low severity (follow-up fix) rather than blocking.
Suggested fix: replace `--threshold-measure throughput` with `--threshold-measure throughput_ops_ms` on line 196 of bench-comparison.yml, matching the fix already applied in bench.yml line 773.
/ai-review
This omnibus PR rolls up 7 tracks of work (81 commits, 70 files) adding a shared bench harness, comparison adapters (Boost, Loony, Crossbeam, MoodyCamel, threading.Chan, system.Channel), topology-split binaries, interactive uPlot charts, and latency p99/p999/p9999 thresholds. Two substantive issues found: a crossbeam threshold measure name mismatch that makes Bencher gating a no-op in the comparison workflow (carried over from a bug the main bench.yml already fixed), and an incorrect rescaling formula in a docstring comment.
Severity tally: 1 Medium, 1 Low.
Medium (blocking)
- BOT-C1 (.github/workflows/bench-comparison.yml:196): Bencher threshold measure name mismatch in bench-comparison.yml: `throughput` vs `throughput_ops_ms`
Low
- BOT-C2 (benchmarks/nim/bench_common.nim:341): Histogram percentile docstring formula does not match the code
Noteworthy
- The bench_common.nim shared harness is well-structured, with proper cleanup (`when compiles(cleanup(...))`) and thorough docstrings.
- The Rust cdylib (bench-ffi-crossbeam) has proper null-pointer guards, panic safety, and comprehensive integration tests.
- The Loony adapter's `(v+1)<<3` encoding constraint is well documented, with range limits and debug asserts (see the sketch after this list).
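To make the constraint concrete, here is a hedged sketch of that encoding in Nim; `encode`, `decode`, and `TagBits` are illustrative names, not the adapter's actual API, and the sketch assumes LoonyQueue's documented use of the low 3 pointer bits for tags:

```nim
const TagBits = 3  # assumption: LoonyQueue tags the low 3 bits of 8-byte-aligned pointers

proc encode(v: uint): pointer =
  ## (v + 1) shl 3: the +1 keeps v = 0 distinguishable from a nil pointer,
  ## and the shift leaves the tag bits clear. Range limit: v must survive
  ## the 3-bit shift, hence the debug assert.
  doAssert v < (high(uint) shr TagBits)
  cast[pointer]((v + 1) shl TagBits)

proc decode(p: pointer): uint =
  ## Inverse: drop the tag bits, undo the +1 offset.
  (cast[uint](p) shr TagBits) - 1

doAssert decode(encode(0'u)) == 0'u
doAssert decode(encode(41'u)) == 41'u
```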
Verdict: REQUEST_CHANGES.
```
--token '${{ secrets.BENCHER_API_TOKEN }}' \
--branch devel \
--testbed ubuntu-latest \
--threshold-measure throughput \
```
BOT-C1 — Medium (bug)
Bencher threshold measure name mismatch in bench-comparison.yml: throughput vs throughput_ops_ms
The "Track scheduled benchmarks with Bencher" step in bench-comparison.yml uses --threshold-measure throughput but the actual measure key emitted by every bench binary (bench_spsc, bench_mpsc, bench_mpmc, bench_unbounded) is throughput_ops_ms. This is the same bug that bench.yml's Track 6 explicitly corrected (the Track 6 comment reads: "the earlier --threshold-measure throughput never matched any emitted measure (the actual key is throughput_ops_ms), so the previous throughput threshold was a no-op"). The bench-comparison.yml was missed when bench.yml was fixed. The threshold-gating for crossbeam comparison will never fire, making the scheduled regression detection silently non-functional.
Suggested change: change `--threshold-measure throughput` to `--threshold-measure throughput_ops_ms` at line 196 of bench-comparison.yml.
```nim
let pBodyClamped = max(0.0, min(1.0, pBody))
return percentile(h.reservoir, pBodyClamped)
```
BOT-C2 — Low (quality)
Histogram percentile docstring formula does not match the code
The Histogram.percentile proc's docstring says the body-stratum rescaling is p_body = (p*seenAll - K) / (seenAll - K) (subtracting K from the numerator). The code does let pBody = (pc * float(h.seenAll)) / float(bodySize) (no K subtraction). The code is correct — the body stratum holds the seenAll - K smallest samples, so the target rank within the body is p * seenAll, not p * seenAll - K. The docstring formula is wrong. The same docstring appears in the Histogram.percentiles proc at line ~371.
Suggested change: update the docstring to match the code: `p_body = (p * seenAll) / (seenAll - K)`.
BOT-C1 (Medium): bench-comparison.yml's scheduled-run threshold step used `--threshold-measure throughput` but the BMF emitter writes the measure key as `throughput_ops_ms`. The threshold therefore matched no measure and was a silent no-op; gating on Crossbeam vs lockfreequeues regressions never fired. Same bug class as PR 6's main bench.yml fix; aligning the two workflows on the canonical key. BOT-C2 (Low): Histogram percentile docstring described the body-stratum rescale as `(p*seenAll - K) / (seenAll - K)`. The actual code computes `(p*seenAll) / (seenAll - K)` (without the K subtraction in the numerator), and the worked examples below the formula already use the correct ratio (e.g. 50_000/99_000 for p=0.50). Bringing the formula and the worked examples into agreement with the implementation; tidying a stale '99_000 - 0' artefact from an earlier draft.
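A hedged check of the corrected rescale, using the commit's worked example (K = HistogramTopK = 1000 at seenAll = 100_000, p = 0.50); the constants below are illustrative, not pulled from the module:

```nim
let
  seenAll = 100_000   # total samples recorded
  k = 1_000           # HistogramTopK: count held in the exact top-K stratum
  p = 0.50            # requested global percentile

# The body stratum holds the seenAll - K smallest samples, so the global
# rank p * seenAll rescales with no K subtraction in the numerator,
# matching the code (and, after this commit, the docstring):
let pBody = (p * float(seenAll)) / float(seenAll - k)  # 50_000 / 99_000 ~= 0.50505
let pBodyClamped = max(0.0, min(1.0, pBody))
doAssert abs(pBodyClamped - 50_000.0 / 99_000.0) < 1e-12
```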
This omnibus PR consolidates 7 previously-reviewed PRs (Track 0–6) adding a shared bench harness, 9 comparison adapters, topology-split bench binaries, interactive uPlot charts, and a snapshot publishing pipeline. Overall the code is well-structured with strong defensive patterns. Found 2 non-blocking quality issues: bench-comparison.yml's nightly threshold reset may silently remove bench.yml's latency threshold configuration, and the bench-tests job clones sibling deps from main rather than the release tags advertised in the PR description.
Severity tally: 2 Low.
Low
- BOT-A2 (.github/workflows/bench-comparison.yml:189): Nightly bench-comparison bencher run may remove bench.yml's latency threshold.
- BOT-A3 (.github/workflows/bench.yml:100): bench-tests job clones sibling Nim deps from main, contradicting the PR description.
Noteworthy
- Consistent soft-skip pattern across all comparison adapters with install/smoke/set-flag stages and force_skip workflow_dispatch inputs — well-designed for CI resilience.
- Cache-line alignment fix for unbounded queue segments via posix_memalign shim is clean and well-documented with platform-conditional fallback for Windows.
Verdict: APPROVE.
```yaml
if: github.event_name == 'schedule' && env.BENCHER_API_TOKEN != ''
run: |
  bencher run \
    --project lockfreequeues \
    --token '${{ secrets.BENCHER_API_TOKEN }}' \
    --branch devel \
    --testbed ubuntu-latest \
    --threshold-measure throughput_ops_ms \
    --threshold-test t_test \
    --threshold-max-sample-size 64 \
    --threshold-lower-boundary 0.99 \
    --thresholds-reset \
    --adapter json \
    --file merged.json \
    --github-actions '${{ secrets.GITHUB_TOKEN }}' \
    --err

- name: Track devel-push benchmarks with Bencher
```
BOT-A2 — Low (quality)
Nightly bench-comparison bencher run may remove bench.yml's latency threshold.
bench-comparison.yml's scheduled (nightly cron) bencher run uploads to the same Bencher project (lockfreequeues) and branch (devel) as bench.yml's push-to-devel run, but it configures only `--threshold-measure throughput_ops_ms --thresholds-reset`. bench.yml configures BOTH throughput_ops_ms AND latency_p99_ns thresholds. Since bencher stores thresholds per project+branch and `--thresholds-reset` clears all prior thresholds, the nightly crossbeam run will remove the latency threshold that bench.yml configured on the previous push. The two workflows would ping-pong the threshold configuration: each night, bench-comparison resets to throughput-only; each devel push, bench.yml restores both throughput and latency. During the window between a nightly and the next devel push, latency regression alerting is silently absent.
Suggested change: either (a) add `--threshold-measure latency_p99_ns --threshold-test t_test --threshold-max-sample-size 64 --threshold-upper-boundary 0.99` to the bench-comparison nightly invocation so both thresholds survive, or (b) remove `--thresholds-reset` from bench-comparison and add `--threshold-measure throughput_ops_ms --threshold-test t_test --threshold-max-sample-size 64 --threshold-lower-boundary 0.99` so both workflows append rather than reset.
```yaml
uses: jiro4989/setup-nim-action@v2
with:
  nim-version: 'stable'

- name: Install build deps (Linux)
  run: |
    sudo apt-get update -q -y
    sudo apt-get -qq install -y clang

- name: Clone and install sibling Nim deps (nim-debra, nim-typestates)
  run: |
    set -e
```
BOT-A3 — Low (quality)
bench-tests job clones sibling Nim deps from main, contradicting PR description.
The PR description states the new bench-tests CI job is "pinned to v0.6.0 / v0.7.0 sibling deps", but the actual bench.yml code clones nim-debra and nim-typestates from --branch main (not tags). Meanwhile bench-comparison.yml correctly uses --branch v0.6.0 and --branch v0.7.0. Using main means any upstream commit to those repos can break CI non-deterministically. The code comment in the bench-tests job header itself says "Sibling Nim deps are pinned to release tags (not main)" which further contradicts the code's actual --branch main. The benchmark matrix job's parallel clone step has a comment that acknowledges the tradeoff ("Tracks main until nimble.directory picks up the relevant tags"), but the bench-tests job's own comment is misleading and the overall discrepancy between PR description, code comments, and code behavior is confusing for maintainers debugging CI breakage.
Suggested change: either update the PR description and the bench-tests job's code comment to accurately reflect that the bench-tests job tracks `main`, or align the code with the description by pinning to `v0.6.0` / `v0.7.0` (as bench-comparison.yml already does).
Consolidated omnibus PR for the bench-rollup feature stack. Originally landed
as 7 stacked PRs (Track 0 through Track 6); fast-forwarded into this branch
and the descendants closed.
81 commits / 70 files changed / +11,296 / -1,493.
What ships
Track 0 — Shared bench harness + native BMF emission
- benchmarks/nim/bench_common.nim: Topology enum, BMFEmitter (alpha-sorted Bencher Metric Format JSON), Histogram (min-heap top-K + Algorithm R reservoir), runThroughputHarness / runLatencyHarness, Stats helpers (emitter usage sketched below).
- benchmarks/merge_bmf.py: stateless union of BMF JSON fragments; exits 1 on (slug, measure) collisions.
- bench_throughput.nim gains --bmf-out=<path>; bit-for-bit unchanged without the flag.
- 5 new lockfreequeues adapters (among them unbounded_sipmuc and unbounded_mupmuc); 3 existing renamed to canonical <library_slug>_adapter.nim.
- Legacy bmf_adapter.py, bench_main.nim, render_readme.nim removed.
- bench.yml rewired to use native BMF; merge_bmf.py produces the single merged.json consumed by bencher run.
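A hedged usage sketch of the emitter surface; the addMeasure parameter order and emit returning a string are assumptions, and the slug and values are invented:

```nim
import bench_common  # benchmarks/nim/bench_common.nim
import std/math      # NaN, the documented "no bound" sentinel

var e = initBMFEmitter()
# assumed shape: (emitter, slug, measure, value, lowerBound, upperBound)
e.addMeasure("lockfreequeues_spsc", "throughput_ops_ms", 52417.3, NaN, NaN)
e.addMeasure("lockfreequeues_spsc", "latency_p99_ns", 812.0, 790.0, 847.0)
# emit() is assumed to return the pretty-printed JSON (alpha-sorted slugs
# and measures, trailing newline) ready to be written as a BMF fragment:
writeFile("fragment.json", e.emit())
```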
Track 1 — Latency wired to BMF, multi-measure-per-slug end-to-end
- bench_latency.nim rewritten on runLatencyHarness; CLI mirrors bench_throughput. Per-binary intdefines: BenchLatencyRuns (33), BenchLatencyMessageCount (100_000), BenchLatencyWarmupRuns (3); the pattern is sketched below.
- bench.yml split into three jobs: bench-throughput, bench-latency (new), bench-upload (merges fragments, single bencher run).
- Emits latency_p50_ns / latency_p95_ns / latency_p99_ns per slug.
- adapter.nim re-exports PushResult / PopResult from bench_common to unify the two parallel definitions.
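The per-binary intdefine pattern referenced above, as a minimal sketch; the constant names and defaults match the description, while the compile command is only an example:

```nim
const
  BenchLatencyRuns {.intdefine.} = 33
  BenchLatencyMessageCount {.intdefine.} = 100_000
  BenchLatencyWarmupRuns {.intdefine.} = 3

# Overridable per binary at compile time, e.g. for a fast local smoke run:
#   nim c -r -d:BenchLatencyRuns=3 -d:BenchLatencyMessageCount=10_000 bench_latency.nim
echo BenchLatencyRuns, " runs x ", BenchLatencyMessageCount,
     " msgs (+", BenchLatencyWarmupRuns, " warmup)"
```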
Track 2 — Topology split (5 binaries) + per-step CI timeouts
- bench_throughput.nim split into bench_spsc.nim, bench_mpsc.nim, bench_mpmc.nim, bench_unbounded.nim (plus the latency binary from Track 1), each built and run independently.
- tests/fixtures/pre-split-slugs.json + superset_check.py ensure the post-split BMF is a superset of the pre-split slug set (invariant sketched below).
- Per-step timeout-minutes so a hang in one variant cannot burn the whole budget.
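The superset invariant, as a hedged Nim sketch (the shipped check is superset_check.py, a Python script; the fixture is assumed here to be a flat JSON array of slug strings):

```nim
import std/[json, sets]

var pre = initHashSet[string]()
for s in parseFile("tests/fixtures/pre-split-slugs.json"):
  pre.incl s.getStr   # assumption: fixture is a JSON array of slugs

var post = initHashSet[string]()
for slug, _ in parseFile("merged.json").pairs:
  post.incl slug      # BMF keeps benchmark slugs at the top level

# deletion safety: the topology split may add slugs, never drop them
doAssert pre <= post, "post-split BMF lost pre-split slugs"
```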
Track 3 — Comparison MVP: Loony, Boost.LockFree, Crossbeam adapters
- Adapters: LoonyQueue (mpmc_unbounded), boost::lockfree::queue (mpmc bounded), boost::lockfree::spsc_queue (spsc bounded), crossbeam::ArrayQueue (mpmc bounded), crossbeam::SegQueue (mpmc_unbounded).
- benchmarks/rust/bench-ffi-crossbeam: cdylib (stable toolchain, panic = "abort" for cross-FFI safety).
- Soft-skip adapter pattern: install -> smoke -> set-flag-on-success -> annotate-on-failure; workflow_dispatch accepts force_skip_* inputs.
- bench-comparison.yml runs Crossbeam on a nightly cron (separate Bencher Report; not on PR critical path).
- Unbounded queue segments cache-line-aligned via posix_memalign through new internal/aligned_alloc.nim; verified by tests/t_unbounded_padding.nim (shim sketched below).
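A hedged sketch of such a shim; the proc names are illustrative, and only the POSIX path is shown, whereas the real internal/aligned_alloc.nim is described elsewhere in this thread as carrying a platform-conditional Windows fallback:

```nim
proc c_posix_memalign(memptr: ptr pointer; alignment, size: csize_t): cint
  {.importc: "posix_memalign", header: "<stdlib.h>".}
proc c_free(p: pointer) {.importc: "free", header: "<stdlib.h>".}

proc allocAligned(size: Natural; alignment: Natural = 64): pointer =
  ## 64-byte default: a typical cache-line size, so queue segments never
  ## share a cache line with unrelated allocations (false-sharing guard).
  doAssert c_posix_memalign(addr result, csize_t(alignment), csize_t(size)) == 0

let seg = allocAligned(4096)
doAssert (cast[uint](seg) and 63) == 0  # low 6 bits clear: cache-line aligned
c_free(seg)
```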
Track 4 — Comparison expansion: Threading.Channels, Nim Channel, MoodyCamel
- MoodyCamel ConcurrentQueue vendored as a single header at benchmarks/vendor/concurrentqueue/concurrentqueue.h (commit d655418bb644b7f85159d94c591d7d983949fb81); an extern "C" wrapper isolates Nim from upstream template machinery.
- THIRD_PARTY_LICENSES.md gains its first vendored entry.
Track 5 — Interactive uPlot charts + mkdocs deploy from devel
- docs/assets/uplot-1.6.27.iife.min.js vendored (47884 bytes, SHA-256 verified against jsdelivr).
- docs/assets/bench-charts.{js,css} renders an interactive throughput chart with library-toggle legend, log-scale Y toggle, hover tooltip, graceful fallbacks. Contract test test_bench_charts_contract.py guards the BMF -> chart shape.
- docs/benchmarks.md embeds the chart via mkdocs-material's md_in_html; page is registered in nav.
- merged.json is copied to docs/assets/bench-results/<sha>.json and latest.json on every devel push; commits as github-actions[bot]. Three-layer loop-prevention: paths-ignore, actor guard, narrow [skip ci].
- docs.yml extended to redeploy mike's dev/ alias on devel pushes. Post-deploy curl verifies the published BMF asset returns HTTP 200 and parses as JSON.
Track 6 — Latency thresholds + p999 / max measures
- bench_latency.nim emits latency_p999_ns and latency_max_ns alongside the p50/p95/p99 trio: a full tail-latency profile per (library, topology) cell.
- HistogramTopK 1000 -> 5000: at production sample counts (BenchLatencyMessageCount = 100_000 x BenchLatencyRuns = 33, seenAll ~ 3.3M), the p999 rank is ~3300, so K = 5000 keeps p999 in the exact top-K stratum (arithmetic checked below). Memory cost: 40 KB per histogram.
- Bencher latency_p99_ns threshold configured alongside the existing throughput threshold.
- bench-tests CI job runs the bench harness test suite (benchtests nimble task) on every PR; pinned to v0.6.0 / v0.7.0 sibling deps; no fork-PR guard (no secrets needed for unit tests).
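The K sizing claim, checked as a hedged back-of-envelope (constants mirror the stated intdefines; 8 bytes per float64 sample):

```nim
let seenAll = 100_000 * 33            # BenchLatencyMessageCount x BenchLatencyRuns = 3_300_000
let k = 5000                          # new HistogramTopK
let p999Rank = int(0.001 * float(seenAll))  # samples at or above p999: 3300
doAssert p999Rank <= k                # p999 (and p9999, max) resolve in the exact top-K stratum
doAssert k * 8 == 40_000              # 5000 float64 samples = 40 KB per histogram
```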
Tests added
- t_bench_common.nim: 26 tests — BMFEmitter ordering, Histogram percentile accuracy on log-normal input, harness smoke, all 5 new lockfreequeues adapters round-trip.
- t_bench_latency.nim: 7 tests — intdefine compile-time asserts, --bmf-out integration, multi-fragment merge + collision guard.
- t_bench_adapters.nim: round-trip set-equality tests for every comparison adapter, gated by the same -d:adapter_*_available defines.
- t_topology_split.nim: per-binary slug coverage + superset check.
- t_unbounded_padding.nim: 8 cache-line-alignment assertions across the 4 unbounded variants (c/cpp/arc/refc).
- t_aligned_alloc.nim: aligned_alloc helper unit tests.
- benchmarks/tests/test_merge_bmf.py: 10 tests covering slug regex, measure regex, collision detection, alpha-sorted output, NaN rejection (collision rule sketched after this list).
- benchmarks/tests/test_superset_check.py: deletion-safety regression tests for the topology split.
- benchmarks/tests/test_bench_charts_contract.py: 9 tests guarding the BMF -> JS chart contract.
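And the collision rule those merge tests pin down, as a hedged Nim sketch (the shipped tool is merge_bmf.py; the proc name and file names here are illustrative):

```nim
import std/json

proc mergeBmf(paths: openArray[string]): JsonNode =
  ## Stateless union of BMF fragments; a duplicate (slug, measure)
  ## pair is a hard error, mirroring merge_bmf.py's exit-1 behavior.
  result = newJObject()
  for path in paths:
    for slug, measures in parseFile(path).pairs:
      if slug notin result:
        result[slug] = newJObject()
      for measure, v in measures.pairs:
        if measure in result[slug]:
          quit("collision: " & slug & "/" & measure, 1)
        result[slug][measure] = v

when isMainModule:
  echo mergeBmf(["throughput.json", "latency.json"]).pretty
```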
Verification
All Track-level review feedback (13 lows from constituent PRs)
addressed in-tree before consolidation. Original constituent PRs were
each individually approved by Momus. Local nim test suite: 200/200
pass.
Cost gate context
Bencher upload cost gate is in place: the PR-base allowlist is limited to main/devel (no feat/**), and PR uploads require the bench label. This omnibus PR will trigger bench-throughput / bench-latency runs on push (non-billable to Bencher absent the label), but the bencher run upload step skips without it. Add the bench label deliberately to record this revision.