
bench-rollup: shared harness + comparison adapters + interactive charts#19

Merged
elijahr merged 110 commits into devel from feat/bench-rollup-pr0-bench-common on May 5, 2026
Conversation

@elijahr (Owner) commented May 1, 2026

Consolidated omnibus PR for the bench-rollup feature stack. Originally landed
as 7 stacked PRs (Track 0 through Track 6); the stack was fast-forwarded into
this branch and the descendant PRs closed.

81 commits / 70 files changed / +11,296 / -1,493.

What ships

Track 0 — Shared bench harness + native BMF emission

  • benchmarks/nim/bench_common.nim: Topology enum, BMFEmitter
    (alpha-sorted Bencher Metric Format JSON), Histogram (min-heap top-K +
    Algorithm R reservoir), runThroughputHarness / runLatencyHarness,
    Stats helpers.
  • benchmarks/merge_bmf.py: stateless union of BMF JSON fragments;
    exits 1 on (slug, measure) collisions.
  • bench_throughput.nim gains --bmf-out=<path>; bit-for-bit unchanged
    without the flag.
  • 5 new lockfreequeues adapters (sipmuc, mupsic, unbounded_sipsic,
    unbounded_sipmuc, unbounded_mupmuc); 3 existing renamed to canonical
    <library_slug>_adapter.nim.
  • Deleted: bmf_adapter.py, bench_main.nim, render_readme.nim.
  • bench.yml rewired to use native BMF; merge_bmf.py produces the
    single merged.json consumed by bencher run.

Track 1 — Latency wired to BMF, multi-measure-per-slug end-to-end

  • bench_latency.nim rewritten on runLatencyHarness; CLI mirrors
    bench_throughput. Per-binary intdefines:
    BenchLatencyRuns (33), BenchLatencyMessageCount (100_000),
    BenchLatencyWarmupRuns (3).
  • bench.yml split into three jobs: bench-throughput, bench-latency
    (new), bench-upload (merges fragments, single bencher run).
  • 4 smoke-shape slugs carry latency_p50_ns / latency_p95_ns /
    latency_p99_ns.
  • adapter.nim re-exports PushResult / PopResult from bench_common
    to unify the two parallel definitions.

Track 2 — Topology split (5 binaries) + per-step CI timeouts

  • 4 throughput binaries replacing bench_throughput.nim:
    bench_spsc.nim, bench_mpsc.nim, bench_mpmc.nim, bench_unbounded.nim
    (plus the latency binary from Track 1).
  • Each owns its own per-binary intdefine set so CI budgets each topology
    independently.
  • Deletion-safety: tests/fixtures/pre-split-slugs.json + superset_check.py
    ensure the post-split BMF is a superset of the pre-split slug set.
  • CI matrix with per-step timeout-minutes so a hang in one variant
    cannot burn the whole budget.

Track 3 — Comparison MVP: Loony, Boost.LockFree, Crossbeam adapters

  • 5 adapters: LoonyQueue (mpmc_unbounded), boost::lockfree::queue
    (mpmc bounded), boost::lockfree::spsc_queue (spsc bounded),
    crossbeam::ArrayQueue (mpmc bounded), crossbeam::SegQueue
    (mpmc_unbounded).
  • New benchmarks/rust/bench-ffi-crossbeam/ cdylib (stable toolchain,
    panic = "abort" for cross-FFI safety).
  • Soft-skip CI pattern: install -> smoke -> set-flag-on-success ->
    annotate-on-failure. workflow_dispatch accepts force_skip_* inputs.
  • New bench-comparison.yml runs Crossbeam on a nightly cron (separate
    Bencher Report; not on PR critical path).
  • Cache-line padding fix for unbounded queue segments: now allocated via
    posix_memalign through new internal/aligned_alloc.nim. Verified by
    tests/t_unbounded_padding.nim.
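
As a reference for how such a helper can be built, here is a minimal sketch of a posix_memalign-backed allocator; the names and error handling are assumptions, not the shipped internal/aligned_alloc.nim:

```nim
# Hedged sketch of a posix_memalign-backed aligned allocator; the real
# internal/aligned_alloc.nim may differ in names and error handling.
proc posix_memalign(memptr: ptr pointer, alignment, size: csize_t): cint
  {.importc, header: "<stdlib.h>".}
proc c_free(p: pointer) {.importc: "free", header: "<stdlib.h>".}

const CacheLineBytes = 64

proc allocAligned(size: int, alignment: int = CacheLineBytes): pointer =
  ## POSIX requires `alignment` to be a power of two and a multiple of
  ## sizeof(pointer); 64 covers the common x86-64 cache-line size.
  var p: pointer = nil
  if posix_memalign(addr p, csize_t(alignment), csize_t(size)) != 0:
    raise newException(OutOfMemDefect, "posix_memalign failed")
  p

proc freeAligned(p: pointer) =
  if p != nil: c_free(p)
```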

Track 4 — Comparison expansion: Threading.Channels, Nim Channel, MoodyCamel

  • 3 more adapters (4 with the smoke-test additions). MoodyCamel
    ConcurrentQueue vendored as single header at
    benchmarks/vendor/concurrentqueue/concurrentqueue.h (commit
    d655418bb644b7f85159d94c591d7d983949fb81); extern "C" wrapper
    isolates Nim from upstream template machinery.
  • THIRD_PARTY_LICENSES.md gains its first vendored entry.

Track 5 — Interactive uPlot charts + mkdocs deploy from devel

  • docs/assets/uplot-1.6.27.iife.min.js vendored (47884 bytes,
    SHA-256 verified against jsdelivr).
  • docs/assets/bench-charts.{js,css} renders an interactive throughput
    chart with library-toggle legend, log-scale Y toggle, hover tooltip,
    graceful fallbacks. Contract test test_bench_charts_contract.py
    guards the BMF -> chart shape.
  • docs/benchmarks.md embeds the chart via mkdocs-material's
    md_in_html. Page is registered in nav.
  • Snapshot publishing pipeline: merged.json is copied to
    docs/assets/bench-results/<sha>.json and latest.json on every
    devel push; commits as github-actions[bot]. Three-layer
    loop-prevention: paths-ignore, actor guard, narrow [skip ci].
  • docs.yml extended to redeploy mike's dev/ alias on devel pushes.
    Post-deploy curl verifies the published BMF asset returns HTTP 200
    and parses as JSON.
  • README BENCHMARKS markers replaced with a hand-curated 4-row summary
    plus a link to the live chart at the docs site.

Track 6 — Latency thresholds + p999 / max measures

  • bench_latency.nim emits latency_p999_ns and latency_max_ns
    alongside the p50/p95/p99 trio. Full tail-latency profile per
    (library, topology) cell.
  • HistogramTopK 1000 -> 5000: at production sample counts
    (BenchLatencyMessageCount=100_000 x BenchLatencyRuns=33,
    seenAll ~ 3.3M), p999 rank is ~3300 — K=5000 keeps p999 in the
    exact top-K stratum. Memory cost: 40KB per histogram.
  • Bencher CLI adds latency p99 regression threshold (alongside
    the existing throughput threshold).
  • New bench-tests CI job runs the bench harness test suite
    (benchtests nimble task) on every PR; pinned to v0.6.0 /
    v0.7.0 sibling deps; no fork-PR guard (no secrets needed for
    unit tests).

Tests added

  • t_bench_common.nim: 26 tests — BMFEmitter ordering, Histogram
    percentile accuracy on log-normal, harness smoke, all 5 new
    lockfreequeues adapters round-trip.
  • t_bench_latency.nim: 7 tests — intdefine compile-time asserts,
    --bmf-out integration, multi-fragment merge + collision guard.
  • t_bench_adapters.nim: round-trip set-equality tests for every
    comparison adapter, gated by the same -d:adapter_*_available
    defines.
  • t_topology_split.nim: per-binary slug coverage + superset check.
  • t_unbounded_padding.nim: 8 cache-line-alignment assertions across
    the 4 unbounded variants (c/cpp/arc/refc).
  • t_aligned_alloc.nim: aligned_alloc helper unit tests.
  • benchmarks/tests/test_merge_bmf.py: 10 tests covering slug regex,
    measure regex, collision detection, alpha-sorted output,
    NaN rejection.
  • benchmarks/tests/test_superset_check.py: deletion-safety regression
    tests for the topology split.
  • benchmarks/tests/test_bench_charts_contract.py: 9 tests guarding
    the BMF -> JS chart contract.

Verification

All Track-level review feedback (13 lows from constituent PRs)
addressed in-tree before consolidation. Original constituent PRs were
each individually approved by Momus. Local nim test suite: 200/200
pass.

Cost gate context

Bencher upload cost gate is in place: PR-base allowlist limited to
main / devel (no feat/**), and PR uploads require the bench
label. This omnibus PR will trigger bench-throughput /
bench-latency runs on push (non-billable to Bencher absent the
label) but the bencher run upload step skips without it. Add the
bench label deliberately to record this revision.

@github-actions bot (Contributor) commented May 1, 2026

🐰 Bencher Report

Branch: feat/bench-rollup-pr0-bench-common
Testbed: ubuntu-latest

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated.

For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds flag.

All benchmark results (⚠️ no Threshold configured for any benchmark):

| Benchmark | throughput_ops_ms (units x 1e3) |
| --- | --- |
| lockfreequeues_mupmuc/mpmc/1p1c | 14.42 |
| lockfreequeues_mupmuc/mpmc/2p2c | 12.97 |
| lockfreequeues_mupmuc/mpmc/4p4c | 8.84 |
| lockfreequeues_mupmuc/mpmc/8p8c | 8.74 |
| lockfreequeues_sipsic/spsc/1p1c | 7.75 |
| lockfreequeues_unbounded_mupsic/mpsc_unbounded/1p1c | 14.42 |
| lockfreequeues_unbounded_mupsic/mpsc_unbounded/2p1c | 13.52 |
| lockfreequeues_unbounded_mupsic/mpsc_unbounded/4p1c | 12.24 |
| nim_channels/mpmc/1p1c | 2.19 |
| nim_channels/mpmc/2p2c | 1.19 |
| nim_channels/mpmc/4p4c | 1.50 |
🐰 View full continuous benchmarking report in Bencher

@gemini-code-assist bot left a comment

Code Review

This pull request overhauls the benchmarking infrastructure, replacing the legacy aggregator and regex-based parser with native Bencher Metric Format (BMF) emission in bench_throughput.nim. It introduces a shared harness module (bench_common.nim), a comprehensive set of lock-free queue adapters, and a merge_bmf.py utility for combining results. Review feedback focuses on correcting a type hint in the Python merge script and ensuring CLI error messages are directed to stderr instead of stdout to follow best practices.

Comment thread: benchmarks/merge_bmf.py, lines +45 to +48
```python
def _die(msg: str) -> "int":
    """Print to stderr and return exit code 1."""
    print(msg, file=sys.stderr)
    return 1
```
Severity: medium

The `_die` function returns an integer exit code, but its return annotation is the string literal `"int"` rather than the type `int`. It should be corrected to `-> int`.

Comment thread: benchmarks/nim/bench_throughput.nim (outdated), lines +529 to +531
```nim
if p.val.len == 0:
  echo "Missing value for --bmf-out"
  quit 1
```
Severity: medium

The error message for a missing value for --bmf-out is printed to stdout. It is best practice to print error messages to stderr.

Suggested change:

```nim
stderr.writeLine("Missing value for --bmf-out")
quit(QuitFailure)
```

Comment thread: benchmarks/nim/bench_throughput.nim (outdated), lines +534 to +535
```nim
echo "Unknown flag: --", p.key
quit 1
```
Severity: medium

The error message for an unknown flag is printed to stdout. It is best practice to print error messages to stderr.

Suggested change:

```nim
stderr.writeLine("Unknown flag: --", p.key)
quit(QuitFailure)
```

Comment thread: benchmarks/nim/bench_throughput.nim (outdated), lines 547 to 549
```nim
echo "Unknown variant group: ", arg
echo "Supported: ", SupportedGroups
quit 1
```
Severity: medium

The error message for an unknown variant group is printed to stdout. It is best practice to print error messages to stderr.

Suggested change:

```nim
stderr.writeLine("Unknown variant group: ", arg)
stderr.writeLine("Supported: ", SupportedGroups)
quit(QuitFailure)
```

elijahr added 14 commits May 1, 2026 22:47
Compile-only stub for benchmarks/nim/bench_common.nim. Types and proc
signatures are final per design doc section 2.1; bodies raise
AssertionDefect with 'not implemented' so tasks 0.2-0.6 can be written
and reviewed in parallel against this surface.

Public surface stubbed:
- Topology enum (tSpsc/tMpsc/tMpmc + unbounded variants)
- PushResult / PopResult adapter primitives
- BMFEmitter + initBMFEmitter / addMeasure / emit
- Stats helpers (mean / stddev / minVal / maxVal / percentile)
- Histogram + initHistogram / record / topK / percentile
  (HistogramTopK = 1000, HistogramReservoir = 99000)
- LatencyMetrics + runLatencyHarness
- ThroughputMetrics + runThroughputHarness

Test (tests/t_bench_common.nim) is compile-time-only: it references
each promised symbol via 'when declared' and 'when compiles', leaving
behavior tests to subsequent tasks.

Stub file: 170 LOC (acceptance cap: 200).
initBMFEmitter / addMeasure / emit per design doc section 2.1.

- addMeasure stores via Table[slug, Table[measure, MeasureValue]];
  NaN sentinel for lower/upper bounds maps to none[float](),
  finite values map to some(v).
- emit writes JSON with slugs alpha-sorted at the top level and
  measures alpha-sorted within each slug. lower_value/upper_value
  fields are emitted only when the corresponding bound is some.
- Output format: pretty(root, 2) plus a trailing newline so
  diffs in CI / local-dev surface line-level changes cleanly.

Tests (tests/t_bench_common.nim): empty emit produces {}, two
slugs alpha-ordered, measures alpha-ordered within slug, NaN bounds
omitted from output, finite bounds emitted as lower_value/upper_value.
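
A compressed sketch of the storage-and-emit pattern described above (the field and proc shapes are approximations of the bench_common.nim surface, not the shipped code):

```nim
import std/[algorithm, json, math, options, sequtils, tables]

type
  MeasureValue = object
    value: float
    lower, upper: Option[float]
  BMFEmitter = object
    data: Table[string, Table[string, MeasureValue]]

proc addMeasure(em: var BMFEmitter; slug, measure: string;
                value, lower, upper: float) =
  # NaN sentinel for a bound maps to none(float); finite maps to some().
  let mv = MeasureValue(
    value: value,
    lower: if lower.isNaN: none(float) else: some(lower),
    upper: if upper.isNaN: none(float) else: some(upper))
  em.data.mgetOrPut(slug, initTable[string, MeasureValue]())[measure] = mv

proc emit(em: BMFEmitter): string =
  # Slugs alpha-sorted at the top level, measures alpha-sorted per slug;
  # bounds are emitted only when present.
  var root = newJObject()
  for slug in sorted(toSeq(em.data.keys)):
    var slugNode = newJObject()
    for measure in sorted(toSeq(em.data[slug].keys)):
      let mv = em.data[slug][measure]
      var m = %*{"value": mv.value}
      if mv.lower.isSome: m["lower_value"] = %mv.lower.get
      if mv.upper.isSome: m["upper_value"] = %mv.upper.get
      slugNode[measure] = m
    root[slug] = slugNode
  pretty(root, 2) & "\n"   # trailing newline for clean CI diffs

# Example: NaN bounds are simply omitted from the output, e.g.
#   {"lib/spsc/1p1c": {"throughput_ops_ms": {"value": 7.75}}}
```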
mean / stddev / minVal / maxVal / percentile bodies. Sort-based
percentile uses linear interpolation between adjacent samples;
matches numpy's default percentile interpolation ('linear').
stddev uses ddof=1 (sample) for consistency with numpy default.
Empty-input behavior matches the legacy benchmarks/nim/stats.nim
that this module replaces (returns 0.0 across the board).

Tests: numpy-baseline parity for stddev on [1,2,3,4]; min/max on
unordered input; percentile(p=0.5) on [0..99] = 49.5; percentile
extremes (p=0.0, p=1.0) match min and max.
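
The interpolation rule, reconstructed as a sketch (behavior inferred from the tests above):

```nim
import std/algorithm

proc percentile(samples: openArray[float], p: float): float =
  ## p in [0.0, 1.0]; empty input returns 0.0 like the legacy stats.nim.
  if samples.len == 0: return 0.0
  let s = sorted(samples)
  let rank = p * float(s.len - 1)       # fractional rank, numpy 'linear'
  let lo = int(rank)
  let hi = min(lo + 1, s.len - 1)
  result = s[lo] + (s[hi] - s[lo]) * (rank - float(lo))

# The cases from the tests above:
var zeroTo99 = newSeq[float](100)
for i in 0 ..< 100: zeroTo99[i] = float(i)
doAssert percentile(zeroTo99, 0.5) == 49.5
doAssert percentile(zeroTo99, 0.0) == 0.0    # == min
doAssert percentile(zeroTo99, 1.0) == 99.0   # == max
```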
Online percentile tracker per design doc section 2.1.

- topKHeap: min-heap of HistogramTopK = 1000 largest values seen.
  record() pushes if heap < K; else replace if value > heap[0]
  (evicts the min into the body bucket via reservoirAdmit).
- reservoir: Algorithm R uniform sample of body distribution, size
  HistogramReservoir = 99_000.
- debugMode (initHistogram(true)): records every sample to debugAll
  and percentile() returns the exact sort answer over debugAll. Used
  as the test oracle and as a debug-only fallback per design 2.1.

Lookup rule with stratified-estimator math:
  p == 1.0                          -> top-K max
  seenAll*(1-p) <= topKHeap.len     -> top-K rank lookup (linear interp)
  otherwise                         -> reservoir, with p rescaled to
                                       p_body = (p*seenAll) / (seenAll - K)
                                       since the reservoir samples ONLY the
                                       body stratum (values that lost the
                                       top-K race). Without rescaling a
                                       reservoir-based p99 underestimates
                                       by ~25% on log-normal data.

Tests: p99 within 1% of sort fallback on 100K log-normal samples;
percentile(1.0) == top-K max; p50 within 5% of sort fallback;
debug mode returns exact sort answer; topK() returns ascending-sorted
samples.
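
Putting the pieces together, a compressed sketch of the scheme (nearest-rank lookups only; the real Histogram also carries the debug oracle and interpolates between adjacent samples):

```nim
import std/[algorithm, heapqueue, random]

const K = 1000        # HistogramTopK
const R = 99_000      # HistogramReservoir

type Histogram = object
  topKHeap: HeapQueue[float]  # min-heap of the K largest values seen
  reservoir: seq[float]       # Algorithm R sample of the body stratum
  seenBody: int               # values that lost the top-K race
  seenAll: int

proc admitBody(h: var Histogram, v: float) =
  # Algorithm R: body item i is kept with probability R/i.
  inc h.seenBody
  if h.reservoir.len < R:
    h.reservoir.add v
  else:
    let j = rand(h.seenBody - 1)
    if j < R: h.reservoir[j] = v

proc record(h: var Histogram, v: float) =
  inc h.seenAll
  if h.topKHeap.len < K:
    h.topKHeap.push v
  elif v > h.topKHeap[0]:
    h.admitBody(h.topKHeap.replace(v))   # evict the heap min into the body
  else:
    h.admitBody(v)

proc percentile(h: Histogram, p: float): float =
  let tail = float(h.seenAll) * (1.0 - p)
  var tmp = h.topKHeap                   # HeapQueue has value semantics
  if p >= 1.0:
    while tmp.len > 0: result = tmp.pop()          # last pop = max
  elif tail <= float(tmp.len):
    var s = newSeq[float](tmp.len)
    for i in 0 ..< s.len: s[i] = tmp.pop()         # ascending
    result = s[max(0, s.len - 1 - int(tail))]      # rank in top-K stratum
  else:
    # Rescale p to the body stratum, which excludes the K largest values.
    let pBody = p * float(h.seenAll) / float(h.seenAll - K)
    let body = sorted(h.reservoir)
    result = body[min(body.len - 1, int(pBody * float(body.len - 1)))]
```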
Generic N-producer / N-consumer throughput runner factored out of
the legacy benchmarks/nim/bench_throughput.nim. Parameterized over the
queue init proc and the queue type Q (constrained at instantiation
time via 'mixin push' / 'mixin pop' so adapter-specific push/pop are
resolved at the call site).

The remainder-spread deadlock-avoidance rule is preserved verbatim:
if messageCount mod numProducers != 0, the leading producers each get
one extra item so the producer-side total exactly equals messageCount;
same for consumers. Without this, the harness silently degrades into
a deadlock (consumers waiting forever) or strands items in the queue
(consumers stop early).

Returns ThroughputMetrics{ops_ms_mean, ops_ms_stddev, runs}; warmup
runs are discarded. Smoke test in tests/t_bench_common.nim with a
1024-cap channel SmokeAdapter at 1P/1C × 1000 messages × 1 run.
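
The remainder-spread rule in isolation, as a sketch (the real harness inlines this in its worker-setup loops):

```nim
proc workerCounts(messageCount, workers: int): seq[int] =
  let base = messageCount div workers
  let rem = messageCount mod workers
  result = newSeq[int](workers)
  for i in 0 ..< workers:
    result[i] = base + (if i < rem: 1 else: 0)   # leading workers take +1

doAssert workerCounts(10, 3) == @[4, 3, 3]   # sums to exactly 10
doAssert workerCounts(8, 4) == @[2, 2, 2, 2]
```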
Ping-pong RTT runner. For each run:
- allocates two queues fwd + rev of type Q via queueInit()
- spawns 1 pinger + 1 ponger thread
- pinger pushes a monotonic-ns timestamp via fwd, spins on rev for
  the echo, records RTT (ns) into a per-run Histogram
- ponger pops from fwd, pushes the same value to rev

Aggregation per design 2.5 'Run aggregation for latency': reported
percentiles are the MEAN across runs of each percentile, NOT the
percentile of unioned samples. Per-run histograms isolate scheduler
artifacts to the run that contains them.

Returns LatencyMetrics{p50_ns, p95_ns, p99_ns, p999_ns, max_ns,
samples}. samples is the cumulative across-runs sample count.

This task ships the 1P/1C smoke topology only; multi-thread sharding
lands in PR 1 (Task 1.x). 1P/1C is sufficient for the smoke check
because PR 1's first BMF measure uses Sipsic 1P/1C as the smoke shape.

Test: SmokeAdapter (channel-backed), 1000 messages, 1 run, 0 warmup;
asserts samples >= 1000 and p50 < p99 < max.
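
A sketch of that aggregation rule (the per-run values here are hypothetical):

```nim
proc meanOf(xs: openArray[float]): float =
  var s = 0.0
  for x in xs: s += x
  s / float(xs.len)

# Hypothetical per-run p50s (ns); run 3 caught a scheduler artifact.
let perRunP50 = [812.0, 790.0, 1405.0]
doAssert meanOf(perRunP50) == (812.0 + 790.0 + 1405.0) / 3.0
# Reported p50_ns = mean of per-run p50s. Taking percentile(0.50) over
# the union of all samples would instead let run 3's inflated samples
# leak into every reported percentile, not just its own run.
```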
Stateless union of per-slug measure dicts across N input BMF JSON
files into one output JSON, per design doc section 4.3. Required
because Bencher.dev creates a separate Report per 'bencher run'
invocation; multiple uploads from sibling CI jobs would NOT co-locate
multiple measures on a single per-slug history.

Contract:
  merge_bmf.py <output_path> <input1> [input2 ...]

- Slug regex: ^[a-z][a-z0-9_]*/[a-z][a-z0-9_]*/\d+p\d+c$
- Measure regex: ^[a-z][a-z0-9_]*$
- Value fields ('value', 'lower_value', 'upper_value') must be finite,
  non-NaN floats. Any malformed input or non-finite value -> exit 1.
- Collision (same slug + same measure across two inputs) -> exit 1
  with stderr 'error: collision on slug=<slug> measure=<measure> in
  files <pathA> and <pathB>'.
- Output is alpha-sorted at both top level (slugs) and within each
  slug (measures), 2-space indent.

Tests in benchmarks/tests/test_merge_bmf.py cover all 8 design-spec
cases plus an alpha-sort assertion: zero-input usage error, single
empty pass-through, single valid round-trip, two disjoint slugs,
shared slug + disjoint measures, collision, NaN rejection, invalid
measure key (uppercase), invalid slug shape.
Adds the 5 adapters not yet present per design doc 2.2:
- lockfreequeues_sipmuc_adapter.nim   (bounded SPMC)
- lockfreequeues_mupsic_adapter.nim   (bounded MPSC)
- lockfreequeues_unbounded_sipsic_adapter.nim
- lockfreequeues_unbounded_sipmuc_adapter.nim
- lockfreequeues_unbounded_mupmuc_adapter.nim

Each adapter file exports:
- 'topologiesSupported: set[Topology]' constant (consumed by PR 3 Task
  3.11 to restrict shape generation; declared but unused in PR 0)
- 'make<Lib>Adapter' factory taking 'capacity: int'
- 'cleanup' tear-down proc
- 'push(item)' returning bench_common.PushResult
- 'pop()' returning bench_common.PopResult[T]

Smoke test in tests/t_bench_common.nim: each adapter pushes 100
sequential uint64s, pops them, asserts count == 100 AND set-equality
with the input range.

Implementation note (Nim 2.2.6 codegen workaround):
The unbounded adapters store the queue INLINE rather than via
heap pointer. Heap-allocating UnboundedSipsic/UnboundedSipmuc/
UnboundedMupmuc and then calling reset() or =destroy() on the
referent from a generic adapter cleanup proc triggers
'internal error: getTypeDescAux(tyGenericParam)' OR
'internal error: expr(nkIdent); unknown node kind' in Nim 2.2.6
when bench_common is also imported. Inline storage lets the
compiler emit the per-instance =destroy automatically when the
adapter goes out of scope, sidestepping the bug. The DEBRA
manager is still heap-allocated because the queue constructors
take 'ptr DebraManager'. Track upstream Nim repro for follow-up.
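
To make the exported surface concrete, a hedged, self-contained skeleton follows; Topology, PushResult, and PopResult are local stand-ins for the bench_common types, and a seq stands in for the real lockfreequeues-backed storage:

```nim
type
  # Local stand-ins; the real Topology/PushResult/PopResult live in
  # bench_common and the real adapters wrap lockfreequeues types.
  Topology = enum tSpsc, tMpsc, tMpmc
  PushResult = enum prSuccess, prFull
  PopResult[T] = object
    ok: bool
    val: T

const topologiesSupported*: set[Topology] = {tSpsc}  # example value

type DemoAdapter*[T] = object
  buf: seq[T]   # stand-in for the real queue storage

proc makeDemoAdapter*[T](capacity: int): DemoAdapter[T] =
  DemoAdapter[T](buf: newSeqOfCap[T](capacity))

proc cleanup*[T](a: var DemoAdapter[T]) =
  a.buf.setLen(0)

proc push*[T](a: var DemoAdapter[T], item: T): PushResult =
  a.buf.add item
  prSuccess

proc pop*[T](a: var DemoAdapter[T]): PopResult[T] =
  if a.buf.len == 0:
    return PopResult[T](ok: false)
  result = PopResult[T](ok: true, val: a.buf[0])
  a.buf.delete(0)
```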
…port `topologiesSupported`

Per design section 2.2 (`<library_slug>_adapter.nim` filename convention):

- Rename via `git mv` (history preserved):
  - lockfreequeues_sipsic.nim -> lockfreequeues_sipsic_adapter.nim
  - lockfreequeues_mupmuc.nim -> lockfreequeues_mupmuc_adapter.nim
  - lockfreequeues_unbounded_mupsic.nim -> lockfreequeues_unbounded_mupsic_adapter.nim

- Update bench_throughput.nim import paths to match.

- Add `topologiesSupported*: set[Topology]` constant to all four
  existing adapters (3 renamed + channels_adapter). Use selective
  `from ../bench_common import Topology, <member>` to avoid pulling
  PushResult into scope, which would conflict with adapter.PushResult
  (the legacy concept in ../adapter still imported by these adapters).

- `topologiesSupported` is exported in PR 0 but consumed only by PR 3
  Task 3.11; the "unused export" hint is expected per the impl plan.
…ughput

Add --bmf-out=<path> flag (parsed via std/parseopt) that emits Bencher
Metric Format JSON for every variant that ran. With the flag absent the
binary is bit-for-bit unchanged; the existing positional CLI
(`bench_throughput sipsic mupmuc unbounded_mupsic channels`) is
preserved verbatim.

Slug shape per design 2.2:

  lockfreequeues_sipsic/spsc/1p1c
  lockfreequeues_mupmuc/mpmc/{1,2,4,8}p{1,2,4,8}c
  lockfreequeues_unbounded_mupsic/mpsc_unbounded/{1,2,4}p1c
  nim_channels/mpmc/{1,2,4}p{1,2,4}c

Each slug carries the `throughput_ops_ms` measure as
{value: mean, lower_value: mean-stddev, upper_value: mean+stddev} —
matching the convention bmf_adapter.py used (which this code replaces
in Task 0.12).

Note on slug for built-in channels: the human-readable adapter name()
returns "nim/channels" (kept for stdout text), but the slug uses
underscore form `nim_channels` so it satisfies the merge_bmf.py
SLUG_RE pattern `^[a-z][a-z0-9_]*/[a-z][a-z0-9_]*/\d+p\d+c$`.

Per-variant {.intdefine.}s for runs/warmup added so the integration
test can build a fast variant via -d:BenchSipsicRuns=2
-d:BenchSipsicWarmup=0. Default values (10 runs, 0 warmup) match the
prior hard-coded behavior.

Selective `from ./bench_common import` is used to pull only
{BMFEmitter, initBMFEmitter, addMeasure, emit} — importing the full
module would shadow `results.ThroughputMetrics` with
`bench_common.ThroughputMetrics` and break the legacy bespoke harness
procs that still drive Mupmuc and UnboundedMupsic.

Test: tests/t_bench_common.nim adds an integration test that compiles
bench_throughput with small overrides, runs it with --bmf-out=<tmp>
sipsic, parses the resulting JSON, and asserts the expected slug and
measure exist with a positive value. Stdout text output is also
asserted (`Sipsic` substring) so the legacy CLI contract is exercised.
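
The flag-parsing shape with std/parseopt looks roughly like this (simplified sketch; error paths go to stderr per the review feedback above):

```nim
import std/parseopt

var bmfOut = ""
var variants: seq[string]

var p = initOptParser()    # parses the process's command line
while true:
  p.next()
  case p.kind
  of cmdEnd:
    break
  of cmdLongOption:
    if p.key == "bmf-out":
      if p.val.len == 0:
        stderr.writeLine("Missing value for --bmf-out")
        quit(QuitFailure)
      bmfOut = p.val
    else:
      stderr.writeLine("Unknown flag: --", p.key)
      quit(QuitFailure)
  of cmdShortOption:
    stderr.writeLine("Unknown flag: -", p.key)
    quit(QuitFailure)
  of cmdArgument:
    variants.add p.key     # positional variant filter, e.g. "sipsic"
```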
Apply the four edits from design section 3 (PR 0):

(a) Remove `Run adapter unit tests` step. The Python regex parser it
    sanity-checked is gone in Task 0.12; nothing left to test.
(b) Remove `Convert to BMF JSON` step. bench_throughput now emits BMF
    natively via `--bmf-out=throughput.json` (Task 0.10).
(c) Update `Run bench_throughput` step to add the `--bmf-out=throughput.json`
    flag while preserving the existing positional variant filter and the
    20-minute step timeout.
(d) Insert `Merge BMF JSON` step running
    `python3 benchmarks/merge_bmf.py merged.json throughput.json`.
    Single-input today, but stays in place for PR 2-4 when bench is
    split into per-topology binaries each producing its own fragment.
(e) Change all three `bencher run` invocations and the debug-cat step
    to use `--file merged.json` instead of `bench_results.json`.

actionlint clean. The bench upload contract is unchanged: same
project, branch, testbed, thresholds, comparison semantics — only the
file source flips from regex-parsed text to native BMF.
…or consumers

Three legacy files removed via `git rm`:

- `benchmarks/bmf_adapter.py` — regex parser for bench_throughput stdout.
  Replaced by native BMF emission via `--bmf-out=` (Task 0.10).
- `benchmarks/test_bmf_adapter.py` — unit tests for the parser. Replaced
  by `benchmarks/tests/test_merge_bmf.py` (Task 0.7).
- `benchmarks/nim/bench_main.nim` — aggregator binary that wrapped
  bench_throughput + bench_latency and emitted a custom JSON shape.
  Bench_throughput is now the canonical entry point and emits BMF
  directly; bench_latency follows in PR 1.

Live consumers refactored in this same task (per impl plan §0.12):

- `benchmarks/runner.py`: build_nim now compiles bench_throughput.nim;
  run_nim invokes `bench_throughput --bmf-out=<path>` instead of
  `bench_main --runs=<n> -o=<path>`. The `runs` argument is preserved
  in the CLI for back-compat but is now compile-time-only.
- `lockfreequeues.nimble`: `task benchmarks` switched to bench_throughput.
  Output path corrected to `.tmp/bench_throughput` (matching the project
  nim.cfg `--outdir:.tmp`).
- `benchmarks/README.md`: rewritten to document the new flow
  (bench_common module, adapter convention, merge_bmf.py, native BMF
  emission, expected slug set).
- `.gitignore`: drop the `bench_main` build-artifact ignore line.

`git grep "bmf_adapter\|bench_main"` returns only intentional historical
references in code comments and CHANGELOG (prior release notes).

Acceptance: `nimble tasks` parses; `nim c benchmarks/nim/bench_throughput.nim`
compiles; tests/t_bench_common.nim passes all 26 tests including the
Task 0.10 --bmf-out integration test.
The legacy bench_main aggregator emitted JSON of shape
`{results: [{implementation, thread_config, metrics: {...}}], metadata: {...}}`.
Bench_main was deleted in Task 0.12, replaced by `bench_throughput
--bmf-out` (Task 0.10) + `merge_bmf.py` (Task 0.7) which produce
Bencher Metric Format JSON of shape
`{<lib>/<topology>/<P>p<C>c: {<measure>: {value, lower_value?, upper_value?}}}`.

`benchmarks/render_readme.nim` now walks the BMF shape directly:

- New `parseSlug(slug)` helper splits `<lib>/<topology>/<P>p<C>c` into
  `(impl, "<P>P/<C>C")` for the two table columns (see the sketch after
  this list). Strict on shape: rejects slugs with the wrong segment
  count or non-digit P/C, so malformed entries do not produce noise rows.
- `loadRows` walks every top-level slug, pulls
  `<slug>.throughput_ops_ms.value` for the throughput column, and
  `<slug>.latency_p50_ns.value` for the optional latency column
  (PR 1+ emits this; PR 0 does not, so all PR 0 rows show "—").
- BMF has no `metadata` block, so the meta argument is always nil.
  `renderTable` already accepts nil and skips the platform header in
  that case — no change needed.
- `when isMainModule` block guarded by `not defined(renderReadmeNoMain)`
  so unit tests can include this file without triggering the README
  rewrite. CLI invocations (`nim r benchmarks/render_readme.nim`) leave
  the define unset and run main as before.
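
A possible shape for the strict parseSlug helper referenced above (the Option-returning signature is an assumption; the commit specifies only the split and the strictness):

```nim
import std/[options, strutils]

proc parseSlug(slug: string): Option[(string, string)] =
  ## "<lib>/<topology>/<P>p<C>c" -> some((lib, "<P>P/<C>C"));
  ## none on wrong segment count or non-digit P/C.
  let parts = slug.split('/')
  if parts.len != 3: return none((string, string))
  let shape = parts[2]
  let pIdx = shape.find('p')
  if pIdx <= 0 or not shape.endsWith('c'): return none((string, string))
  let pStr = shape[0 ..< pIdx]
  let cStr = shape[pIdx + 1 ..< shape.len - 1]
  if pStr.len == 0 or cStr.len == 0: return none((string, string))
  for ch in pStr & cStr:
    if ch notin {'0' .. '9'}: return none((string, string))
  some((parts[0], pStr & "P/" & cStr & "C"))

doAssert parseSlug("lockfreequeues_mupmuc/mpmc/2p2c") ==
  some(("lockfreequeues_mupmuc", "2P/2C"))
doAssert parseSlug("bad-slug").isNone
```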

Test: tests/t_render_readme.nim (new) writes a fixture BMF JSON with
three slugs (mupmuc/2p2c, sipsic/1p1c, nim_channels/1p1c), runs
`loadRows` and `renderBlock`, and asserts:
- 3 rows produced; impl/threads/throughput parsed correctly per slug
- alpha sort by impl preserved
- meta == nil (BMF carries no metadata)
- rendered Markdown table contains every impl name and value
- latency column is em-dash for every row (no `latency_p50_ns` in fixture)
- missing JSON file falls back to placeholder
- empty `{}` BMF document falls back to placeholder

All 4 new tests + 26 existing t_bench_common tests pass.
Document Track 0 of the bench-rollup feature under [Unreleased]:

- Added: bench_common.nim module, 5 new lockfreequeues adapters,
  merge_bmf.py CLI, --bmf-out flag, per-variant run-count intdefines.
- Changed: bench_throughput emits BMF natively, 3 adapters renamed to
  the <library_slug>_adapter.nim convention, render_readme.nim consumes
  the new BMF shape, runner.py / nimble task / benchmarks/README.md
  point at bench_throughput.
- Removed: bmf_adapter.py, test_bmf_adapter.py, bench_main.nim.

Each entry is one bullet (one logical change), no test plan checklist,
no GitHub issue references.
@elijahr force-pushed the feat/bench-rollup-pr0-bench-common branch from 6a882f7 to 6d725c6 on May 2, 2026 03:49
@elijahr (Owner, Author) commented May 2, 2026

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request overhauls the benchmarking infrastructure by introducing a shared harness module (bench_common.nim), native Bencher Metric Format (BMF) JSON emission, and updated adapters for various lock-free queue implementations. It replaces the legacy bench_main aggregator and Python-based regex parser with a robust system including a BMF merging utility and updated documentation. Feedback identifies memory leaks in the latency and throughput harnesses due to missing queue cleanup, and resource leaks in adapters where heap-allocated queues are deallocated without invoking destructors.

Comment on lines +353 to +396
```nim
proc runOneLatencyRun[Q](
    queueInit: proc(): Q,
    messageCount: int,
): LatencyMetrics =
  ## One ping-pong RTT run. Allocates fwd + rev queues of type Q,
  ## spawns 1 pinger and 1 ponger, records RTT samples, returns
  ## LatencyMetrics{p50, p95, p99, p999, max, samples}.
  ##
  ## NOTE: This task ships the 1P/1C smoke topology only. PR 1 (Task 1.x)
  ## extends to multi-pinger / multi-ponger by sharding `messageCount`
  ## across additional threads with the remainder-spread rule from
  ## runThroughputHarness. The 1P/1C path keeps RTT semantics clean
  ## (no scheduler interleave between ping and pong) which is what the
  ## p50 / p99 measurements in PR 1 hinge on.
  var fwd = queueInit()
  var rev = queueInit()
  var hist = initHistogram()

  var pingerCtx = PingerCtx[Q](
    fwd: addr fwd, rev: addr rev,
    count: messageCount,
    histogram: addr hist,
  )
  var pongerCtx = PongerCtx[Q](
    fwd: addr fwd, rev: addr rev,
    count: messageCount,
  )

  var pingerThread: Thread[ptr PingerCtx[Q]]
  var pongerThread: Thread[ptr PongerCtx[Q]]

  createThread(pingerThread, pingerThreadBody[Q], addr pingerCtx)
  createThread(pongerThread, pongerThreadBody[Q], addr pongerCtx)
  joinThread(pingerThread)
  joinThread(pongerThread)

  result = LatencyMetrics(
    p50_ns: hist.percentile(0.50),
    p95_ns: hist.percentile(0.95),
    p99_ns: hist.percentile(0.99),
    p999_ns: hist.percentile(0.999),
    max_ns: hist.percentile(1.0),
    samples: hist.seenAll,
  )
```
Severity: high

The runOneLatencyRun procedure leaks memory because it creates two queue instances (fwd and rev) but never calls their cleanup procedures. Since the adapters (e.g., LockfreequeuesMupsicAdapter) allocate memory on the heap, this will lead to significant memory accumulation over multiple benchmark runs.

Comment on lines +470 to +540
```nim
proc runOneThroughputRun[Q](
    queueInit: proc(capacity: int): Q,
    capacity: int,
    numProducers: int,
    numConsumers: int,
    messageCount: int,
): float =
  ## One run of the throughput harness; returns ops/ms.
  ##
  ## Distribution invariants (mirrors legacy bench_throughput
  ## `runThroughputBenchmark`):
  ## 1. sum(producer.count) == messageCount
  ## 2. sum(consumer.count) == messageCount
  ## A naive `messageCount div N` truncates and breaks both when
  ## `messageCount` is not divisible by `numProducers` or `numConsumers`,
  ## which deadlocks the consumers (waiting for items the producers never
  ## enqueue) or strands items in the queue (consumers stop early).
  ## Spread the remainder over the first `messageCount mod N` workers so
  ## totals match exactly for any (P, C, messageCount) triple.
  var queue = queueInit(capacity)
  let baseP = messageCount div numProducers
  let remP = messageCount mod numProducers
  let baseC = messageCount div numConsumers
  let remC = messageCount mod numConsumers

  var producerThreads = newSeq[Thread[ptr ProducerCtx[Q]]](numProducers)
  var consumerThreads = newSeq[Thread[ptr ConsumerCtx[Q]]](numConsumers)
  var producerCtxs = newSeq[ProducerCtx[Q]](numProducers)
  var consumerCtxs = newSeq[ConsumerCtx[Q]](numConsumers)

  var nextStart = 0
  for i in 0 ..< numProducers:
    let count = baseP + (if i < remP: 1 else: 0)
    producerCtxs[i] = ProducerCtx[Q](
      queue: addr queue,
      startIdx: nextStart,
      count: count,
    )
    nextStart += count

  for i in 0 ..< numConsumers:
    let count = baseC + (if i < remC: 1 else: 0)
    consumerCtxs[i] = ConsumerCtx[Q](
      queue: addr queue,
      count: count,
    )

  # Monotonic clock — `epochTime` (wall clock) can step backward across
  # NTP adjustments and skew throughput numbers. Nanosecond precision:
  # ms-precision buckets multiple short runs into the same integer ms,
  # producing identical samples and stddev=0 on a fast CI runner. ops/ms
  # is reconstructed as a float at print time.
  let startTime = getMonoTime()

  for i in 0 ..< numProducers:
    createThread(producerThreads[i], producerThreadBody[Q],
                 addr producerCtxs[i])
  for i in 0 ..< numConsumers:
    createThread(consumerThreads[i], consumerThreadBody[Q],
                 addr consumerCtxs[i])

  for i in 0 ..< numProducers:
    joinThread(producerThreads[i])
  for i in 0 ..< numConsumers:
    joinThread(consumerThreads[i])

  let elapsedNs = float(inNanoseconds(getMonoTime() - startTime))
  if elapsedNs <= 0.0:
    return 0.0
  result = float(messageCount) * 1_000_000.0 / elapsedNs
```

Severity: high

The runOneThroughputRun procedure leaks memory because it creates a queue instance but never calls its cleanup procedure. This results in a memory leak for every benchmark run and warmup iteration.

Comment on lines +32 to +37
```nim
proc cleanup*[N, P: static int, T](
    a: var LockfreequeuesMupsicAdapter[N, P, T]
) =
  if a.queue != nil:
    dealloc(a.queue)
    a.queue = nil
```
Severity: medium

The cleanup procedure uses dealloc on the queue pointer without running its destructor. This will leak any resources held by the Mupsic instance (like its internal buffer if T is a managed type). You should call reset(a.queue[]) or =destroy(a.queue[]) before deallocating the memory.

Comment on lines +45 to +50
```nim
proc cleanup*[N, C: static int, T](
    a: var LockfreequeuesSipmucAdapter[N, C, T]
) =
  if a.queue != nil:
    dealloc(a.queue)
    a.queue = nil
```
Severity: medium

Similar to the Mupsic adapter, dealloc here skips the destructor for the Sipmuc instance. Use reset(a.queue[]) before dealloc to ensure proper resource reclamation.

… harness

Per Gemini review on PR #19. Adapters whose Q heap-allocates the
backing queue (lockfreequeues bounded mupsic/sipmuc/mupmuc, channels)
leaked the queue every time runOneLatencyRun or runOneThroughputRun
returned: `var fwd = queueInit()` only frees the adapter struct's
stack slot at scope exit, not the create()-d Mupsic/Sipmuc/Mupmuc/
Channel pointee. Across N runs * M warmups the leak grew without bound.

Three-part fix:

1. bench_common.nim: harness now drops the queue via a uniform
   `mixin cleanup; when compiles(cleanup(q)): cleanup(q)` block after
   joinThread (and before any return path, including the elapsedNs<=0
   short-circuit in runOneThroughputRun). `when compiles` keeps
   value-type adapters (sipsic, unbounded inline storage) usable
   without forcing them to define a no-op cleanup.

2. lockfreequeues_mupsic / sipmuc cleanup procs: prepend reset(a.queue[])
   so Mupsic/Sipmuc's `=destroy` actually runs before the heap slot
   is freed. Bare dealloc skipped any queue-internal cleanup that an
   ARC/ORC-generated destructor would otherwise do (managed T fields,
   future atomic-cleanup hooks). Same fix on the mupmuc adapter inside
   deinitMupmucAdapter.

3. mupmuc and channels adapters gain a `cleanup` alias around their
   existing deinit*Adapter procs so the harness's mixin cleanup call
   resolves uniformly across adapter naming conventions.

Verified by smoke build of `runLatencyHarness` + `runThroughputHarness`
against LockfreequeuesMupsicAdapter; no leak-on-return, percentiles
and ops/ms reported as expected.
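
The uniform optional-cleanup call from fix (1), in isolation (generic sketch, not the exact bench_common code):

```nim
proc dropQueue[Q](q: var Q) =
  ## Uniform optional cleanup: heap-backed adapters define `cleanup`
  ## and free their pointee here; value-type adapters define none and
  ## rely on their =destroy at scope exit.
  mixin cleanup
  when compiles(cleanup(q)):
    cleanup(q)
```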
@elijahr (Owner, Author) commented May 2, 2026

/gemini review

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@elijahr (Owner, Author) commented May 2, 2026

/review

@qodo-code-review bot commented May 2, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0)



Action required

1. Premature DebraManager free 🐞 Bug ☼ Reliability
Description
cleanup() in the new unbounded adapters deallocates the DebraManager while the inline queue
still retains a pointer to it; when the queue is later destroyed it will call
unbindClient(self.manager[]) on freed memory. This can trigger destructor assertions or memory
corruption/UAF during adapter scope exit.
Code

benchmarks/nim/adapters/lockfreequeues_unbounded_mupmuc_adapter.nim[R43-51]

```diff
+proc cleanup*[S: static int, T; MaxThreads: static int](
+    a: var LockfreequeuesUnboundedMupmucAdapter[S, T, MaxThreads]
+) =
+  ## Free manager only; queue is inline and its `=destroy` runs when
+  ## the adapter object goes out of scope.
+  if a.manager != nil:
+    reset(a.manager[])
+    dealloc(a.manager)
+    a.manager = nil
```
Evidence
Both unbounded adapters free a.manager in cleanup() but do not destroy/reset the inline queue
first. The underlying UnboundedMupmuc/UnboundedSipmuc types store the manager pointer inside the
queue object (manager: ptr DebraManager[...]), bind a client refcount on construction
(bindClient(manager[])), and later unbind during =destroy via unbindClient(self.manager[]), so
the manager must outlive the queue instance.

benchmarks/nim/adapters/lockfreequeues_unbounded_mupmuc_adapter.nim[43-51]
benchmarks/nim/adapters/lockfreequeues_unbounded_sipmuc_adapter.nim[43-55]
src/lockfreequeues/unbounded_mupmuc.nim[72-89]
src/lockfreequeues/unbounded_mupmuc.nim[144-146]
src/lockfreequeues/unbounded_mupmuc.nim[436-465]
src/lockfreequeues/unbounded_sipmuc.nim[75-92]
src/lockfreequeues/unbounded_sipmuc.nim[140-142]
src/lockfreequeues/unbounded_sipmuc.nim[376-405]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The unbounded adapters' `cleanup()` frees the `DebraManager` while the inline `queue` still holds a pointer to it. The queue destructor later unbinds from `self.manager[]`, which will be freed, causing assertions or UAF.
### Issue Context
`UnboundedMupmuc`/`UnboundedSipmuc` call `bindClient(manager[])` in their constructors and `unbindClient(self.manager[])` in `=destroy`, so the manager must remain valid until after the queue is destroyed/reset.
### Fix Focus Areas
- benchmarks/nim/adapters/lockfreequeues_unbounded_mupmuc_adapter.nim[43-51]
- benchmarks/nim/adapters/lockfreequeues_unbounded_sipmuc_adapter.nim[43-55]
### Recommended fix
Update `cleanup()` to destroy/reset the inline queue *before* resetting/deallocating the manager, e.g.:
- call `reset(a.queue)` (or otherwise invoke the queue's destructor) while `a.manager` is still valid
- then `reset(a.manager[])` and `dealloc(a.manager)`
Alternative (often simpler): store the manager inline in the adapter and pass `addr a.manager` into `newUnbounded*`, so normal field-destruction order naturally keeps the manager alive until after the queue is destroyed.

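Applying the recommended ordering to the quoted proc would look roughly like this (untested sketch using the excerpt's own names):

```nim
proc cleanup*[S: static int, T; MaxThreads: static int](
    a: var LockfreequeuesUnboundedMupmucAdapter[S, T, MaxThreads]
) =
  if a.manager != nil:
    # Destroy the inline queue first: its =destroy unbinds from the
    # manager, which must still be alive at that point.
    reset(a.queue)
    reset(a.manager[])
    dealloc(a.manager)
    a.manager = nil
```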



Remediation recommended

2. Harness zero-thread crash 🐞 Bug ☼ Reliability ⭐ New
Description
runOneThroughputRun divides/mods by numProducers and numConsumers without checking they’re >0,
so runThroughputHarness can crash immediately on 0 inputs. Because this is an exported shared
harness, this becomes a sharp edge for any future caller and fails without a clear error message.
Code

benchmarks/nim/bench_common.nim[R501-506]

```diff
+  var queue = queueInit(capacity)
+  let baseP = messageCount div numProducers
+  let remP = messageCount mod numProducers
+  let baseC = messageCount div numConsumers
+  let remC = messageCount mod numConsumers
+
```
Evidence
The new shared harness computes per-worker counts via div/mod using the user-provided
numProducers/numConsumers, but there is no guard in either runOneThroughputRun or the exported
runThroughputHarness to prevent 0 from reaching these operations.

benchmarks/nim/bench_common.nim[482-506]
benchmarks/nim/bench_common.nim[562-591]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`benchmarks/nim/bench_common.nim` computes `messageCount div numProducers` / `messageCount div numConsumers` without guarding that `numProducers` and `numConsumers` are > 0. This can crash the benchmark harness when misconfigured and produces an unclear failure mode.

### Issue Context
This is a shared harness exported for reuse; even if current call sites pass valid values, it should fail fast with a clear message or return a safe default when parameters are invalid.

### Fix Focus Areas
- benchmarks/nim/bench_common.nim[482-506]
- benchmarks/nim/bench_common.nim[562-591]

### What to change
- Add an explicit guard (e.g., `doAssert numProducers > 0 and numConsumers > 0`) in `runOneThroughputRun` or (preferably) in the exported `runThroughputHarness*`.
- Optionally also guard `messageCount >= 0` / `capacity >= 0` and define behavior for `runCount <= 0` (match the latency harness style).



3. Adapters block per-thread registration 🐞 Bug ≡ Correctness ⭐ New
Description
The new lockfreequeues_mupsic_adapter and lockfreequeues_sipmuc_adapter store a single
Producer/Consumer created at adapter construction time and keep the underlying queue pointer
private, so callers cannot obtain per-thread producers/consumers as the adapters’ own comments
describe. This prevents implementing correct multi-thread topologies and risks accidental
cross-thread use of a single Producer/Consumer.
Code

benchmarks/nim/adapters/lockfreequeues_mupsic_adapter.nim[R15-43]

```diff
+type
+  LockfreequeuesMupsicAdapter*[N, P: static int, T] = object
+    queue: ptr Mupsic[N, P, T]
+    producer: Producer[N, P, T]
+
+proc makeLockfreequeuesMupsicAdapter*[N, P: static int, T](
+    capacity: int = N
+): LockfreequeuesMupsicAdapter[N, P, T] =
+  ## Allocate and initialize a Mupsic[N, P, T]. The pre-built `producer`
+  ## slot lets the smoke / 1p1c shape drive push from the calling
+  ## thread; multi-producer shapes register additional producers
+  ## per-thread via `queue.getProducer(idx = i)`.
+  doAssert capacity == N, "capacity must equal static N"
+  result.queue = create(Mupsic[N, P, T])
+  result.queue[] = initMupsic[N, P, T]()
+  result.producer = result.queue[].getProducer(idx = 0)
+
+proc cleanup*[N, P: static int, T](
+    a: var LockfreequeuesMupsicAdapter[N, P, T]
+) =
+  if a.queue != nil:
+    reset(a.queue[])
+    dealloc(a.queue)
+    a.queue = nil
+
+proc push*[N, P: static int, T](
+    a: var LockfreequeuesMupsicAdapter[N, P, T], item: T
+): PushResult =
+  if a.producer.push(item): prSuccess else: prFull
```
Evidence
Both adapters claim that multi-thread shapes should register per-thread producers/consumers via
queue.getProducer(idx=...) / getConsumer(idx=...), but the only queue reference is a
non-exported field, and push/pop use the single stored producer/consumer. Existing throughput
code for similar queues demonstrates the needed pattern: acquiring producer/consumer inside each
worker thread.

benchmarks/nim/adapters/lockfreequeues_mupsic_adapter.nim[4-52]
benchmarks/nim/adapters/lockfreequeues_sipmuc_adapter.nim[7-65]
benchmarks/nim/bench_throughput.nim[174-206]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`lockfreequeues_mupsic_adapter.nim` and `lockfreequeues_sipmuc_adapter.nim` currently:
- create exactly one `Producer`/`Consumer` at construction time, and
- keep the backing `queue` pointer private,
while comments state multi-thread shapes should register per-thread producers/consumers via `queue.getProducer/getConsumer`.

This makes correct multi-threaded benchmarking impossible with the current adapter surface and can lead to unsafe sharing of a single producer/consumer across threads.

### Issue Context
`benchmarks/nim/bench_throughput.nim` already documents and implements the correct approach for similar queues: allocate producer/consumer inside each worker thread via `getProducer/getConsumer`.

### Fix Focus Areas
- benchmarks/nim/adapters/lockfreequeues_mupsic_adapter.nim[4-52]
- benchmarks/nim/adapters/lockfreequeues_sipmuc_adapter.nim[7-65]
- benchmarks/nim/bench_throughput.nim[174-206]

### What to change (pick one consistent design)
Option A (recommended for harness integration):
- Export the underlying queue field (e.g., `queue*`) and/or add exported accessors:
 - `proc getProducer*(a: var Adapter; idx: int): Producer[...]`
 - `proc getConsumer*(a: var Adapter; idx: int): Consumer[...]`
- Update multi-thread harness code to call these per-thread inside thread bodies.

Option B (adapter owns registration):
- Redesign `push/pop` so each thread has its own adapter instance containing its own producer/consumer (no sharing).

In either option:
- Keep the current single-thread `producer0/consumer0` fast-path if needed for smoke tests, but make multi-thread usage unambiguous and supported by the exported API.



4. Non-finite BMF emission 🐞 Bug ≡ Correctness
Description
bench_throughput emits BMF measures from mean/stddev without checking finiteness, but its
throughput calculations still divide by elapsedNs without guarding against elapsedNs <= 0. If
any run yields inf/NaN, merge_bmf.py will exit 1 due to schema validation and the CI bench
workflow will fail.
Code

benchmarks/nim/bench_throughput.nim[R553-566]

```diff
+  # BMF accumulator. Always allocate (cheap) so the per-variant
+  # `addMeasure` calls do not need to be guarded; emit only at end if
+  # `--bmf-out` was given.
+  var emitter = initBMFEmitter()
+
+  proc emitThroughputSlug(
+      em: var BMFEmitter, slug: string, mean, stddev: float
+  ) =
+    ## Map legacy `results.ThroughputMetrics{mean, stddev}` onto the BMF
+    ## triple `{value, lower_value, upper_value}` per design 2.3:
+    ## value=mean, bounds = mean +/- 1*stddev (1-sigma band; matches the
+    ## convention used by `bmf_adapter.py` which this code replaces).
+    em.addMeasure(slug, "throughput_ops_ms", mean, mean - stddev, mean + stddev)
+
```
Evidence
The new BMF emission path unconditionally calls addMeasure(...) with the computed mean and
bounds. The legacy throughput benchmark code computes ops/ms via messageCount / elapsedNs with no
elapsedNs <= 0 guard, so a zero-duration measurement can produce non-finite values. The new
merge_bmf.py step explicitly rejects any non-finite float in value/lower_value/upper_value,
while the new shared harness (bench_common) includes an elapsedNs <= 0 guard—highlighting the
missing protection in bench_throughput.

benchmarks/nim/bench_throughput.nim[553-566]
benchmarks/nim/bench_throughput.nim[139-140]
benchmarks/merge_bmf.py[51-80]
benchmarks/nim/bench_common.nim[548-560]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`bench_throughput` now emits BMF directly, but it does not guard against non-finite numbers. If a run produces `elapsedNs <= 0` (or any other source of non-finite mean/stddev), the emitted JSON will contain non-finite values and `merge_bmf.py` will fail CI.
### Issue Context
- `merge_bmf.py` enforces `math.isfinite` for all numeric fields.
- `bench_common.runOneThroughputRun` already contains an `elapsedNs <= 0` guard, but the legacy `bench_throughput` benchmark functions do not.
### Fix Focus Areas
- benchmarks/nim/bench_throughput.nim[553-566]
- benchmarks/nim/bench_throughput.nim[139-140]
- benchmarks/merge_bmf.py[51-80]
### Recommended fix
Do one (or both) of:
1) Add `elapsedNs <= 0.0` guards in the legacy run procs (`runThroughputBenchmark`, `runMupmucBenchmark`, `runUnboundedMupsicBenchmark`) returning `0.0` (or skipping the sample) to prevent `inf`.
2) In `emitThroughputSlug` (or `addMeasure`), check `mean` and bounds are finite before recording; if `mean` is non-finite, skip emitting that `(slug, measure)` entirely so the rest of the run still uploads successfully.



Previous review results (updated until commit b330905): the "Premature DebraManager free" and "Non-finite BMF emission" findings above, repeated verbatim.

elijahr added 15 commits May 4, 2026 22:51
Replaces the PR 5 placeholder with the real uPlot block per design
§4.5 schema. Documents source, version (1.6.27), MIT license,
vendored path, and a precise upgrade procedure (jsdelivr URL,
SHA-256 verification via jsdelivr's package metadata).

`.gitattributes` gains two rules per design §4.6:
- `docs/assets/uplot-*.js` — linguist-vendored + linguist-generated.
  Confirmed via `git check-attr -a` against the vendored bundle.
- `docs/assets/bench-results/*.json` — linguist-generated only.
  CI-emitted snapshots are not third-party imports, so the vendored
  flag would mislead.
Replaces the auto-rendered four-row table with a hand-curated
four-row summary covering the four bounded lockfreequeues variants
at one representative shape each, followed by a link to the live
chart page (design §4.4 / Task 5.8). Initial cells contain
`_to be filled at next release_` placeholders; the next release PR
fills them in per the "Updating the README summary" procedure
documented in benchmarks/README.md.

Why hand-curated:
- The chart page already absorbs run-to-run noise; the README does
  not need to.
- Four cells, refreshed once per release cycle, is faster to update
  by hand than to re-introduce an auto-renderer the chart page has
  superseded.
- The README is the first artefact consumers see; a noisy CI run
  should not silently regress its headline numbers.

Pre-deletion release-tag check (per impl plan 5.8):
- v3.2.0 and v4.0.0 each ship `benchmarks/render_readme.nim` in
  their tagged tree. Deleting the file on devel does NOT mutate
  those tags (git tags point at frozen commits).
- No CI workflow, nimble task, or test runner references the
  renderer; the test suite (`tests/test.nim`) does not import
  `t_render_readme`. Confirmed via `git grep` against `*.yml`,
  `*.nimble`, `*.cfg`, and `tests/test.nim`.

Files removed:
- `benchmarks/render_readme.nim` — auto-rendered the README markers.
- `tests/t_render_readme.nim` — `include`s the renderer; would not
  compile after deletion.

`benchmarks/README.md` gains an "Updating the README summary"
section codifying the hand-curation procedure (which shapes to
read, where to read them from, when to commit). The chart-contract
test runner is added to the test commands list.

Two remaining `render_readme` mentions are intentional:
- `CHANGELOG.md` PR-0 entry (historical record of the PR-0 rewrite).
- `benchmarks/README.md` explanatory note about the deletion path.
Adds Track 5 entries to [Unreleased]: chart wiring + assets, BMF
snapshot publishing pipeline with three-layer loop-prevention,
devel-triggered docs deploy + post-deploy asset verification,
THIRD_PARTY_LICENSES + .gitattributes uPlot entries, the new
chart-contract Python test, and the README BENCHMARKS markers
switching from auto-rendered to hand-curated. Documents removal
of `benchmarks/render_readme.nim` and its test, with the
pre-deletion release-tag check noted.

Manual chart performance evidence (LCP/FCP screenshot per design
§3 PR 5 acceptance) will be attached to the PR description after
push, since it requires a live mike deploy URL.
…y old uPlot

Per Gemini review on PR #24.

1. `docs/benchmarks.md` and `docs/assets/bench-charts.js`: `./assets/`
   resolves to `/<version>/benchmarks/assets/` under mkdocs
   `use_directory_urls` (default true), which 404s — assets actually
   live at `/<version>/assets/`. Switch to `../assets/` so the chart
   page reaches them via one level up.

2. `.github/workflows/docs.yml` mike-asset verifier: same path bug —
   was probing `/dev/benchmarks/assets/bench-results/latest.json` which
   would have returned 404 once the snapshot landed (the verify step
   only runs on devel push, so the bug wouldn't surface until then).
   Aligned with the page's `../assets/` path.

3. `bench-charts.js` ResizeObserver leak on log-scale toggle:
   `attachResizeObserver` was called from inside `rebuild()` which
   fires on every toggle. Each click left a stranded observer + a
   stale `plot` reference its setSize closure was still firing on.
   Refactored to a single observer attached after the first build,
   which resolves `plot` lazily through a getter — toggle just swaps
   the underlying instance.

4. `bench-charts.js` orphaned uPlot DOM/listeners: `rebuild()` mounted
   a new uPlot without destroying the previous one. Added
   `if (plot) plot.destroy()` at the top of `rebuild()`. Combined with
   the observer fix above, the toggle is now state-clean across
   arbitrarily many flips.

Syntax-checked bench-charts.js with `node --check`.
Three code-review fixes on PR 5:

1. Snapshot push race. The bench-upload Snapshot step pushed devel
   without rebasing, so a concurrent push during a long bench run
   would reject non-fast-forward and fail the entire job. Add a
   fetch + rebase + retry loop (3 attempts) and push HEAD:devel.
   The commit is already [skip ci], so the retry cannot trigger an
   infinite bench loop on a successful push.

2. Hardcoded GitHub Pages verify URL. The docs.yml verification
   step curled https://elijahr.github.io/lockfreequeues/... which
   is wrong on forks or after a repo rename. Derive the URL from
   github.repository_owner and github.event.repository.name so the
   verifier follows the repo wherever it lives.

3. Log-scale accepts non-positive. The chart defaults to log Y
   (distr: 3) but merge_bmf.py only enforces finiteness, so a
   malformed BMF could deliver a 0 or negative throughput value.
   Treat <= 0 as missing data in toUplotData. Throughput in ops/ms
   cannot legitimately be non-positive (a 0 indicates a degenerate
   run that bench_throughput's elapsedNs guard already rejects), so
   surfacing as missing is the correct semantics.
The PR4 -> PR5 rebase merged two permissions blocks: PR4's
contents:read (needed by actions/checkout + actions/download-artifact
when default GITHUB_TOKEN is read-only) and PR5's contents:write
(needed by the snapshot-push step). YAML rejects duplicate keys, so
GitHub Actions failed the workflow at parse time before any job
started — visible as a workflow-level failure with no job rows.

Collapse to contents:write, which is a superset of read.
- Vendor `uPlot.min.css` 1.6.27 alongside the existing IIFE JS bundle.
  Without the companion stylesheet, uPlot's axes, grid lines, and
  cursor are misaligned (uPlot ships the layout rules in CSS, not
  inline). Wire it via a `<link>` in benchmarks.md AND add it to
  mkdocs `extra_css` so it loads on every page (cheap, ~2KB) and the
  cursor/grid layout works regardless of which page renders the chart.
- Update .gitattributes vendoring rules to cover the .css file too.
- Update THIRD_PARTY_LICENSES upgrade procedure for the dual bundle.

- bench-charts.js: replace per-comparison regex+parseInt with a
  pre-parse-and-sort pass that computes (p, c, total) once per label.
  Non-matching labels keep a sentinel that puts them ahead of matched
  shapes, ordered lexicographically among themselves.

- test_bench_charts_contract: replace the manual `v == v and v not in
  (inf, -inf)` finite check with `math.isfinite(v)`.
…link

- Series key is now `<library> (<topology>)` instead of bare library
  name. A library that exposes multiple topologies (e.g. mupmuc/spsc
  AND mupmuc/mpmc) used to merge into one series and silently
  overwrite shape values.

- Cache `_shapeMap` on each library's first hover so the setCursor
  hook does an O(1) Map lookup per visible series instead of an O(N)
  linear `find` on every mousemove.

- Drop the redundant `<link rel="stylesheet">` for uplot's CSS in
  docs/benchmarks.md — it's already loaded globally via mkdocs.yml's
  `extra_css`. The duplicate caused two network requests and risked
  load-order conflicts.

- Document the contract-test version-string bump in the uPlot upgrade
  procedure, and clarify that the CSS comes via `extra_css` (not a
  `<link>`) so future upgraders touch the right config.

The duplicate-network-request risk and the silent-overwrite bug were
the user-facing failures; the Map cache is a small ergonomic win on
the same code path.
…JS parseSlug

- bench.yml: remove [skip ci] from the snapshot commit. The marker
  also suppressed docs.yml, which has no paths-ignore and must run
  on snapshot pushes to deploy fresh chart data to Pages. Loop
  prevention is unaffected: bench.yml's existing paths-ignore on
  docs/assets/bench-results/** already short-circuits the bench job
  on the snapshot commit, and the bot-actor guard on bench-upload is
  the defense-in-depth layer. Comment block updated to reflect the
  two-layer scheme (was three layers).
- test_bench_charts_contract: SLUG_RE accepted only 3-segment slugs;
  bench-charts.js parseSlug accepts 3+ via slice(1, -1).join('/').
  Loosen the python regex to match so the contract test does not
  reject nested-topology slugs the JS chart would render.
Track 6 (PR 6, final track in the bench-rollup feature stack) adds
latency-regression gating in Bencher and the supporting BMF measures.

bench_latency.nim now emits the full p50/p95/p99/p999/max latency tuple
per bounded variant slug. Each call to runLatencyHarness adds two new
measures alongside the existing trio:

  em.addMeasure(slug, "latency_p999_ns", metrics.p999_ns)
  em.addMeasure(slug, "latency_max_ns",  metrics.max_ns)

The merged BMF on each bench job thus carries a complete tail-latency
profile per (library, topology) cell, available to the Bencher dashboard
and any downstream comparison chart.

bench_common.nim raises HistogramTopK from 1000 to 5000. At production
sample counts (BenchLatencyMessageCount=100_000 x BenchLatencyRuns=33,
seenAll ~ 3.3M per aggregated histogram) the p999 tail rank is ~3300;
K=1000 forced the p999 lookup into the rescaled-reservoir stratum,
while K=5000 keeps it in the exact top-K stratum with headroom for
stochastic boundary variation. Memory cost is 5000 x 8B = 40KB
additional per histogram, negligible relative to the 99K-sample
reservoir.

bench.yml's base-branch tracking step (push to devel/main) configures
per-measure thresholds in a single bencher run invocation: latency_p99_ns
with --threshold-upper-boundary 0.99 (regression = latency increase) and
throughput_ops_ms with --threshold-lower-boundary 0.99 (regression =
throughput drop). Both share --threshold-test t_test
--threshold-max-sample-size 64, terminated by --thresholds-reset so only
the explicitly-listed thresholds remain active.

This also corrects a prior measure-name mismatch: the earlier
--threshold-measure throughput never matched any emitted measure (the
actual key is throughput_ops_ms), so the previous throughput threshold
was a no-op.

Threshold activation requires >= 10 prior runs accumulated in Bencher
to calibrate the t-test baseline (Task 6.4 stability soak gate).
Bencher does not emit alerts on measures with insufficient sample
history, so the configuration is dormant until the soak completes
post-merge.

Tests:
- t_bench_common.nim: HistogramTopK >= 5000 assertion + p999 within 5%
  of sort fallback on 3.3M log-normal samples (27 tests, all passing).
- t_bench_latency.nim: every bounded variant slug carries
  latency_p999_ns + latency_max_ns with monotone tail ordering p99 <=
  p999 <= max (7 tests, all passing).
- tests/test.nim main suite unaffected (200 tests, all passing).

Refs design doc /Users/eek/.local/spellbook/docs/Users-eek-Development-lockfreequeues/plans/2026-05-01-bench-rollup-design.md
section 3 PR 6 and impl plan
/Users/eek/.local/spellbook/docs/Users-eek-Development-lockfreequeues/plans/2026-05-01-bench-rollup-impl.md
Track 6 (Tasks 6.1-6.5).
…ntiles

Per Gemini review on PR #25.

The Track 6 Task 6.2 commentary in HistogramTopK, bench_latency.nim,
t_bench_common.nim, and CHANGELOG repeatedly claimed that
runLatencyHarness aggregates samples across runs ("seenAll ≈ 3.3M
per aggregated histogram"). It does not — design 2.5 explicitly
says "mean across runs of each percentile", so each Histogram only
ever sees BenchLatencyMessageCount samples (default 100K), not
MessageCount × Runs.

At the default 100K MessageCount the previous K=1000 was already
adequate — TopK + Reservoir already captured every sample exactly
(1000 + 99,000 ≥ 100,000), so K=1000 was never spilling p999 into
the rescaled stratum at default settings.

Reframed all four sites: the K=5000 sizing is **anticipatory** for
operators who override BenchLatencyMessageCount upward (~5M-sample
overrides), keeping p999 (tail rank = MessageCount × 0.001) inside
the exact top-K stratum at that volume. The 3.3M-sample stress test
in t_bench_common.nim is reframed as a stress shape that exercises
the K=5000 design choice, not a depiction of production volume.

No code logic changed; comments + CHANGELOG only.
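
A toy check of the anticipatory sizing under the corrected model (one histogram per run; the names here are illustrative, not the module's):

    const HistogramTopK = 5000  # value after this change

    # p999 stays in the exact top-K stratum as long as the tail rank
    # (samples above the p999 cut) fits inside K.
    proc p999TailRank(messageCount: int): int =
      int(messageCount.float * (1.0 - 0.999))

    assert p999TailRank(100_000) == 100      # default shape: far inside K
    assert p999TailRank(5_000_000) == 5000   # ~5M override: exactly fills K
    assert p999TailRank(5_000_000) <= HistogramTopK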
Code review on PR 6 noted that the new HistogramTopK assertion in
tests/t_bench_common.nim and the latency CLI assertions in
tests/t_bench_latency.nim were not imported by tests/test.nim and not
run by any CI step, so the validation added by this PR was not actually
enforced in CI.

Wire up via nimble's separation-of-concerns pattern (Option B from the
review):

- lockfreequeues.nimble gains a 'benchtests' task that compiles and
  runs t_bench_common, t_bench_latency, and t_bench_adapters under one
  Nim invocation each. The bench harness tests live outside srcDir;
  importing them from tests/test.nim would force every entry of the
  8-invocation MM/sanitizer test matrix to compile bench_common's
  threading machinery. A dedicated task keeps the regular nimble test
  matrix lean while still enforcing the bench-side assertions (a rough
  NimScript shape is sketched below).

- A second 'benchteststress' task enables the gated 3.3M-sample
  HistogramTopK p999 stress shape (slow, ~10-15s release-mode), so
  default benchtests stays under a second.

- bench.yml gains a 'bench-tests' job (ubuntu-latest) that runs
  'nimble benchtests' once per workflow trigger, with the same
  fork-PR / bot-actor guards as the bench matrix jobs.

The 3.3M-sample stress test in t_bench_common is now gated behind
-d:BenchCommonStress so it only runs under benchteststress, not the
default benchtests run.
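
A rough NimScript shape for the two tasks (a sketch matching the description above; the exact compiler flags and test list are assumptions):

    task benchtests, "Runs the bench harness test suite":
      for t in ["t_bench_common", "t_bench_latency", "t_bench_adapters"]:
        # One Nim invocation per bench-side test binary, kept out of
        # the regular `nimble test` matrix.
        exec("nim c -r --threads:on tests/" & t & ".nim")

    task benchteststress, "benchtests plus the gated stress shape":
      # -d:BenchCommonStress enables the slow 3.3M-sample p999 test.
      exec("nim c -r -d:release -d:BenchCommonStress --threads:on tests/t_bench_common.nim")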
The previous wording called `percentile(1.0)` "the top-K head", which
is misleading for a min-heap implementation: `topKHeap[0]` is the
SMALLEST element in the top-K (the admission threshold for new
outliers), not the largest. The implementation snapshots the heap
into a sequence, sorts it, and returns `topk[^1]` — the maximum.

Comment-only change; behaviour unchanged.
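
A small standalone illustration of that distinction, using std/heapqueue as a stand-in for the harness's top-K heap:

    import std/[heapqueue, algorithm]

    var topKHeap = initHeapQueue[float]()  # min-heap, like the harness's top-K
    for v in [5.0, 1.0, 9.0, 7.0]:
      topKHeap.push(v)

    # topKHeap[0] is the SMALLEST retained sample (the admission
    # threshold for new outliers), not the maximum.
    assert topKHeap[0] == 1.0

    # percentile(1.0) must snapshot, sort, and take the last element.
    var topk = newSeq[float](topKHeap.len)
    for i in 0 ..< topKHeap.len:
      topk[i] = topKHeap[i]
    topk.sort()
    assert topk[^1] == 9.0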
Replaces the Qodo PR-Agent reusable workflow with Momus
(axiomantic/momus), an open-source PR review bot built from scratch.
…KEY -> LLM_API_KEY, pass MOMUS_APP_* secrets, set trigger_mention

@axiomantic-momus (bot) left a comment


PR 0 of the bench-rollup feature: introduces a shared bench harness module (bench_common.nim), rewires the CI bench workflow to use native BMF JSON emission via --bmf-out, adds 5 new lockfreequeues adapters, renames 3 existing adapters, introduces merge_bmf.py for stateless BMF union, and rewrites render_readme.nim for the new BMF shape. Two substantive issues found: the per-variant warmup defaults silently dropped from 3 to 0 for bounded variants (contradicts the 'pure refactor' claim and affects benchmark numbers), and 14 new tests across t_render_readme.nim and test_merge_bmf.py are not wired into any CI workflow.

Severity tally: 2 Mediums, 1 Low.

Medium (blocking)

  • BOT-A1 (benchmarks/nim/bench_throughput.nim:521): Per-variant warmup defaults silently dropped from 3 to 0 for bounded variants
  • BOT-A2 (lockfreequeues.nimble:65): 14 new tests not wired into CI; merge_bmf and render_readme tests provide no gating

Low

  • BOT-A3 (tests/t_bench_common.nim:256): SmokeAdapter leaks its Channel in throughput and latency smoke tests

Noteworthy

  • The BMFEmitter + Histogram implementation is careful about edge cases (NaN/Inf guard in addMeasure, zero-duration guard in elapsed time calculations, deterministic reservoir seed for reproducibility).
  • The merge_bmf.py collision detection and schema validation (slug regex, measure regex, NaN rejection, bool rejection) is thorough and well-tested.
  • The CI bench.yml now pins sibling deps to tagged releases (v0.6.0, v0.7.0) instead of main, improving CI determinism.

Verdict: REQUEST_CHANGES.

Commands
  • Comment /ai-review or mention @axiomantic-momus[bot] to request a re-review of the latest changes.
  • Reply to a finding with won't fix, by design, or not a bug to decline it.
  • Reply with instead, ... to propose an alternative fix.

Powered by Momus running deepseek/deepseek-v4-pro via openrouter.ai.

Comment thread benchmarks/nim/bench_throughput.nim Outdated
Comment on lines +521 to +525
BenchSipsicWarmup {.intdefine.} = 0
BenchMupmucRuns {.intdefine.} = 10
BenchMupmucWarmup {.intdefine.} = 0
BenchChannelsRuns {.intdefine.} = 10
BenchChannelsWarmup {.intdefine.} = 0


BOT-A1 — Medium (bug)
Per-variant warmup defaults silently dropped from 3 to 0 for bounded variants
The new per-variant compile-time constants BenchSipsicWarmup, BenchMupmucWarmup, and BenchChannelsWarmup all default to 0 (lines 521-525). The old code called benchmarkThroughput/benchmarkMupmucNP*C without an explicit warmup argument, picking up the default WarmupRuns=3 (line 44). In CI the old behavior was warmup=2 (via -d:WarmupRuns=2). The PR description states 'This is a pure refactor PR. No new variants are benched, no regression vs the prior CI numbers is expected' and claims the binary is 'bit-for-bit unchanged from the prior release' when --bmf-out is absent. The warmup change contradicts both claims: the first timed runs will absorb cold-cache/JIT effects, producing systematically lower throughput numbers across all bounded variants (sipsic, mupmuc, channels) on every local and CI invocation. The unbounded_mupsic variants are unaffected — they still use WarmupRuns=3 (line 465).

Suggested change
-BenchSipsicWarmup {.intdefine.} = 0
+BenchSipsicWarmup {.intdefine.} = 3
 BenchMupmucRuns {.intdefine.} = 10
-BenchMupmucWarmup {.intdefine.} = 0
+BenchMupmucWarmup {.intdefine.} = 3
 BenchChannelsRuns {.intdefine.} = 10
-BenchChannelsWarmup {.intdefine.} = 0
+BenchChannelsWarmup {.intdefine.} = 3
Either set the per-variant warmup defaults to match the old behavior (BenchSipsicWarmup=3, BenchMupmucWarmup=3, BenchChannelsWarmup=3), or if the change is intentional, update the PR description and CHANGELOG to document it as a Changed behavior. Also ensure the CI bench.yml -d:WarmupRuns flag actually reaches these variants (currently it doesn't, since the isMainModule calls pass the per-variant constants explicitly).

Comment thread lockfreequeues.nimble
task benchtests, "Runs the bench harness test suite":
  # The bench harness lives outside `srcDir`, so its dedicated tests
  # (`tests/t_bench_*.nim`) are NOT imported by `tests/test.nim` to
  # keep the regular `nimble test` matrix free of the bench harness's


BOT-A2 — Medium (tests)
14 new tests not wired into CI; merge_bmf and render_readme tests provide no gating
The PR adds 4 tests in tests/t_render_readme.nim and 10 tests in benchmarks/tests/test_merge_bmf.py, but neither file is referenced in any CI workflow or nimble task. The benchtests nimble task (line 65) only runs tests/t_bench_common.nim. The bench.yml bench-tests job only invokes nimble benchtests. The Python tests require python3 -m unittest benchmarks.tests.test_merge_bmf -v and the Nim tests require a separate compile+run of t_render_readme.nim — neither is automated. These tests cover schema validation, collision detection, slug parsing, BMF shape parsing, and edge cases (NaN rejection, missing files). Without CI gating, regressions in merge_bmf.py or render_readme.nim won't be caught pre-merge.

Add steps to the bench.yml `bench-tests` job to run `python3 -m unittest benchmarks.tests.test_merge_bmf -v` and to compile and run `tests/t_render_readme.nim`. Alternatively, extend the `benchtests` nimble task to include these invocations.

Comment thread tests/t_bench_common.nim
Comment on lines +256 to +272
      queueInit = proc(cap: int): SmokeAdapter = initSmokeAdapter(cap),
      capacity = 1024,
      numProducers = 1,
      numConsumers = 1,
      messageCount = 1000,
      runCount = 1,
      warmupCount = 0,
    )
    check metrics.runs == 1
    check metrics.ops_ms_mean > 0.0

# ---------- Task 0.6: runLatencyHarness smoke ----------

suite "bench_common runLatencyHarness":
  test "smoke: 1P/1C, 1000 messages, 1 run, 0 warmup; p50<p99<max":
    let metrics = runLatencyHarness[SmokeAdapter](
      queueInit = proc(): SmokeAdapter = initSmokeAdapter(1024),


BOT-A3 — Low (quality)
SmokeAdapter leaks its Channel in throughput and latency smoke tests
The SmokeAdapter type (lines 256-272) allocates a Channel[uint64] via create in initSmokeAdapter but defines no cleanup proc. The bench_common harness (both runOneThroughputRun and runOneLatencyRun) uses when compiles(cleanup(queue)) to conditionally call cleanup, which never fires for SmokeAdapter. Every invocation of the runThroughputHarness and runLatencyHarness smoke tests leaks a Channel. While tests are short-lived and the OS reclaims the memory at process exit, this is a leak that could mask real issues if these smoke adapters are copied into production adapter code.

Add a `cleanup` proc to SmokeAdapter that closes and deallocates the channel, matching the pattern in channels_adapter.nim.
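
A minimal sketch of that cleanup, assuming SmokeAdapter stores its channel as `chan: ptr Channel[uint64]` (field name assumed; the real layout is in t_bench_common.nim):

    proc cleanup(adapter: SmokeAdapter) =
      # Close the channel, then release the block `create` allocated.
      # The harness's `when compiles(cleanup(queue))` hook will then
      # pick this up automatically.
      adapter.chan[].close()
      dealloc(adapter.chan)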

@elijahr elijahr changed the title from "bench-rollup PR 0: shared bench harness + native BMF emission" to "bench-rollup: shared harness + comparison adapters + interactive charts" May 5, 2026
@elijahr
Owner Author

elijahr commented May 5, 2026

/ai-review


@axiomantic-momus (bot) left a comment


This omnibus PR consolidates 7 stacked PRs for the bench-rollup feature stack (81 commits, 70 files, +11,296/-1,493). It adds a shared bench harness with native BMF emission, topology-split binaries, comparison adapters for 7 external queue libraries, an interactive uPlot chart, CI matrix with per-step timeouts, cache-line-aligned segment allocation, and Bencher threshold gating. The code is thorough, well-tested, and the CI pipeline is carefully designed with soft-skip patterns and loop-prevention guards. Only one minor nit found.

Severity tally: 1 Nit.

Nit

  • BOT-B1 (tests/t_aligned_alloc.nim:10): Test file uses raw c_free instead of freeAligned from the module under test.

Noteworthy

  • Cache-line-aligned segment allocation (allocAligned/freeAligned with POSIX posix_memalign and Windows _aligned_malloc paths) is a genuine fix for false sharing in unbounded queues, verified by 8 dedicated assertions in t_unbounded_padding.nim.
  • The soft-skip CI pattern (install → smoke → set-flag-on-success → annotate-on-failure) gracefully handles unavailable external libraries without failing the workflow, keeping partial bencher coverage for surviving binaries.
  • Deletion-safety guard (superset_check.py) run in CI between merge and upload ensures the topology split never silently drops slugs — a robust regression prevention pattern.

Verdict: APPROVE.

Commands
  • Comment /ai-review or mention @axiomantic-momus[bot] to request a re-review of the latest changes.
  • Reply to a finding with won't fix, by design, or not a bug to decline it.
  • Reply with instead, ... to propose an alternative fix.

Powered by Momus running deepseek/deepseek-v4-pro via openrouter.ai.

Comment thread tests/t_aligned_alloc.nim Outdated
Comment on lines +10 to +60
from system/ansi_c import c_free

type
  Tiny = object
    a: int

  Line = object
    a: int
    b: int
    c: int
    d: int
    e: int
    f: int
    g: int
    h: int

  Big = object
    a: array[256, byte]

suite "internal/aligned_alloc.allocAligned":
  test "tiny payload (sizeof < CacheLineBytes) is 64B-aligned":
    let p = allocAligned[Tiny]()
    check p != nil
    check (cast[uint](p) mod CacheLineBytes.uint) == 0
    check p.a == 0 # zero-initialized
    c_free(p)

  test "line-sized payload (sizeof == CacheLineBytes) is 64B-aligned":
    let p = allocAligned[Line]()
    check p != nil
    check (cast[uint](p) mod CacheLineBytes.uint) == 0
    check p.a == 0
    check p.h == 0
    c_free(p)

  test "big payload (sizeof > CacheLineBytes) is 64B-aligned":
    let p = allocAligned[Big]()
    check p != nil
    check (cast[uint](p) mod CacheLineBytes.uint) == 0
    for i in 0 ..< 256:
      check p.a[i] == 0.byte
    c_free(p)

  test "many allocations are all 64B-aligned":
    var ptrs: array[64, ptr Line]
    for i in 0 ..< 64:
      ptrs[i] = allocAligned[Line]()
      check ptrs[i] != nil
      check (cast[uint](ptrs[i]) mod CacheLineBytes.uint) == 0
    for i in 0 ..< 64:
      c_free(ptrs[i])


BOT-B1 — Nit (tests)
Test file uses raw c_free instead of freeAligned from the module under test.
tests/t_aligned_alloc.nim imports lockfreequeues/internal/aligned_alloc (which exports freeAligned), but deallocates test allocations via system/ansi_c.c_free instead of using the module's own freeAligned. On POSIX (the only CI platform) freeAligned delegates to c_free, so tests pass, but the freeAligned code path added in this PR is left untested. On Windows the test would be incorrect (_aligned_free is required).

Replace all `c_free(p)` / `c_free(ptrs[i])` calls with `freeAligned(p)` / `freeAligned(ptrs[i])` and drop the `from system/ansi_c import c_free` import.

elijahr added 5 commits May 5, 2026 01:57
Drops c_free in favor of the module's freeAligned helper so the test
exercises the public API symmetrically. Same underlying allocator
(c_free under POSIX), but freeAligned is the documented inverse of
allocAligned and the right call when _aligned_free is needed on
Windows.
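
For context, a sketch of the platform split behind `freeAligned` (the C entry points are real; the wrapper shape is an assumption, the actual code lives in internal/aligned_alloc.nim):

    when defined(windows):
      proc c_aligned_free(p: pointer)
        {.importc: "_aligned_free", header: "<malloc.h>".}
    else:
      from system/ansi_c import c_free

    proc freeAligned*(p: pointer) =
      # posix_memalign memory is released with plain free(), but
      # _aligned_malloc memory MUST go through _aligned_free, so a
      # raw c_free would be wrong on Windows.
      when defined(windows):
        c_aligned_free(p)
      else:
        c_free(p)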
bench-rollup feature stack ships as 4.2.0. Version bumped from 4.1.0
(already released; tagged v4.1.0 on 2026-05-01) and the previously
[Unreleased] CHANGELOG section moved under the [4.2.0] header dated
2026-05-05.

release.yml triggers on push to devel: it parses this version, creates
v4.2.0 tag if absent, and publishes the GitHub Release with
auto-generated notes from commit history.
Conflict resolutions:
- src/lockfreequeues/unbounded_{mupmuc,mupsic,sipmuc}.nim: kept HEAD's
  allocAligned[DebraManager[MaxThreads]]() (raises OutOfMemDefect
  internally per aligned_alloc.nim) and dropped devel's c_calloc + nil
  check. The cache-line padding work (PR 19) supersedes c_calloc here.
- mkdocs.yml: took devel's 'Concepts: Safety Model / Slot Ownership
  Typestates' nav block. devel deleted docs/api/ops.md, so HEAD's
  'Support: Operations' entry would break the docs build.
- CHANGELOG.md: combined both Unreleased sections. Promoted devel's
  CASAttempt typestate restructure to a top-level BREAKING entry;
  appended devel's Changed/Added/Fixed under '(typestates 0.7 uplift)'
  subheaders so the bench-rollup and typestates-uplift sections are
  visually separable in the next release notes.

Verified: 210/210 tests pass under nim c --threads:on tests/test.nim
with typestates 0.7.2 + this merge.
The merge from devel surfaced that this branch's build.yml + bench.yml
still pinned nim-typestates@v0.7.0 / nim-debra@v0.6.0. lockfreequeues
now requires typestates >= 0.7.2 (CASAttempt union, match-macro
generic-context fixes) and debra >= 0.7.0 (no published tag yet);
v0.7.0 fails compilation with 'match arm head must be a single state
identifier' on the new opaque-state typestate match invocations.

Match devel's approach: track --branch main until nimble.directory
picks up the new tags. Updates 3 clone blocks (build.yml + 2 in
bench.yml) plus the now-stale 'pin to release tags' rationale comments.

Verified locally with main of nim-typestates (= v0.7.2 sha db7159f)
and nim-debra: 210/210 tests pass.

@axiomantic-momus (bot) left a comment


This omnibus PR consolidates 7 stacked PRs (Tracks 0–6) into a single bench-rollup: shared harness with native BMF emission, topology-split binaries, comparison adapters (Crossbeam, Boost, Loony, MoodyCamel, threading.Channels, system.Channel), interactive uPlot charts, latency thresholds, and snapshot publishing. One confirmed bug found: the bench-comparison.yml workflow carries a stale threshold measure name that was already fixed in bench.yml's Track 6, making the Crossbeam Bencher threshold silently inoperative.

Severity tally: 1 Low.

Low

  • BOT-A1 (.github/workflows/bench-comparison.yml:196): Bencher threshold measure name mismatch in bench-comparison.yml: throughput vs throughput_ops_ms

Noteworthy

  • Well-structured soft-skip adapter pattern: install → smoke → set-flag-on-success → annotate-on-failure, with workflow_dispatch force-skip overrides. Graceful degradation without workflow failures.
  • Three-layer loop-prevention for snapshot publishing (paths-ignore, bot-actor guard, explicit ref checkout) is carefully reasoned and avoids the common [skip ci] / docs-deploy tension.
  • The Histogram top-K + Algorithm R reservoir stratified-percentile implementation in bench_common.nim is mathematically sound with proper rescaling for body-stratum lookup.
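
For reference, a textbook Algorithm R sketch (generic form, not the project's exact reservoir code):

    import std/[random, sequtils]

    proc reservoirSample[T](stream: openArray[T]; k: int;
                            rng: var Rand): seq[T] =
      ## Keep a uniform random sample of k items from a stream.
      result = newSeq[T]()
      for i, x in stream:
        if i < k:
          result.add x         # fill phase: keep the first k items
        else:
          let j = rng.rand(i)  # uniform index in 0..i
          if j < k:
            result[j] = x      # replace with probability k/(i+1)

    var rng = initRand(42)     # deterministic seed, as bench_common uses
    let sample = reservoirSample(toSeq(1 .. 10_000), 100, rng)
    assert sample.len == 100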

Verdict: APPROVE.

Commands
  • Comment /ai-review or mention @axiomantic-momus[bot] to request a re-review of the latest changes.
  • Reply to a finding with won't fix, by design, or not a bug to decline it.
  • Reply with instead, ... to propose an alternative fix.

Powered by Momus running deepseek/deepseek-v4-pro via openrouter.ai.

Comment thread .github/workflows/bench-comparison.yml Outdated
--token '${{ secrets.BENCHER_API_TOKEN }}' \
--branch devel \
--testbed ubuntu-latest \
--threshold-measure throughput \


BOT-A1 — Low (bug)
Bencher threshold measure name mismatch in bench-comparison.yml: throughput vs throughput_ops_ms
The "Track scheduled benchmarks with Bencher" step in bench-comparison.yml uses --threshold-measure throughput but the actual measure key emitted by every bench binary (bench_spsc, bench_mpsc, bench_mpmc, bench_unbounded) is throughput_ops_ms. This is the same bug that bench.yml's Track 6 explicitly corrected (the Track 6 comment reads: "the prior config used --threshold-measure throughput; the actual measure key ... is throughput_ops_ms"). The bench-comparison.yml was not updated to match, so the Crossbeam threshold will never match any measure — rendering Bencher regression detection on the nightly crossbeam workflow a silent no-op. Note: this workflow runs on a nightly cron schedule and is not on the PR critical path, so this is flagged as low severity (follow-up fix) rather than blocking.

Suggested change
- --threshold-measure throughput \
+ --threshold-measure throughput_ops_ms \
Replace `--threshold-measure throughput` with `--threshold-measure throughput_ops_ms` on line 196 of bench-comparison.yml, matching the fix already applied in bench.yml line 773.

@elijahr
Owner Author

elijahr commented May 5, 2026

/ai-review


@axiomantic-momus (bot) left a comment


This omnibus PR rolls up 7 tracks of work (81 commits, 70 files) adding a shared bench harness, comparison adapters (Boost, Loony, Crossbeam, MoodyCamel, threading.Chan, system.Channel), topology-split binaries, interactive uPlot charts, and p999/max latency measures with p99 threshold gating. Two substantive issues found: a crossbeam threshold measure name mismatch that makes Bencher gating a no-op in the comparison workflow (carried over from a bug the main bench.yml already fixed), and an incorrect rescaling formula in a docstring comment.

Severity tally: 1 Medium, 1 Low.

Medium (blocking)

  • BOT-C1 (.github/workflows/bench-comparison.yml:196): Bencher threshold measure name mismatch in bench-comparison.yml: throughput vs throughput_ops_ms

Low

  • BOT-C2 (benchmarks/nim/bench_common.nim:341): Histogram percentile docstring formula does not match the code

Noteworthy

  • The bench_common.nim shared harness is well-structured with proper cleanup (when compiles(cleanup(...))) and thorough docstrings.
  • The Rust cdylib (bench-ffi-crossbeam) has proper null-pointer guards, panic safety, and comprehensive integration tests.
  • The Loony adapter's encoding constraint (v+1)<<3 is well-documented with range limits and debug asserts.

Verdict: REQUEST_CHANGES.

Commands
  • Comment /ai-review or mention @axiomantic-momus[bot] to request a re-review of the latest changes.
  • Reply to a finding with won't fix, by design, or not a bug to decline it.
  • Reply with instead, ... to propose an alternative fix.

Powered by Momus running deepseek/deepseek-v4-pro via openrouter.ai.

Comment thread .github/workflows/bench-comparison.yml Outdated
--token '${{ secrets.BENCHER_API_TOKEN }}' \
--branch devel \
--testbed ubuntu-latest \
--threshold-measure throughput \


BOT-C1 — Medium (bug)
Bencher threshold measure name mismatch in bench-comparison.yml: throughput vs throughput_ops_ms
The "Track scheduled benchmarks with Bencher" step in bench-comparison.yml uses --threshold-measure throughput but the actual measure key emitted by every bench binary (bench_spsc, bench_mpsc, bench_mpmc, bench_unbounded) is throughput_ops_ms. This is the same bug that bench.yml's Track 6 explicitly corrected (the Track 6 comment reads: "the earlier --threshold-measure throughput never matched any emitted measure (the actual key is throughput_ops_ms), so the previous throughput threshold was a no-op"). The bench-comparison.yml was missed when bench.yml was fixed. The threshold-gating for crossbeam comparison will never fire, making the scheduled regression detection silently non-functional.

Suggested change
- --threshold-measure throughput \
+ --threshold-measure throughput_ops_ms \
Change `--threshold-measure throughput` to `--threshold-measure throughput_ops_ms` at line 196 of bench-comparison.yml.

Comment on lines +341 to +343
let pBodyClamped = max(0.0, min(1.0, pBody))
return percentile(h.reservoir, pBodyClamped)



BOT-C2 — Low (quality)
Histogram percentile docstring formula does not match the code
The Histogram.percentile proc's docstring says the body-stratum rescaling is p_body = (p*seenAll - K) / (seenAll - K) (subtracting K from the numerator). The code does let pBody = (pc * float(h.seenAll)) / float(bodySize) (no K subtraction). The code is correct — the body stratum holds the seenAll - K smallest samples, so the target rank within the body is p * seenAll, not p * seenAll - K. The docstring formula is wrong. The same docstring appears in the Histogram.percentiles proc at line ~371.

Update the docstring to match the code: `p_body = (p * seenAll) / (seenAll - K)`.

BOT-C1 (Medium): bench-comparison.yml's scheduled-run threshold step used
`--threshold-measure throughput` but the BMF emitter writes the measure
key as `throughput_ops_ms`. The threshold therefore matched no measure
and was a silent no-op; gating on Crossbeam vs lockfreequeues regressions
never fired. Same bug class as PR 6's main bench.yml fix; aligning the
two workflows on the canonical key.

BOT-C2 (Low): Histogram percentile docstring described the body-stratum
rescale as `(p*seenAll - K) / (seenAll - K)`. The actual code computes
`(p*seenAll) / (seenAll - K)` (without the K subtraction in the
numerator), and the worked examples below the formula already use the
correct ratio (e.g. 50_000/99_000 for p=0.50). Bringing the formula and
the worked examples into agreement with the implementation; tidying a
stale '99_000 - 0' artefact from an earlier draft.
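
The corrected relationship as a tiny worked check (constants from the defaults discussed above; the proc name is not the module's):

    const
      K = 1_000          # top-K (tail) stratum size
      SeenAll = 100_000  # samples seen by one histogram

    # The body stratum holds the SeenAll - K smallest samples, so the
    # overall rank p * SeenAll maps to the body fraction
    # (p * SeenAll) / (SeenAll - K), with no K subtracted on top.
    proc bodyFraction(p: float): float =
      min(1.0, (p * SeenAll.float) / float(SeenAll - K))

    assert abs(bodyFraction(0.50) - 50_000 / 99_000) < 1e-12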

@axiomantic-momus (bot) left a comment


This omnibus PR consolidates 7 previously-reviewed PRs (Track 0–6) adding a shared bench harness, 9 comparison adapters, topology-split bench binaries, interactive uPlot charts, and a snapshot publishing pipeline. Overall the code is well-structured with strong defensive patterns. Found 2 non-blocking quality issues: bench-comparison.yml's nightly threshold reset may silently remove bench.yml's latency threshold configuration, and the bench-tests job clones sibling deps from main rather than the release tags advertised in the PR description.

Severity tally: 2 Lows.

Low

  • BOT-A2 (.github/workflows/bench-comparison.yml:189): Nightly bench-comparison bencher run may remove bench.yml's latency threshold.
  • BOT-A3 (.github/workflows/bench.yml:100): bench-tests job clones sibling Nim deps from main, contradicting PR description.

Noteworthy

  • Consistent soft-skip pattern across all comparison adapters with install/smoke/set-flag stages and force_skip workflow_dispatch inputs — well-designed for CI resilience.
  • Cache-line alignment fix for unbounded queue segments via posix_memalign shim is clean and well-documented with platform-conditional fallback for Windows.

Verdict: APPROVE.

Commands
  • Comment /ai-review or mention @axiomantic-momus[bot] to request a re-review of the latest changes.
  • Reply to a finding with won't fix, by design, or not a bug to decline it.
  • Reply with instead, ... to propose an alternative fix.

Powered by Momus running deepseek/deepseek-v4-pro via openrouter.ai.

Comment on lines +189 to +206
    if: github.event_name == 'schedule' && env.BENCHER_API_TOKEN != ''
    run: |
      bencher run \
        --project lockfreequeues \
        --token '${{ secrets.BENCHER_API_TOKEN }}' \
        --branch devel \
        --testbed ubuntu-latest \
        --threshold-measure throughput_ops_ms \
        --threshold-test t_test \
        --threshold-max-sample-size 64 \
        --threshold-lower-boundary 0.99 \
        --thresholds-reset \
        --adapter json \
        --file merged.json \
        --github-actions '${{ secrets.GITHUB_TOKEN }}' \
        --err

  - name: Track devel-push benchmarks with Bencher


BOT-A2 — Low (quality)
Nightly bench-comparison bencher run may remove bench.yml's latency threshold.
bench-comparison.yml's scheduled (nightly cron) bencher run uploads to the same Bencher project (lockfreequeues) and branch (devel) as bench.yml's push-to-devel run, but it configures only --threshold-measure throughput_ops_ms --thresholds-reset. bench.yml configures BOTH throughput_ops_ms AND latency_p99_ns thresholds. Since bencher stores thresholds per project+branch and --thresholds-reset clears all prior thresholds, the nightly crossbeam run will remove the latency threshold that bench.yml configured on the previous push. The two workflows fight over the threshold configuration: each night, bench-comparison resets to throughput-only; each devel push, bench.yml restores both throughput and latency. During the window between a nightly and the next devel push, latency regression alerting is silently absent.

Either (a) add `--threshold-measure latency_p99_ns --threshold-test t_test --threshold-max-sample-size 64 --threshold-upper-boundary 0.99` to the bench-comparison nightly invocation so both thresholds survive, or (b) remove `--thresholds-reset` from bench-comparison and add `--threshold-measure throughput_ops_ms --threshold-test t_test --threshold-max-sample-size 64 --threshold-lower-boundary 0.99` so both workflows append rather than reset.

Comment on lines +100 to +111
    uses: jiro4989/setup-nim-action@v2
    with:
      nim-version: 'stable'

  - name: Install build deps (Linux)
    run: |
      sudo apt-get update -q -y
      sudo apt-get -qq install -y clang

  - name: Clone and install sibling Nim deps (nim-debra, nim-typestates)
    run: |
      set -e


BOT-A3 — Low (quality)
bench-tests job clones sibling Nim deps from main, contradicting PR description.
The PR description states the new bench-tests CI job is "pinned to v0.6.0 / v0.7.0 sibling deps", but the actual bench.yml code clones nim-debra and nim-typestates from --branch main (not tags). Meanwhile bench-comparison.yml correctly uses --branch v0.6.0 and --branch v0.7.0. Using main means any upstream commit to those repos can break CI non-deterministically. The code comment in the bench-tests job header itself says "Sibling Nim deps are pinned to release tags (not main)" which further contradicts the code's actual --branch main. The benchmark matrix job's parallel clone step has a comment that acknowledges the tradeoff ("Tracks main until nimble.directory picks up the relevant tags"), but the bench-tests job's own comment is misleading and the overall discrepancy between PR description, code comments, and code behavior is confusing for maintainers debugging CI breakage.

Either update the PR description and the bench-tests job's code comment to accurately reflect that the bench-tests job tracks `main`, or align the code with the description by pinning to `v0.6.0` / `v0.7.0` (as bench-comparison.yml already does).

@elijahr elijahr merged commit 8820e1b into devel May 5, 2026
18 checks passed