v4.2.0 release: bench tightening, SPMC topology, comparison expansion#30

Open
elijahr wants to merge 20 commits intodevelfrom
feat/v4.2.0-bench-tightening

@elijahr elijahr commented May 6, 2026

Closes out the v4.2.0 release. 17 commits, 144 files changed
(+13.7k / -948). Full per-area detail lives in CHANGELOG.md
under [4.2.0] - 2026-05-06; this description covers the release
shape and the operator-side work that has to land before merge.

Highlights

Comparison set expansion (Tier 1+2+3). Five new third-party
adapters reach the bench matrix: atomic_queue (header-only, MIT),
rigtorp/SPSCQueue (header-only, MIT), rigtorp/MPMCQueue
(header-only, MIT), flume (Rust, Apache-2.0 OR MIT), kanal (Rust,
MIT), plus liblfds (C bounded queues, public-domain) wired through
a thin C wrapper. The Rust crates are consolidated into one cdylib
at benchmarks/rust/comparison/ (libbench_ffi_comparison)
alongside the existing Crossbeam adapters; prefix-per-crate symbol
naming (crossbeam_* / flume_* / kanal_*) prevents collisions across
the consolidated FFI surface. C++ headers are vendored under
benchmarks/vendor/ with pinned upstream SHAs and project-authored
upgrade READMEs.

Strict-floor breach (16/17, documented). folly_pcq was DROPPED
during Work Item G after audit: the transitive-include closure is
15 unique folly headers (>6 threshold) and folly main requires
C++20 vs the repo's C++17. Final floor is 16/17. Documented in
CHANGELOG Known Limitations with revisit conditions for v4.3.0+.

First-class SPMC topology axis. Sipmuc (bounded
lockfreequeues_sipmuc and unbounded lockfreequeues_unbounded_sipmuc)
moves off the MPMC panel onto a new dedicated SPMC panel
(bench-throughput-spmc). Slug roots change from mpmc /
mpmc_unbounded to spmc / spmc_unbounded for those adapters;
the Bencher.dev thresholds for the prior MPMC-rooted slugs are
intentionally reset (old history retained as a record, not carried
forward into the new SPMC roots).

Topology-based adapter dispatcher (Option C). Each bench binary
now owns a seq[Adapter] registry; per-adapter
topologiesSupported: set[Topology] declares which topologies that
adapter participates in. The harness iterates and dispatches by
topology rather than the prior name-based case variant of "X":
ladder. CLI takes <topology> as argv[1]
(e.g. ./bench_mpmc spmc 1p2c); bench.yml's Matrix Run step
threads the topology argv through.
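
A minimal sketch of the dispatch shape described above, written in Python for illustration only (the real harness is Nim). The `Adapter` fields, `parse_topology`, the runner helper, and the `lockfreequeues_mupmuc` adapter name are assumptions made for the example; only the registry-plus-supported-set idea and the `<topology> <shape>` argv order come from this PR. Unbounded topologies are omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, FrozenSet, List


class Topology(Enum):
    SPSC = auto()
    MPSC = auto()
    SPMC = auto()
    MPMC = auto()


@dataclass(frozen=True)
class Adapter:
    name: str
    topologies_supported: FrozenSet[Topology]
    run: Callable[[Topology, str], str]  # returns the slug it would emit


def parse_topology(arg: str) -> Topology:
    """Map a CLI token such as 'spmc' to a Topology member."""
    return Topology[arg.upper()]


def make_runner(name: str) -> Callable[[Topology, str], str]:
    """Hypothetical runner: reports the slug instead of running a bench."""
    def run(topology: Topology, shape: str) -> str:
        return f"{name}/{topology.name.lower()}/{shape}"
    return run


# Hypothetical registry; each adapter declares the topologies it joins.
ADAPTERS: List[Adapter] = [
    Adapter("lockfreequeues_mupmuc", frozenset({Topology.MPMC}),
            make_runner("lockfreequeues_mupmuc")),
    Adapter("lockfreequeues_sipmuc", frozenset({Topology.SPMC}),
            make_runner("lockfreequeues_sipmuc")),
]


def dispatch(adapters: List[Adapter], argv: List[str]) -> List[str]:
    """Run every adapter that supports the topology named in argv[1]."""
    topology = parse_topology(argv[1])
    shape = argv[2]
    return [a.run(topology, shape)
            for a in adapters
            if topology in a.topologies_supported]
```

Calling `dispatch(ADAPTERS, ["bench_mpmc", "spmc", "1p2c"])` selects only the sipmuc adapter, which mirrors how adding an adapter becomes a registry row rather than a new case arm.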

HarnessBackoff (cpuPause -> schedYield escalation). Harness-side
schedYield-escalating backoff for unbounded consumer waits, breaking
consumer-livelock on oversubscribed runners (e.g.
mpmc_unbounded 4p4c on 4-vCPU GitHub-hosted Linux). Initial spins
use cpuPause; once the spin budget is exhausted, the backoff escalates
to schedYield. Tunable via -d:HarnessSpinBudget=N /
-d:HarnessYieldThreshold=N. The queue-side analogue (relaxing
the strict-FIFO consumer-claim path so backoffOnPeerWait itself
escalates) is intentionally deferred to v4.3.0; this release's
src/lockfreequeues/backoff.nim is read-only (impl-plan Constraint 7).

Visualization uplift. Throughput panels switch from the prior
pseudo-continuous numeric x-axis to uPlot bars on a categorical
x-axis. The hero panel renders to a canvas with an offscreen
<table> ARIA companion so screen readers see the same data.
Dark-mode-aware: a MutationObserver on <html data-md-color-scheme>
reflows every chart on theme toggle. Log-scale axes suppress null
and minor-tick labels - only major (power-of-10) ticks render.

Prose layer. Adds inline ## Glossary (32 entries) and
## Why MPMC is harder than SPSC (cache-line contention; ABA and
reclamation; ordering and asymmetry) sections in
docs/benchmarks.md, surfacing methodology vocabulary above the
chart panels. Bencher CLI threshold-reset note added.

CI tightening. Per-library smoke-step --path: flag propagation
so libraries needing an installed-package path on the bench compile
no longer fail silently. bench-comparison.yml retired (Crossbeam
folded into bench.yml's regular matrix). The Phase 4.6 audit
remediation commit (8e8d78e) wires the orphan
tests/t_topology_split.nim into nimble benchtests and adds a
"Run Python bench tests" step to bench.yml so 36 previously
uncovered Python contract tests run on every push.

Dependency bump. Minimum debra raised from >= 0.7.0 to
>= 0.7.1 to pull in the upstream signal-handler stride fix in
cross-slot reclamation.

Operator-side work required before merge

  1. Manual visual protocol (Decision 4). Run the 7-step
    dark / light / responsive sweep at 320 / 768 / 1280 px against
    the rendered docs site (or a local mike serve of the dev
    alias) to confirm the canvas hero, the per-topology bar panels,
    the latency ladder, and the legend stay legible across themes
    and breakpoints. The docs CI does not exercise the visual
    surface, so this is a human-loop step.
  2. Bencher CLI threshold update. Update the per-measure
    thresholds on baseline-eligible 1p1c / 2p2c bounded slugs to
    the agreed 5% bound. This requires BENCHER_API_TOKEN and
    runs against bencher.dev, not in repo CI.

Post-merge

  • Tag v4.2.0 on devel; release.yml publishes; docs.yml
    deploys the versioned site and the latest mike alias.
  • The post-merge bench.yml run regenerates
    docs/assets/bench-results/latest.json with the restored
    comparison-library slugs (boost / loony / threading_channels /
    crossbeam plus the new tier 1 + 2 entries), at which point the
    LOCKFREEQUEUES_BENCH_STRICT_FLOOR=1 env-var gate on
    benchmarks/tests/test_smoke_compiles.py can flip to default-on
    in v4.3.0 (tracked under Known Limitations).

elijahr added 17 commits May 6, 2026 11:28
Picks up the upstream signal-handler stride fix in cross-slot reclamation
(debra v0.7.1).
… comparison libs

Four comparison libraries (boost.lockfree queue, boost.lockfree
spsc_queue, loony, threading.Chan) were silently missing from
docs/assets/bench-results/latest.json because their adapter smoke
steps in bench.yml failed to compile, and the
`continue-on-error: true` soft-skip pattern flipped each one to
"omit slugs" with only a yellow PR warning. Diagnostic findings
from bench.yml run 25420766836 (devel @ 2026-05-06):

  - boost_lockfree_queue / boost_lockfree_spsc — classification (a):
    `cannot open file: lockfreequeues/internal/aligned_alloc` from
    benchmarks/nim/adapters/boost_lockfree_*_adapter.nim. Root cause:
    the project's nim.cfg sets `--noNimblePath`, so the package's
    own srcDir is not auto-added to the search path for off-tree
    files like the smoke harness. Fix: add `--path:src` to the
    smoke step's `nim cpp` invocation.

  - loony — classification (a) (NOT registry/install failure as
    initially hypothesized): nimble install succeeded; smoke step
    failed with `cannot open file: loony`. Same root cause as
    boost — `--noNimblePath` disables auto-augmentation of the
    search path with installed nimble package roots. Loony also
    imports `pkg/arc` which needs the same treatment. Fix: capture
    `nimble path loony` and `nimble path arc` into env vars in the
    install step; pass both as `--path:` flags to the smoke step.

  - threading.Chan — classification (a): nimble install succeeded;
    smoke step failed with `cannot open file: threading/channels`.
    Same root cause. Fix: capture `nimble path threading` into an
    env var in the install step; pass as `--path:` flag to the
    smoke step.

The fix propagates from smoke into the bench compile step: the
`Build adapter define flags` aggregator now also produces a
`paths` output that the compile step consumes, so any adapter
that passes smoke also gets its path flags during the bench
compile. Without this propagation, an adapter could pass smoke
(one --path: flag in scope) and then silently fail at bench
compile (no --path: flag), reintroducing the silent-mask pattern.

The `continue-on-error: true` soft-skip pattern itself is left in
place — it's the intended fail-soft semantics for missing
optional comparison libraries (registry outage, ABI break, etc.).
This commit removes the silent-mask FALSE NEGATIVES caused by
known-fixable path issues; future genuine outages still
soft-skip with a yellow warning.

Adds benchmarks/tests/test_smoke_compiles.py — a fixture-pinned
guard that asserts the strict-floor library prefixes (boost
queue, boost spsc, loony, threading_channels) are present in
latest.json. Skip-by-default until the next devel bench.yml run
regenerates latest.json with the fixes applied; opt-in to
strict-fail mode via `LOCKFREEQUEUES_BENCH_STRICT_FLOOR=1`. The
slug-presence assertion is the CI-side regression gate against
future silent-mask reintroductions.
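
The skip-by-default / strict-on-opt-in behaviour of the guard can be sketched roughly as follows. This is not the contents of test_smoke_compiles.py: the prefix spellings, the `<library>/...` slug layout, and the helper names are assumptions; only the environment variable name and the two modes are taken from the commit message.

```python
import os

# Hypothetical prefix spellings for the four strict-floor libraries.
STRICT_FLOOR_PREFIXES = (
    "boost_lockfree_queue",
    "boost_lockfree_spsc",
    "loony",
    "threading_channels",
)


def missing_prefixes(slugs, required=STRICT_FLOOR_PREFIXES):
    """Return the required library prefixes that no slug starts with."""
    return [p for p in required
            if not any(s.startswith(p + "/") for s in slugs)]


def check_strict_floor(slugs, env=None):
    """Return 'pass' or 'skip'; raise only when strict mode is opted in."""
    env = os.environ if env is None else env
    missing = missing_prefixes(slugs)
    if not missing:
        return "pass"
    if env.get("LOCKFREEQUEUES_BENCH_STRICT_FLOOR") == "1":
        raise AssertionError(f"missing strict-floor libraries: {missing}")
    return "skip"
```

Passing the environment as a parameter keeps the sketch testable without mutating the process environment.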
Adds 5 new steps to bench.yml mirroring the loony pattern, so the
Crossbeam (Rust) ArrayQueue + SegQueue adapters run inside the
regular bench matrix instead of in the separate
bench-comparison.yml workflow:

  1. Install Rust toolchain (dtolnay/rust-toolchain@stable) +
     Swatinem/rust-cache@v2 with the same workspaces key shape that
     bench-comparison.yml used (so any pre-existing cache hits
     carry over after the consolidation).
  2. Build crossbeam cdylib (cargo build --release --manifest-path
     benchmarks/rust/bench-ffi-crossbeam/Cargo.toml). The manifest
     path is the current pre-Commit-5.2 location; Commit 5.2 will
     rename benchmarks/rust/bench-ffi-crossbeam/ -> .../comparison/
     and update both this manifest-path and the cache workspaces
     key together.
  3. Smoke crossbeam adapters (nim c with both
     -d:adapter_crossbeam_*_available defines and a --passL rpath
     so the smoke binary loads libbench_ffi_crossbeam.so without
     LD_LIBRARY_PATH).
  4. Set crossbeam adapter defines on smoke success — writes
     ADAPTER_CROSSBEAM_ARRAY=1, ADAPTER_CROSSBEAM_SEG=1, and
     CROSSBEAM_RPATH_DIR to GITHUB_ENV. The `Build adapter define
     flags` aggregator picks these up and emits both the
     -d:adapter_crossbeam_*_available defines and a single
     --passL:-Wl,-rpath,... flag for the bench compile step.
  5. Annotate crossbeam skipped on any failure in the toolchain /
     install / smoke chain — yellow PR-check warning identical in
     shape to the loony / boost / threading annotate-skip steps.

Matrix gating: only bench_mpmc (uses crossbeam_array_queue) and
bench_unbounded (uses crossbeam_seg_queue) pay for the Rust
toolchain install. The other matrix legs short-circuit on the
`if:` clauses so the consolidation does not inflate their wall
clock. Adds a corresponding `force_skip_crossbeam`
workflow_dispatch input mirroring the existing
`force_skip_loony` / `force_skip_boost` / etc. inputs.

bench-comparison.yml is RETAINED at this commit and will be
retired in the next commit (`ci(bench): retire bench-comparison.yml
workflow split`); the prose-and-comment cleanup also lives in the
retirement commit so this commit is purely additive.
The previous commit consolidated Crossbeam (Rust) into the bench.yml
matrix, so the dedicated comparison workflow is now redundant.
Delete .github/workflows/bench-comparison.yml outright. Branch
protection on devel does NOT depend on a check from this workflow
(verified via `gh api repos/elijahr/lockfreequeues/branches/devel/
protection` which returns 404 — no required status checks
configured), so retirement is safe with no follow-up settings
change required.

Update prose / comment references that named the retired workflow
by name (4 references; located via grep before editing because
line numbers in the impl plan had drifted slightly):

  - benchmarks/README.md (CI workflows section): merge the
    crossbeam description into bench.yml's bullet, drop the
    dedicated comparison-workflow bullet, and add force_skip_crossbeam
    to the listed force-skip inputs.
  - benchmarks/README.md (chart fixture section): rewrite "only run
    on bench-comparison.yml (nightly cron)" to reflect that all
    comparison adapters now run inside bench.yml.
  - benchmarks/nim/bench_unbounded.nim (BenchSkipOversubscribed
    comment): replace "the full grid still runs in
    bench-comparison.yml nightly cron / on workflow_dispatch" with
    a reference to bench.yml's workflow_dispatch path.
  - benchmarks/nim/smoke/smoke_crossbeam.nim (header comment):
    replace "Used by bench-comparison.yml" with "Used by bench.yml
    (originally Track 3 §3.13 in the retired separate crossbeam
    comparison workflow)".

The internal bench.yml comments that mentioned the retired
workflow by name were also rephrased to drop the literal string,
so a fresh `grep -rn 'bench-comparison' .github/ docs/ benchmarks/`
returns ZERO hits — confirming no broken references remain.

smoke_crossbeam.nim is RETAINED (still invoked by bench.yml's
consolidated `Smoke crossbeam adapters` step from the previous
commit). Deletion of the workflow file is the only structural
change; the smoke harness, the cdylib build, and the Nim adapter
sources all stay where they are. Commit 5.2's Cargo.toml rename is
explicitly out of scope for this work item.
Replace name-based `case variant of "X":` dispatch with a per-binary
seq[Adapter] registry where each adapter declares topologiesSupported
as a set[Topology]. The dispatcher loops over the registry and invokes
each adapter once per requested topology that the adapter supports.

This is a pure mechanical refactor: existing slug emissions are byte-
for-byte identical, all existing tests pass, and the bench harness's
public argv shape changes from `<variant_name> <shape>` to
`<topology> <shape>`. The matrix Run step in bench.yml is updated to
pass the topology argument instead of variant names.

Sipmuc adapters remain on tMpmc / tMpmcUnbounded at this commit;
promotion to tSpmc / tSpmcUnbounded with the corresponding chart-
panel and contract-test updates lands atomically in the next commit
(`feat(bench): add SPMC topology axis; reroute sipmuc to SPMC panel`).

Files touched:
  - benchmarks/nim/bench_common.nim: + Adapter / AdapterRunProc /
    parseTopology
  - benchmarks/nim/bench_spsc.nim, bench_mpsc.nim, bench_mpmc.nim,
    bench_unbounded.nim: lift case-arms into named procs; introduce
    let adapters: seq[Adapter] registry; replace argv parse + dispatch.
  - .github/workflows/bench.yml: matrix Run step invokes binaries by
    topology argument.
Add tSpmc and tSpmcUnbounded values to the Topology enum (mid-enum
insertion: tSpmc after tMpmc, tSpmcUnbounded after tMpscUnbounded;
grep precondition for ordinal-dependent code returned zero hits, so
the insertion is safe). Sipmuc adapters in bench_mpmc and
bench_unbounded reroute from {tMpmc}/{tMpmcUnbounded} to
{tSpmc}/{tSpmcUnbounded}; their slug emissions reroute from
.../mpmc/... and .../mpmc_unbounded/... to .../spmc/... and
.../spmc_unbounded/... respectively.
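
As a rough illustration of the slug reroute (Python, not the harness code), assuming a three-part `<library>/<root>/<shape>` slug like the ones quoted elsewhere in this PR; the function and constant names are made up for the example:

```python
# Root renames applied only to the sipmuc adapters named in this PR.
REROUTED_ROOTS = {"mpmc": "spmc", "mpmc_unbounded": "spmc_unbounded"}
SIPMUC_ADAPTERS = {
    "lockfreequeues_sipmuc",
    "lockfreequeues_unbounded_sipmuc",
}


def reroute_slug(slug: str) -> str:
    """Move a sipmuc slug from its MPMC root to the matching SPMC root."""
    library, root, shape = slug.split("/")
    if library in SIPMUC_ADAPTERS and root in REROUTED_ROOTS:
        root = REROUTED_ROOTS[root]
    return "/".join((library, root, shape))
```

Slugs from other libraries pass through unchanged, which is why only the sipmuc threshold history is affected on Bencher.dev.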

This is an ATOMIC schema commit — production-side and contract-side
changes land together in a single commit so all coupled surfaces
(THROUGHPUT_PANELS, TOPOLOGY_LABELS, EXPECTED_ROUTING, DOM IDs
tuple, panel-count assertion, merge_bmf SLUG_RE characterization,
benchmarks.md DOM container, superset_check docstring, bench.yml
matrix) are coherent in the same working tree. Splitting the
commit would produce false-positive passes that mask broken
coupling.

Bencher.dev's sipmuc-on-mpmc threshold history will reset because
the slug roots change; this is operator-accepted (MVP item 2) and
will be documented in CHANGELOG in the release-prep commit.
…ro with ARIA companion

Throughput panels now render vertical bars over a categorical X axis
via `uPlot.paths.bars` instead of line series. Each non-blocking
library gets a filled bar (fill = stroke color); blocking libraries
keep the dashed stroke and get a transparent fill so they read as
outlined bars. Per-point dots are disabled — bars carry their own
footprint.

The hero panel is rebuilt on a uPlot canvas using the same primitive.
One series per library, each carrying a single non-null value at its
own x slot, so each bar picks up its own stroke/fill color and
dashed-stroke for blocking libraries. Bars are sorted descending by
mean throughput; the legend is emitted under the canvas with the
same blocking-vs-non-blocking distinction the panel legends use.

An offscreen `<table id="bench-hero-aria-table">` companion is
appended next to the canvas. The canvas carries
`aria-describedby="bench-hero-aria-table"` and an `aria-label`
summary so screen readers can read the per-library throughput values
without relying on the canvas pixel content.

The pre-canvas DOM-`<ol>`-bar renderer is preserved as a named
fallback (`renderHeroDomBars`) and engages whenever
`window.uPlot.paths.bars` is unavailable. The latency panel keeps its
stepped-path rendering — no change there.
Replace the throughput and latency Y-axis `values:` callbacks with a
predicate that returns empty strings for non-finite values and for
minor (non-power-of-10) ticks. uPlot occasionally hands back
null-valued tick entries on log-scale axes when the data range
crosses a decade boundary; the previous formatter rendered those as
"null" or as cluttered minor labels (2e3, 5e3, etc.) between the
major decades.

After this change, log-scale axes show only the major tick labels
(1e0, 1e1, 1e2, ...). Linear-scale axes are unaffected because the
log-rounding check passes through every integer-valued tick.

Applies to `makeThroughputOpts` (Y axis with `distr: logScale ? 3 :
1`) and `makeLatencyOpts` (Y axis hard-coded to log scale).
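
The log-scale half of that predicate, sketched in Python rather than the chart's JavaScript. The function name, the tolerance, and the exact label format are assumptions; the linear-scale pass-through mentioned above is not modelled here.

```python
import math


def major_tick_label(value) -> str:
    """Label only power-of-10 ticks; everything else renders as ''."""
    # Null or non-finite ticks would otherwise print as "null"/"nan".
    if value is None or not math.isfinite(value) or value <= 0:
        return ""
    exponent = math.log10(value)
    nearest = round(exponent)
    # Minor ticks (2e3, 5e3, ...) are not within tolerance of an integer
    # exponent, so they get an empty label.
    if abs(exponent - nearest) > 1e-9:
        return ""
    return f"1e{nearest}"
```

Returning an empty string rather than dropping the tick keeps the tick mark and grid line while hiding the label.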
…e toggle

uPlot canvases now respect Material's light/dark theme. Two helpers,
`themeStroke()` and `themeFont()`, read the live computed text color
and font family from the `.md-content` container (or `body` as a
fallback). Material's theme picker swaps `data-md-color-scheme` on
`<html>`, which cascades into content-area CSS custom properties, so
`getComputedStyle` automatically picks up the current scheme.

Every uPlot axis in `makeThroughputOpts`, `makeLatencyOpts`, and the
hero canvas now merges `themedAxisDefaults()` (axis stroke + font +
neutral mid-gray grid stroke at `rgba(127,127,127,0.15)`) over the
existing axis options via `Object.assign`. The neutral grid stroke
reads in both schemes; axis text follows Material typography.

A module-scoped `panelRebuilds` array tracks each panel's `rebuild`
closure. `installThemeObserver(rebuildAll)` watches `<html>` for
`data-md-color-scheme` and `data-md-color-primary` attribute changes
and invokes every registered rebuild. The observer is idempotent
(guarded by `__themeObserverInstalled`) so re-renders don't multiply
observers, and `panelRebuilds` resets at the top of `render()` so
re-renders don't accumulate stale closures pointing at destroyed
plots.

The hero canvas got a small refactor — its build step is hoisted into
a `buildOpts()` factory so the same options object is rebuilt on
theme toggle, picking up the new `themeStroke()`/`themeFont()` values
each time.

The 7-step manual visual protocol (light/dark/reflow at 320/768/1280
px) is operator-side and not validated here.
…umer waits

Add HarnessBackoff in benchmarks/nim/bench_common.nim — a per-consumer
backoff state machine that spins via cpuPause for the first
HarnessSpinBudget iterations, then escalates to schedYield once the
cumulative spin count crosses HarnessYieldThreshold. Both constants
are intdefine (defaults 128 / 1024); tune at compile time via
`-d:HarnessSpinBudget=N` / `-d:HarnessYieldThreshold=N`.
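
One possible reading of that state machine, as a Python sketch. The relationship between the two constants is an assumption here: each call spins a burst of `spin_budget` pauses until the cumulative count reaches `yield_threshold`, after which every call yields. The callables stand in for cpuPause and schedYield; `time.sleep(0)` is only a portable placeholder for a scheduler yield.

```python
import time


class HarnessBackoff:
    """Spin first, then escalate to yielding (illustrative only)."""

    def __init__(self, spin_budget=128, yield_threshold=1024,
                 pause=lambda: None, yield_=lambda: time.sleep(0)):
        self.spin_budget = spin_budget
        self.yield_threshold = yield_threshold
        self.pause = pause      # stand-in for cpuPause
        self.yield_ = yield_    # stand-in for schedYield
        self.spins = 0          # cumulative pauses issued so far

    def backoff(self) -> str:
        if self.spins < self.yield_threshold:
            # Cheap phase: burn a burst of pause hints on this core.
            for _ in range(self.spin_budget):
                self.pause()
            self.spins += self.spin_budget
            return "spin"
        # Escalated phase: give the core back so a descheduled peer can run.
        self.yield_()
        return "yield"
```

Under this reading the defaults give eight spin bursts (8 x 128 = 1024) before the first yield. A fresh instance per consume loop, as the commit describes, resets the escalation for each wait.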

Update the four consumer-thread call sites in bench_unbounded.nim
to declare a fresh HarnessBackoff at the top of each consume loop
and call `hb.backoff()` instead of `backoffOnPeerWait()`. The
harness wrapper is intentionally NOT named `backoffOnPeerWait` to
avoid shadowing the queue-side helper for v4.3 import discipline.

Drop `-d:BenchSkipOversubscribed` from bench.yml's compile-flag
list. The Nim-side `when not defined(BenchSkipOversubscribed):`
guards in bench_unbounded.nim STAY — re-engaging the gate is a
one-line YAML re-add if a regression demands it.

cpuPause and schedYield are imported directly from
`debra/atomics/backoff` (NOT re-exported by lockfreequeues/backoff;
verified zero hits via `grep -rn 'export cpuPause\|export schedYield'
src/lockfreequeues/`). NO edits to src/lockfreequeues/backoff.nim
(Constraint #7).

This is a stopgap. The canonical fix — schedYield in the queue-side
backoffOnPeerWait + relaxation of strict-FIFO consumer claim — is
deferred to v4.3 to keep this PR's blast radius bounded.

Bencher threshold tightening to 5% on baseline-eligible 1p1c / 2p2c
bounded slugs is a manual server-side step, run pre-merge by the
operator with BENCHER_API_TOKEN exported.
…(Tier 1, header-only)

Vendors three header-only C++ comparison libraries under
`benchmarks/vendor/` and wires them through thin extern "C" wrappers
into the bench harness (same shim pattern as concurrentqueue/MoodyCamel).
All three are MIT-licensed.

Pinned upstream commits:

  atomic_queue (max0x7ba):
    1a3774a89c86ecfdf08753dbd41018ace5a833a4
    -> benchmarks/vendor/atomic_queue/include/atomic_queue/{5 headers}
  rigtorp/SPSCQueue:
    1053918dbd251fbff69b24ef27fa5d51c29ec2af
    -> benchmarks/vendor/rigtorp_spsc/include/rigtorp/SPSCQueue.h
  rigtorp/MPMCQueue:
    b9808ede08f26fa9df4df4e081d19cace8f6c6ea
    -> benchmarks/vendor/rigtorp_mpmc/include/rigtorp/MPMCQueue.h

Each vendor dir ships a per-library README.md with the upgrade
procedure and an extern "C" `*_wrapper.cpp` shim that exposes a
non-template `uint64_t`-payload API (`{lib}_init` / `{lib}_push` /
`{lib}_pop` / `{lib}_destroy`).

Adapter rows added to the topology-based registries:

  bench_spsc:
    atomic_queue   -> {tSpsc} -> "atomic_queue/spsc/1p1c"
    rigtorp_spsc   -> {tSpsc} -> "rigtorp_spsc/spsc/1p1c"
  bench_mpmc:
    atomic_queue   -> {tMpmc} -> "atomic_queue/mpmc/{1,2,4}p{1,2,4}c"
    rigtorp_mpmc   -> {tMpmc} -> "rigtorp_mpmc/mpmc/{1,2,4}p{1,2,4}c"

bench.yml: adds force_skip_{atomic_queue,rigtorp_spsc,rigtorp_mpmc}
workflow_dispatch inputs, three install/smoke/set-define/annotate block
sets gated on the relevant matrix.binary, and three new flag-aggregator
branches that emit `-d:adapter_*_available` and force `mode=cpp` for
the bench compile step (each vendored shim is C++).

`.gitattributes` already covers `benchmarks/vendor/**` — no change
needed.

THIRD_PARTY_LICENSES.md: appends a new
"Tier 1 vendored comparison libraries (v4.2.0 Stage 5.1)" section with
one entry per library (source, pinned commit, license, vendored path,
upgrade-procedure pointer).

Verification (run locally on the worktree):

  nim cpp -d:adapter_atomic_queue_available [...]   # smoke ok
  nim cpp -d:adapter_rigtorp_spsc_available
          -d:adapter_rigtorp_mpmc_available [...]   # smoke ok
  bench_spsc + bench_mpmc compile clean with and without the new gates.
  nimble test                                       # 210/210 OK
  test_bench_charts_contract + test_merge_bmf       # 26/26 OK
  YAML syntax check on bench.yml                    # ok
Renames `benchmarks/rust/bench-ffi-crossbeam/` to
`benchmarks/rust/comparison/` and folds two more Rust queue crates
(`flume`, `kanal`) into the same C-ABI cdylib so the bench binary
links a single shared object regardless of which Rust comparison
adapter is enabled.

Crate manifest:

  package.name           : bench_ffi_crossbeam -> bench_ffi_comparison
  [lib].name             : (default)            -> bench_ffi_comparison
  artifact (Linux)       : libbench_ffi_crossbeam.so
                        -> libbench_ffi_comparison.so

Cargo dependencies added:

  flume = "0.11"     (resolved 0.11.1)
  kanal = "0.1"      (resolved 0.1.1)

Symbol prefix-per-crate (no collisions): each crate's shims live
under a unique prefix so a single cdylib carries all 24 entry points.
Pre-existing `cb_*` exports were renamed to `crossbeam_*` for parity
with the new prefixes; the integration test suite (rlib path) was
updated in lockstep.

Symbol catalog after consolidation (nm -gU on macOS dylib):

  crossbeam_array_init / push / pop / destroy           (4)
  crossbeam_seg_init   / push / pop / destroy           (4)
  flume_init           / push / pop / destroy           (4)
  flume_unbounded_init / push / pop / destroy           (4)
  kanal_init           / push / pop / destroy           (4)
  kanal_unbounded_init / push / pop / destroy           (4)

Total: 24 (>= 8 verification gate satisfied; pre-rename baseline
was 8 cb_* symbols). cargo test --release on the rlib: 6/6 OK.
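
The catalog count and the no-collision claim can be checked with a few lines of Python. The prefixes and operation names are the ones listed above; the helper names are illustrative.

```python
PREFIXES = (
    "crossbeam_array", "crossbeam_seg",
    "flume", "flume_unbounded",
    "kanal", "kanal_unbounded",
)
OPS = ("init", "push", "pop", "destroy")


def symbol_catalog():
    """Every exported entry point: one per (prefix, operation) pair."""
    return [f"{prefix}_{op}" for prefix in PREFIXES for op in OPS]


def has_collisions(symbols):
    """True if any two entry points share a name."""
    return len(set(symbols)) != len(symbols)
```

Six prefixes times four operations gives the 24 entry points, and because every name begins with a distinct prefix the set has no duplicates.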

New Nim adapters:

  flume_adapter.nim   -> bench_mpmc        ({tMpmc})
                         bench_unbounded   ({tMpmcUnbounded})
                         slugs flume/mpmc/{1,2,4}p{1,2,4}c
                         and flume_unbounded/mpmc_unbounded/{1,2}p{1,2}c
  kanal_adapter.nim   -> bench_spsc        ({tSpsc})
                         bench_mpmc        ({tMpmc})
                         bench_unbounded   ({tMpmcUnbounded})
                         slug kanal/spsc/1p1c
                         + kanal/mpmc/{1,2,4}p{1,2,4}c
                         + kanal_unbounded/mpmc_unbounded/{1,2}p{1,2}c

Both adapters reuse `crossbeam_link.nim` for `{.passL.}` emission so
the linker arg set fires exactly once per bench binary. The Nim-side
crossbeam adapters' `importc` symbol names were renamed `cb_*` ->
`crossbeam_*` to match the new Rust exports.

bench.yml: rename the manifest path / cargo cache workspace key from
`bench-ffi-crossbeam` to `comparison`; update library name from
`libbench_ffi_crossbeam` to `libbench_ffi_comparison` in the smoke +
bench-compile invocations; add `force_skip_flume` /
`force_skip_kanal` workflow_dispatch inputs; add a unified
`smoke_comparison.nim` step that exercises flume + kanal under the
shared cdylib (gated on the existing crossbeam install + smoke gates
plus the flume/kanal force-skip flags); update the flag aggregator
with `-d:adapter_flume_available` / `-d:adapter_kanal_available`
branches that share a single `--passL:-Wl,-rpath,...` flag with the
crossbeam adapters; promote the rpath capture to its own step so it
runs whenever the cdylib build succeeded (needed for `bench_spsc` —
kanal is the only Rust adapter there).

THIRD_PARTY_LICENSES.md: refresh the Crossbeam block to point at the
renamed crate path; add new flume (Apache-2.0 OR MIT) and kanal (MIT)
entries under a "Tier 2 Rust comparison libraries" section.

benchmarks/README.md: rename rust/ subdir entry to `comparison/` with
expanded description; refresh the comparison adapter table to add
flume + kanal rows; refresh the Crossbeam quick-start invocations.

Verification (run locally on the worktree):

  cargo build --release --manifest-path \
    benchmarks/rust/comparison/Cargo.toml          # ok
  cargo test  --release --manifest-path \
    benchmarks/rust/comparison/Cargo.toml          # 6/6 OK
  smoke_comparison.nim (flume + kanal)             # all 4 round-trip ok
  smoke_crossbeam.nim (renamed symbols)            # both round-trip ok
  bench_spsc + bench_mpmc + bench_unbounded + bench_mpsc compile
    clean with the new gates set.
  nimble test                                      # 210/210 OK
  test_bench_charts_contract + test_merge_bmf      # 26/26 OK
  YAML syntax check on bench.yml                   # ok

No symbol collisions: prefix-per-crate naming makes the consolidated
cdylib safe to extend; failure path R3 (drop kanal then flume) was
not exercised.
Vendors liblfds 7.1.1 — a portable, license-free, lock-free C data
structure library — under `benchmarks/vendor/liblfds/` and wires a
new bench adapter exposing both bounded SPSC and bounded MPMC slugs:

  - `liblfds/spsc/1p1c`      (lfds711_queue_bss_*)
  - `liblfds/mpmc/{1,2,4}p{1,2,4}c` (lfds711_queue_bmm_*)

This is the Tier 3 entry toward the v4.2.0 strict-floor target of 17 projects. Unlike the
Tier 1 libraries (atomic_queue, rigtorp_*, concurrentqueue) which are
header-only, liblfds ships a C source tree with its own Makefile —
the install step in bench.yml runs `make ar_rel` on the vendored tree
to produce `liblfds711.a` and links the bench binary against it.

License verification protocol
-----------------------------

The upstream source tree ships NO LICENSE file. The canonical license
declaration is published only on the project homepage and was
cross-checked against three independent sources before vendoring:

  1. liblfds.org (canonical homepage; quoted verbatim below).
  2. repology.org/project/liblfds — listed as `custom:none`,
     consistent with public-domain dedication.
  3. github.com/darthcloud/liblfds7.1.1 (mirror used) and
     github.com/topecongiro/liblfds7.1.1 (independent mirror) —
     diff-checked byte-for-byte; identical source content.

All three sources agree. The github.com/liblfds/liblfds7.1.1 mirror
under the upstream's own GitHub org now contains only a one-line
README pointing at liblfds.org — that is why a third-party content
mirror was needed.

Verbatim license declaration from liblfds.org (retrieved 2026-05-06):

  Welcome to liblfds, a portable, license-free, lock-free data
  structure library written in C.

  license

  You are free to use this library in any way. Go forth and create
  wealth!

  If for legal reasons a custom licence is required, the license of
  your choice will be granted, and license is hereby granted up front
  for a range of popular licenses : the MIT license, the BSD license,
  the Apache license, the GPL and LPGL (all versions thereof) and the
  Creative Commons licenses (all of them). Additionally, everything is
  also placed in the public domain.

Conclusion: public-domain dedication + permissive Apache-2.0 grant
satisfies lockfreequeues's own Apache-2.0 redistribution path.

The full quote is preserved both in `THIRD_PARTY_LICENSES.md` and in
`benchmarks/vendor/liblfds/LICENSE` so the audit trail is recoverable
from the vendored tree alone.

Adapter API choice deviation
----------------------------

The original v4.2.0 impl plan called for the `lfds711_ringbuffer_*`
API. That API silently overwrites the oldest element on full rather
than reporting back-pressure, which violates the bench harness's
"messages-produced equals messages-consumed" invariant. Using the
bounded queue APIs (`bss` for SPSC, `bmm` for MPMC) — both of which
return 0 from enqueue on full — preserves the invariant and lets
liblfds participate on the same back-pressure contract as every other
adapter. Slug shapes are unchanged from the impl plan.
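
Why overwrite-on-full breaks the produced-equals-consumed invariant can be shown with a single-threaded Python sketch. Both classes are toy stand-ins, not the liblfds ringbuffer or queue APIs; they model only the behaviour on a full container.

```python
from collections import deque


class OverwritingRing:
    """Toy ring that silently evicts the oldest item when full."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def push(self, item):
        self._items.append(item)  # deque(maxlen=...) drops the oldest
        return True               # the producer never learns of the loss

    def pop(self):
        return self._items.popleft() if self._items else None


class BoundedQueue:
    """Toy bounded queue that reports back-pressure on full."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = deque()

    def push(self, item):
        if len(self._items) >= self._capacity:
            return False          # caller must retry or back off
        self._items.append(item)
        return True

    def pop(self):
        return self._items.popleft() if self._items else None


def produced_vs_consumed(queue, attempts):
    """Count accepted pushes, then drain and count pops."""
    produced = sum(1 for i in range(attempts) if queue.push(i))
    consumed = 0
    while queue.pop() is not None:
        consumed += 1
    return produced, consumed
```

With capacity 4 and 10 push attempts, the overwriting ring reports 10 produced but only 4 consumed, while the bounded queue reports 4 and 4.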

Files
-----

  - benchmarks/vendor/liblfds/liblfds711/{inc,src,build}/  (vendored)
  - benchmarks/vendor/liblfds/{LICENSE,README.md,.gitignore}
  - benchmarks/vendor/liblfds/liblfds_wrapper.c (extern-C shim)
  - benchmarks/nim/adapters/liblfds_adapter.nim (new)
  - benchmarks/nim/smoke/smoke_liblfds.nim (new)
  - bench_spsc.nim, bench_mpmc.nim: register liblfds in registry
  - .github/workflows/bench.yml: install (make ar_rel) + smoke +
    set-define + annotate; add force_skip_liblfds workflow_dispatch
    input
  - THIRD_PARTY_LICENSES.md: full audit trail + verbatim license

Verification (gate-pass path)
-----------------------------

  - Vendor build: `make ar_rel` produces
    benchmarks/vendor/liblfds/liblfds711/bin/liblfds711.a
    (Linux native; macOS local-dev needs CFLAGS='-D__linux__ -fPIC'
    DGFLAGS='-D__linux__' override; CI runs Ubuntu so no override).
  - Smoke: `nim c` of smoke_liblfds.nim with the gate define and
    --passL pointing at the produced archive; smoke binary exits 0
    after pushing+popping 32 items through both BSS and BMM backends.
  - Bench compile: `nim c` of bench_spsc.nim and bench_mpmc.nim with
    and without the gate; both shapes compile clean.
  - `nimble test`: 210 tests pass.
  - `python3 -m unittest benchmarks.tests.test_bench_charts_contract
    benchmarks.tests.test_merge_bmf`: 26 tests pass.
  - YAML: `python3 -c 'import yaml; yaml.safe_load(...)'` parses ok.

Linguist-vendored marking is already covered by the existing global
`benchmarks/vendor/** linguist-vendored=true linguist-generated=true`
rule in `.gitattributes`, so no `.gitattributes` edit is required.
… note

Add three prose sections to docs/benchmarks.md:

  - ## Glossary — 32 inline definitions (BMF, P×C, SPSC/MPSC/SPMC/MPMC,
    bounded vs. unbounded, sipmuc/mupsic/sipsic/mupmuc, reclamation,
    backoff, etc.) immediately after "How to read these numbers" so
    readers don't have to context-switch to find shorthand definitions.

  - ## Why MPMC is harder than SPSC — three subsections (cache-line
    contention; ABA and reclamation; ordering and asymmetry) explaining
    why the bench numbers diverge by orders of magnitude across topology
    axes.

  - ### Threshold history (in fairness caveats) — Bencher.dev sipmuc
    threshold reset note: history reset starting v4.2.0 because sipmuc
    moved from MPMC to SPMC; old slug history retained as record but
    not carried into the new SPMC slug roots.

Plus a one-sentence pointer to the Glossary anchor in the existing
reading-order paragraph so readers find the definitions on first scroll.

Anchor sanity verified (no duplicate slugs); contract tests pass; no
other doc files touched.
Bump version 4.1.0 → 4.2.0 in lockfreequeues.nimble. Move the
[Unreleased] CHANGELOG block (PR #29's bench-presentation work plus
this PR's bench-tightening + depth-pass work) to [4.2.0] - 2026-05-06.
Create a new empty [Unreleased] block above.

Highlights of v4.2.0 (full detail in CHANGELOG):
  - 5 new comparison libraries (atomic_queue, rigtorp×2, flume, kanal,
    liblfds) wired through the bench matrix; folly_pcq dropped per
    transitive-include + C++20 audit (strict-floor 16/17, breach
    documented for v4.3.0 follow-up).
  - First-class SPMC topology axis; sipmuc adapters rerouted from
    MPMC to SPMC. Bencher.dev threshold history for sipmuc slugs reset
    accordingly.
  - Topology-based adapter dispatcher (Option C) replaces name-based
    variant dispatch.
  - Harness-side schedYield-escalating backoff unblocks oversubscribed
    unbounded shapes on under-provisioned CI runners. Canonical
    queue-side fix deferred to v4.3.0 (Constraint #7).
  - uPlot bars for throughput panels; dark-mode-aware canvas reflow.
  - Inline glossary + "Why MPMC is harder than SPSC" prose in
    docs/benchmarks.md.
  - bench-comparison.yml retired; Rust cdylib consolidated under
    benchmarks/rust/comparison/.

Release tag (v4.2.0) is a post-merge action handled separately.
The v4.2.0 matrix expansion in bench_unbounded — adding the
flume_unbounded and kanal_unbounded adapter rows plus a fourth
topology invocation (spmc_unbounded) — pushed the per-leg run
past the original 10-minute ceiling on shared GitHub-hosted
runners. One CI dispatch surfaced this concretely (Run bench_unbounded
timed out after 10 minutes), with the post-split BMF
deletion-safety check failing as a downstream consequence
(mupsic/mpsc_unbounded slugs missing because the run was killed
before reaching that topology).

Raise the timeout-minutes from 10 to 20. R13 in the impl plan
explicitly called the workflow timeout the authoritative
completion-detector for the harness backoff fix; this gives the
new wider matrix comfortable headroom without weakening that
gate.

No code changes; YAML only.
Phase 4.6 audit remediation. Three findings, one commit:

1. Green-mirage at suite level — `benchmarks/tests/*.py` was never run
   by any CI workflow despite shipping 36 tests covering BMF schema
   characterization, throughput-panel routing, and STRICT_FLOOR. Add
   a "Run Python bench tests" step to bench.yml that does
   `python3 -m unittest discover -s benchmarks/tests -v` so the
   contract surface is actually exercised on every push.

2. Orphan + bit-rotted Nim test — `tests/t_bench_*` were wired into
   `nimble benchtests` but `tests/t_topology_split.nim` was not, so
   the topology-split contract test silently rotted when v4.2.0
   reclassified sipmuc onto the first-class SPMC topology axis
   (Decision A1). The test still asserted the pre-A1 slug shape
   (`lockfreequeues_sipmuc/mpmc/1pXc` and the `_unbounded` mirror).
   Fix the assertions to match the new emission
   (`.../spmc/1pXc`, `.../spmc_unbounded/1pXc`) and add the test to
   `nimble benchtests` so the same drift cannot happen again.

3. Fact-check finding — `docs/benchmarks.md` glossary defined BMF as
   "Benchmark Manifest Format" while every other in-tree usage and
   the upstream Bencher.dev convention says "Bencher Metric Format".
   One-token correction.
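
The slug change described in finding 2 can be sketched in Python as follows. This is an illustration only: the helper name `reroot` and the assumption that every slug has exactly three `/`-separated segments (adapter, topology root, shape) are not taken from the repo.

```python
# Adapters that moved from the MPMC panel to the SPMC panel in v4.2.0.
SIPMUC_ADAPTERS = {
    "lockfreequeues_sipmuc",
    "lockfreequeues_unbounded_sipmuc",
}

# Old topology root -> new topology root for those adapters.
ROOT_RENAMES = {
    "mpmc": "spmc",
    "mpmc_unbounded": "spmc_unbounded",
}


def reroot(slug):
    """Rewrite a pre-A1 sipmuc slug to its post-A1 shape; leave others alone."""
    adapter, root, shape = slug.split("/", 2)
    if adapter in SIPMUC_ADAPTERS and root in ROOT_RENAMES:
        root = ROOT_RENAMES[root]
    return "/".join((adapter, root, shape))


print(reroot("lockfreequeues_sipmuc/mpmc/1p2c"))
print(reroot("lockfreequeues_unbounded_sipmuc/mpmc_unbounded/1p4c"))
print(reroot("lockfreequeues_mupmuc/mpmc/2p2c"))
```

Only the two sipmuc adapters are rerooted; a genuine MPMC adapter such as `lockfreequeues_mupmuc` keeps its `mpmc` root.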

Local validation: `nim c -r tests/t_topology_split.nim` 6/6 green
including the deletion-safety superset_check (post ⊇ pre-split
fixture); `python3 -m unittest discover -s benchmarks/tests -v` 36
tests, 1 skipped (STRICT_FLOOR red-phase guard, expected),
0 failed; `python3 -c "import yaml; yaml.safe_load(...)"` OK.
@axiomantic-momus

axiomantic-momus Bot commented May 6, 2026

Momus review posted — verdict APPROVE, 1 finding


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request upgrades the project to version 4.2.0, introducing several new comparison libraries and a first-class SPMC topology axis. Key changes include the consolidation of Rust comparison shims into a single cdylib, the implementation of a HarnessBackoff mechanism for better oversubscription handling, and significant enhancements to the benchmarking charts and documentation. Feedback from the review highlights a critical alignment issue in the liblfds wrapper that could cause crashes on x64 systems. Improvements were also suggested for the backoff logic to ensure consistent performance measurements and for the STRICT_FLOOR test to provide comprehensive coverage of all benchmark targets.

Comment on lines +119 to +128
void *bench_liblfds_bmm_init(unsigned long long capacity) {
  size_t cap = bench_next_pow2((size_t)capacity);
  bench_liblfds_bmm_t *q = (bench_liblfds_bmm_t *)malloc(sizeof(*q));
  if (q == NULL) return NULL;
  q->elements = (struct lfds711_queue_bmm_element *)malloc(
      sizeof(struct lfds711_queue_bmm_element) * cap);
  if (q->elements == NULL) {
    free(q);
    return NULL;
  }

high

The bench_liblfds_bmm_t structure contains an over-aligned struct lfds711_queue_bmm_state (aligned to LFDS711_PAL_ATOMIC_ISOLATION_IN_BYTES, which is 128 bytes on x64). Standard malloc only guarantees 16-byte alignment on most 64-bit systems. This will cause a crash due to the alignment assertion in lfds711_misc_internal_backoff_init (called during queue initialization). Use an aligned allocation function like posix_memalign instead.

void *bench_liblfds_bmm_init(unsigned long long capacity) {
  size_t cap = bench_next_pow2((size_t)capacity);
  bench_liblfds_bmm_t *q;
  if (posix_memalign((void**)&q, LFDS711_PAL_ATOMIC_ISOLATION_IN_BYTES, sizeof(*q)) != 0)
    return NULL;
  q->elements = (struct lfds711_queue_bmm_element *)malloc(
      sizeof(struct lfds711_queue_bmm_element) * cap);
  if (q->elements == NULL) {
    free(q);
    return NULL;
  }
  q->capacity = cap;
  lfds711_queue_bmm_init_valid_on_current_logical_core(
      &q->state, q->elements, cap, NULL);
  LFDS711_MISC_MAKE_VALID_ON_CURRENT_LOGICAL_CORE_INITS_COMPLETED_BEFORE_NOW_ON_ANY_OTHER_LOGICAL_CORE;
  return q;
}

Owner Author

Fixed in a66db21: switched bench_liblfds_bmm_init to posix_memalign with LFDS711_PAL_ATOMIC_ISOLATION_IN_BYTES (128 B on x64) so the embedded lfds711_queue_bmm_state's over-aligned read/write indices are honored. The bss (SPSC) path was left on plain malloc since lfds711_queue_bss_state is not over-aligned.
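
The alignment arithmetic behind this fix can be checked with a small Python sketch. The 128-byte and 16-byte figures are the ones quoted in this thread; the helper names and the example addresses are hypothetical.

```python
ISOLATION_BYTES = 128  # over-alignment quoted in the review for x64
MALLOC_ALIGN = 16      # alignment the review attributes to plain malloc


def is_aligned(addr, alignment):
    """True when addr is a multiple of alignment."""
    return addr % alignment == 0


def align_up(addr, alignment):
    """Round addr up to the next multiple of a power-of-two alignment."""
    return (addr + alignment - 1) & ~(alignment - 1)


# Every 16-byte-aligned address inside one 128-byte window.
window = range(0x1000, 0x1000 + ISOLATION_BYTES, MALLOC_ALIGN)
suitable = [hex(a) for a in window if is_aligned(a, ISOLATION_BYTES)]

print(len(list(window)), "candidates,", len(suitable), "suitable:", suitable)
print(hex(align_up(0x1010, ISOLATION_BYTES)))
```

Only one in eight 16-byte-aligned addresses also satisfies a 128-byte requirement, which is why an allocator that promises only 16-byte alignment cannot be relied on for the over-aligned state struct.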

Comment on lines 117 to +122
   while local < ctx.count:
     let r = ctx.queue[].pop()
     if r.isSome:
       inc local
     else:
-      backoffOnPeerWait()
+      hb.backoff()

medium

The HarnessBackoff state should be reset when a pop() operation succeeds. If the backoff is not reset, it remains "sticky" across successful operations, potentially leading to unnecessary yields and degraded performance measurements once the queue transitions from an empty to a busy state. This should be applied to all consumer loops in this file.

  while local < ctx.count:
    let r = ctx.queue[].pop()
    if r.isSome:
      inc local
      hb = initHarnessBackoff()
    else:
      hb.backoff()

Owner Author

Fixed in a66db21: every consumer thread in bench_unbounded.nim (usipsicConsumerThread, usipmucConsumerThread, umupmucConsumerThread, plus the in-line runOneUMupsicRun consumer loop) now resets HarnessBackoff after each successful pop. The comparison-library MVP path uses runThroughputHarness's spin-only consumer in bench_common.nim and is unaffected.

Comment thread benchmarks/nim/bench_common.nim Outdated
Comment on lines +57 to +66
proc backoff*(b: var HarnessBackoff) {.inline.} =
  if b.spinsRemaining > 0:
    cpuPause()
    dec b.spinsRemaining
    inc b.spinsConsumed
    return
  if b.spinsConsumed >= HarnessYieldThreshold:
    schedYield()
    b.spinsConsumed = 0
  b.spinsRemaining = HarnessSpinBudget

medium

The current backoff implementation skips a cpuPause() call whenever the spin budget is reset or a yield is evaluated. This results in an inconsistent pause cadence. The logic should be restructured to ensure a pause occurs on every call that doesn't result in a scheduler yield.

proc backoff*(b: var HarnessBackoff) {.inline.} =
  if b.spinsRemaining <= 0:
    if b.spinsConsumed >= HarnessYieldThreshold:
      schedYield()
      b.spinsConsumed = 0
      b.spinsRemaining = HarnessSpinBudget
      return
    b.spinsRemaining = HarnessSpinBudget
  cpuPause()
  dec b.spinsRemaining
  inc b.spinsConsumed

Owner Author

Fixed in a66db21: applied your suggested restructure verbatim. Every backoff() call that does not escalate to schedYield now issues exactly one cpuPause; the previous shape skipped the pause whenever the spin budget was reset, producing an inconsistent pause cadence on the empty-pop path.
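
The cadence difference can be traced with a small Python model of both shapes. This is a sketch under stated assumptions: the constants 4 and 8 are placeholders for `HarnessSpinBudget` and `HarnessYieldThreshold` (their real values are not shown in this thread), the initial state is assumed to be a full spin budget with zero spins consumed, and the pre-fix shape assumes the final budget reset sits outside the yield check.

```python
from dataclasses import dataclass

SPIN_BUDGET = 4      # placeholder for HarnessSpinBudget
YIELD_THRESHOLD = 8  # placeholder for HarnessYieldThreshold


@dataclass
class Backoff:
    spins_remaining: int = SPIN_BUDGET  # assumed initial state
    spins_consumed: int = 0


def backoff_before(b):
    """Pre-fix shape: a budget reset returns without pausing."""
    if b.spins_remaining > 0:
        b.spins_remaining -= 1
        b.spins_consumed += 1
        return "pause"
    action = "none"
    if b.spins_consumed >= YIELD_THRESHOLD:
        action = "yield"
        b.spins_consumed = 0
    b.spins_remaining = SPIN_BUDGET
    return action


def backoff_after(b):
    """Restructured shape: every non-yield call pauses exactly once."""
    if b.spins_remaining <= 0:
        if b.spins_consumed >= YIELD_THRESHOLD:
            b.spins_consumed = 0
            b.spins_remaining = SPIN_BUDGET
            return "yield"
        b.spins_remaining = SPIN_BUDGET
    b.spins_remaining -= 1
    b.spins_consumed += 1
    return "pause"


def trace(step, calls=10):
    """Record what each of `calls` consecutive empty pops would do."""
    b = Backoff()
    return [step(b) for _ in range(calls)]


print(trace(backoff_before))
print(trace(backoff_after))
```

Under these placeholder constants, ten consecutive empty pops produce one call that neither pauses nor yields in the pre-fix model and none in the restructured one.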

Comment on lines +59 to +71
STRICT_FLOOR: frozenset[str] = frozenset(
    {
        # Boost.LockFree — fixed via --path:src on the smoke step.
        "boost_lockfree_queue",
        "boost_lockfree_spsc",
        # loony — fixed via --path:$(nimble path loony) +
        # --path:$(nimble path arc) on smoke and bench compile.
        "loony",
        # threading.Chan — fixed via
        # --path:$(nimble path threading) on smoke and bench compile.
        "threading_channels",
    }
)

medium

The STRICT_FLOOR set is incomplete and only includes the 4 projects that were restored in this PR. According to the PR description and CHANGELOG, the intended floor is 16 projects. This set should be expanded to include all expected comparison libraries and internal adapters to ensure the CI pipeline correctly detects silent omissions of any benchmark target.

STRICT_FLOOR: frozenset[str] = frozenset(
    {
        "lockfreequeues_sipsic", "lockfreequeues_sipmuc", "lockfreequeues_mupsic", "lockfreequeues_mupmuc",
        "lockfreequeues_unbounded_sipsic", "lockfreequeues_unbounded_sipmuc", "lockfreequeues_unbounded_mupsic", "lockfreequeues_unbounded_mupmuc",
        "boost_lockfree_queue", "boost_lockfree_spsc",
        "loony",
        "threading_channels",
        "nim_channel",
        "moodycamel",
        "crossbeam_array_queue", "crossbeam_seg_queue",
        "atomic_queue",
        "rigtorp_spsc", "rigtorp_mpmc",
        "flume",
        "kanal",
        "liblfds",
    }
)

Owner Author

Fixed in a66db21: STRICT_FLOOR expanded from 4 entries to 23 — the actual emitted slug-prefix closure across bench_{spsc,mpsc,mpmc,unbounded}.nim. Note: the CHANGELOG's "16 libraries" wording counts libraries-as-installed in bench.yml (post-folly_pcq drop); the slug-prefix count is higher because some libraries emit multiple distinct first-segment prefixes (Boost.LockFree shares a single boost_lockfree_queue prefix across the SPSC and MPMC bench files; system.Channel emits nim_channel for MPSC and nim_channels for MPMC; flume and kanal each emit a bounded prefix and a separate unbounded prefix). Test still skips locally per the LOCKFREEQUEUES_BENCH_STRICT_FLOOR env-var gate documented in CHANGELOG until the next post-merge bench.yml run regenerates latest.json.
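
A minimal Python sketch of the kind of check a slug-prefix floor enables, assuming slugs are `/`-separated with the adapter prefix as the first segment. The helper `missing_from_floor`, the four-entry floor, and the example slugs are illustrative only, not the contents of `test_smoke_compiles` or `latest.json`.

```python
# Illustrative subset of a floor; the real set in the PR has 23 entries.
FLOOR = frozenset(
    {
        "boost_lockfree_queue",
        "loony",
        "threading_channels",
        "flume_unbounded",
    }
)


def missing_from_floor(emitted_slugs, floor=FLOOR):
    """Return floor prefixes that no emitted slug starts with."""
    prefixes = {slug.split("/", 1)[0] for slug in emitted_slugs}
    return floor - prefixes


# Hypothetical emitted slugs in which the loony adapter silently dropped out.
emitted = [
    "boost_lockfree_queue/mpmc/2p2c",
    "threading_channels/mpmc/1p1c",
    "flume_unbounded/mpmc_unbounded/1p1c",
]

print(sorted(missing_from_floor(emitted)))
```

Because the comparison is on first-segment prefixes rather than library names, one installed library can account for several floor entries, which is why the floor's size differs from the library count.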


@axiomantic-momus axiomantic-momus Bot left a comment

This is a large v4.2.0 release PR adding 6 new comparison adapters (atomic_queue, rigtorp/SPSCQueue, rigtorp/MPMCQueue, flume, kanal, liblfds), refactoring bench dispatch to a topology-based Adapter registry, introducing an SPMC topology axis, adding HarnessBackoff for oversubscription relief, consolidating the Rust cdylib, retiring bench-comparison.yml, and uplifting chart visualization. One issue found: an undocumented 2×2 grid truncation for flume/kanal unbounded adapters vs the 3×3 grid used by every other unbounded adapter.

Severity tally: 1 Low.

Low

  • BOT-A2 (benchmarks/nim/bench_unbounded.nim:649): Flume and kanal unbounded adapters benchmark only a 2×2 grid while every other unbounded adapter uses a 3×3 grid

Noteworthy

  • The topology-based Adapter registry refactoring is well-structured and eliminates the fragile name-based case-ladder dispatch.
  • The dark-mode MutationObserver and themed axis defaults in bench-charts.js are a thoughtful UX addition for the mkdocs Material theme.
  • The liblfds license audit trail (homepage + Repology + dual-mirror diff check) documented in THIRD_PARTY_LICENSES.md is thorough and defensible.

Verdict: APPROVE.

Comment on lines +649 to +667
when declared(initFlumeUnboundedQ):
  proc runFlumeUnbounded(em: var BMFEmitter,
                         topology: Topology) {.nimcall.} =
    discard topology
    for p in [1, 2]:
      for c in [1, 2]:
        runMvpUnboundedShape[FlumeUnboundedAdapter[uint64]](
          em, "flume_unbounded", initFlumeUnboundedQ,
          p, c, UnboundedMupmucRuns, BenchUnboundedWarmup,
          UnboundedMupmucMessageCount)

when declared(initKanalUnboundedQ):
  proc runKanalUnbounded(em: var BMFEmitter,
                         topology: Topology) {.nimcall.} =
    discard topology
    for p in [1, 2]:
      for c in [1, 2]:
        runMvpUnboundedShape[KanalUnboundedAdapter[uint64]](
          em, "kanal_unbounded", initKanalUnboundedQ,

BOT-A2 — Low (quality)
Flume and kanal unbounded adapters benchmark only a 2×2 grid while every other unbounded adapter uses a 3×3 grid
runFlumeUnbounded and runKanalUnbounded use for p in [1, 2]: for c in [1, 2] (4 shapes: 1p1c, 1p2c, 2p1c, 2p2c), but every other unbounded MPMC adapter (loony, crossbeam_seg_queue, moodycamel, and lockfreequeues_unbounded_mupmuc under when not defined(BenchSkipOversubscribed)) uses for p in [1, 2, 4]: for c in [1, 2, 4] (9 shapes). The 5 missing shapes (1p4c, 2p4c, 4p1c, 4p2c, 4p4c) are not gated on BenchSkipOversubscribed — they are unconditionally omitted. This grid truncation is not documented in the CHANGELOG Known Limitations, the PR body, or code comments. Quoted from the file: for p in [1, 2]: / for c in [1, 2]: vs loony's for p in [1, 2, 4]: / for c in [1, 2, 4]:.
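
The shape arithmetic in this finding can be reproduced with a short Python sketch; the `shapes` helper is hypothetical and only mirrors the `NpMc` naming used above.

```python
from itertools import product


def shapes(producers, consumers):
    """Enumerate PxC shape labels for the given producer and consumer counts."""
    return [f"{p}p{c}c" for p, c in product(producers, consumers)]


truncated = shapes([1, 2], [1, 2])    # grid used by flume/kanal unbounded
full = shapes([1, 2, 4], [1, 2, 4])   # grid used by the peer adapters
missing = [s for s in full if s not in truncated]

print(len(truncated), len(full))
print(missing)  # ['1p4c', '2p4c', '4p1c', '4p2c', '4p4c']
```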

Owner Author

Fixed in a66db21: expanded runFlumeUnbounded and runKanalUnbounded from a 2x2 grid to 3x3 ([1, 2, 4] x [1, 2, 4]) so they match every other unbounded MPMC peer (loony, crossbeam_seg_queue, moodycamel, lockfreequeues_unbounded_mupmuc). Investigated the originating commit (40ef120 — flume+kanal wiring); the 2x2 was introduced with no documented rationale, so I expanded to the peer shape rather than carrying it as a known limitation. CHANGELOG ### Changed entry added under [Unreleased].

elijahr added 3 commits May 6, 2026 19:46
The `Track base branch benchmarks with Bencher` step has been failing
on every push to devel since the latency_p99_ns + throughput_ops_ms
two-measure config landed, blocked by:

    Failed to validate the model for the throughput_ops_ms Measure
    Threshold: Invalid threshold model: Invalid model, no boundary
    provided

Bencher CLI's `CliReportThresholds` parser zips the per-measure threshold
flags element-wise via `.next()` over each `Vec<...>` (see
`services/cli/src/bencher/sub/project/report/create/thresholds.rs`).
With one boundary value supplied per measure but two measures, both
boundaries get consumed by the FIRST measure and the second measure
ends up with no boundaries at all.

Fix: use the CLI's documented `_` (`ElidedOption`) convention to align
the boundary arrays with the measure array. For latency we want the
upper bound (regression = latency increase), elided lower; for
throughput we want the lower bound (regression = throughput drop),
elided upper.
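
The positional binding described above can be modelled with a toy Python sketch. It is not the Bencher CLI's code and uses no real flag names; `bind_boundaries` and `unbounded` are hypothetical helpers that only illustrate how position-wise pairing behaves with and without `_` placeholders.

```python
ELIDED = "_"


def bind_boundaries(measures, lower, upper):
    """Pair the i-th lower/upper boundary with the i-th measure."""

    def at(values, i):
        # A missing position or an elided "_" both mean "no boundary".
        if i >= len(values) or values[i] == ELIDED:
            return None
        return float(values[i])

    return {m: (at(lower, i), at(upper, i)) for i, m in enumerate(measures)}


def unbounded(bound):
    """Measures left with neither a lower nor an upper boundary."""
    return [m for m, (lo, hi) in bound.items() if lo is None and hi is None]


measures = ["latency_p99_ns", "throughput_ops_ms"]

# One value per boundary kind: both land on the first measure.
broken = bind_boundaries(measures, lower=["0.99"], upper=["0.99"])
print(broken, unbounded(broken))

# Elision keeps the boundary lists aligned with the measure list.
fixed = bind_boundaries(measures, lower=["_", "0.99"], upper=["0.99", "_"])
print(fixed, unbounded(fixed))
```

In the first call the throughput measure ends up with no boundary at all, matching the "no boundary provided" error; in the second, latency carries only an upper boundary and throughput only a lower one.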

Also remove the `continue-on-error: true` band-aid that was masking the
failure and update the long comment block to document the binding fix
and the elision convention so future readers know why the `_` is
there.

Net effect: the t_test 0.99 threshold gates on devel are now hard-
binding, not no-ops. Activation still requires Track 6 Task 6.4's
≥ 10 prior-run soak; until then Bencher dampens alerts on insufficient
sample history.

CHANGELOG: moved from "Changed" (release-day band-aid wording) to a
description that names the diagnosis + fix.
- liblfds bmm wrapper: posix_memalign for 128-byte over-aligned struct.
  `bench_liblfds_bmm_t` embeds `lfds711_queue_bmm_state`, whose
  read/write indices are declared with
  `LFDS711_PAL_ALIGN(LFDS711_PAL_ATOMIC_ISOLATION_IN_BYTES)` (128 B on
  x64); plain `malloc` only guarantees 16-byte alignment, tripping the
  upstream alignment assertion in `lfds711_misc_internal_backoff_init`.
  The bss (SPSC) variant is not over-aligned and does not need the fix.
- bench_common backoff: restructured so every call that does not
  escalate to a scheduler yield issues exactly one `cpuPause`. The
  prior shape skipped the pause whenever the spin budget was reset,
  producing an inconsistent pause cadence on the empty-pop path.
- bench_unbounded consumer threads: reset HarnessBackoff after each
  successful pop in usipsicConsumerThread, usipmucConsumerThread,
  umupmucConsumerThread, and the in-line runOneUMupsicRun consumer
  loop. Sticky `spinsConsumed` from a prior empty-pop streak otherwise
  biased the next contention window into yielding too early. The
  comparison-library MVP path uses runThroughputHarness's spin-only
  consumer in bench_common and is unaffected.
- test_smoke_compiles STRICT_FLOOR: expanded from 4 to 23 slug-prefix
  entries enumerating every prefix the bench harness emits. The
  CHANGELOG's "16 libraries" wording counts libraries-as-installed in
  bench.yml (post-folly_pcq drop); the slug-prefix count is higher
  because some libraries emit multiple distinct first-segment prefixes
  (boost emits one shared prefix, system.Channel emits both
  `nim_channel` and `nim_channels`, flume/kanal each emit a bounded
  and an unbounded prefix). The set is the actual emitted-prefix
  closure across `bench_{spsc,mpsc,mpmc,unbounded}.nim`, so the guard
  catches a drop from any installed adapter, not only the four
  restored in the initial Stage 1 patch.
- bench_unbounded flume/kanal: expanded from a 2x2 grid to 3x3 to
  match every other unbounded MPMC peer (loony, crossbeam_seg_queue,
  moodycamel, lockfreequeues_unbounded_mupmuc). The 2x2 was introduced
  in the flume+kanal wiring commit with no documented rationale.

CHANGELOG: entries for each of the five fixes added under
[Unreleased].
macOS smoke coverage runs only on push events (devel branches and tag
pushes). Pull requests no longer queue a macOS runner — saves CI time
and macOS-minutes for changes whose risk is overwhelmingly Linux-side.
Darwin-only regressions surface on the next push to devel.