v4.2.0 release: bench tightening, SPMC topology, comparison expansion#30
Picks up the upstream signal-handler stride fix in cross-slot reclamation (debra v0.7.1).
… comparison libs
Four comparison libraries (boost.lockfree queue, boost.lockfree
spsc_queue, loony, threading.Chan) were silently missing from
docs/assets/bench-results/latest.json because their adapter smoke
steps in bench.yml failed to compile, and the
`continue-on-error: true` soft-skip pattern flipped each one to
"omit slugs" with only a yellow PR warning. Diagnostic findings
from bench.yml run 25420766836 (devel @ 2026-05-06):
- boost_lockfree_queue / boost_lockfree_spsc — classification (a):
`cannot open file: lockfreequeues/internal/aligned_alloc` from
benchmarks/nim/adapters/boost_lockfree_*_adapter.nim. Root cause:
the project's nim.cfg sets `--noNimblePath`, so the package's
own srcDir is not auto-added to the search path for off-tree
files like the smoke harness. Fix: add `--path:src` to the
smoke step's `nim cpp` invocation.
- loony — classification (a) (NOT registry/install failure as
initially hypothesized): nimble install succeeded; smoke step
failed with `cannot open file: loony`. Same root cause as
boost — `--noNimblePath` disables auto-augmentation of the
search path with installed nimble package roots. Loony also
imports `pkg/arc` which needs the same treatment. Fix: capture
`nimble path loony` and `nimble path arc` into env vars in the
install step; pass both as `--path:` flags to the smoke step.
- threading.Chan — classification (a): nimble install succeeded;
smoke step failed with `cannot open file: threading/channels`.
Same root cause. Fix: capture `nimble path threading` into an
env var in the install step; pass as `--path:` flag to the
smoke step.
The fix propagates from smoke into the bench compile step: the
`Build adapter define flags` aggregator now also produces a
`paths` output that the compile step consumes, so any adapter
that passes smoke also gets its path flags during the bench
compile. Without this propagation, an adapter could pass smoke
(one --path: flag in scope) and then silently fail at bench
compile (no --path: flag), reintroducing the silent-mask pattern.
The `continue-on-error: true` soft-skip pattern itself is left in
place — it's the intended fail-soft semantics for missing
optional comparison libraries (registry outage, ABI break, etc.).
This commit removes the silent-mask FALSE NEGATIVES caused by
known-fixable path issues; future genuine outages still
soft-skip with a yellow warning.
Adds benchmarks/tests/test_smoke_compiles.py — a fixture-pinned
guard that asserts the strict-floor library prefixes (boost
queue, boost spsc, loony, threading_channels) are present in
latest.json. Skip-by-default until the next devel bench.yml run
regenerates latest.json with the fixes applied; opt-in to
strict-fail mode via `LOCKFREEQUEUES_BENCH_STRICT_FLOOR=1`. The
slug-presence assertion is the CI-side regression gate against
future silent-mask reintroductions.
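The slug-presence guard described above can be sketched roughly as follows. This is a minimal illustration, not the contents of test_smoke_compiles.py: it assumes latest.json is a BMF-style JSON object whose top-level keys are slugs of the form `<prefix>/<topology>/<shape>`, and the exact prefix spellings, `load_slugs`, `missing_prefixes`, and `check_strict_floor` are hypothetical names. Only the `LOCKFREEQUEUES_BENCH_STRICT_FLOOR` variable comes from the commit message.

```python
import json
import os

# Assumed spellings of the strict-floor slug prefixes; the commit names the
# libraries but not the exact prefix strings.
STRICT_FLOOR_PREFIXES = (
    "boost_lockfree_queue",
    "boost_lockfree_spsc",
    "loony",
    "threading_channels",
)


def load_slugs(path):
    """Return the benchmark slugs (top-level keys) of a BMF-style JSON file."""
    with open(path, encoding="utf-8") as fh:
        return set(json.load(fh))


def missing_prefixes(slugs, prefixes=STRICT_FLOOR_PREFIXES):
    """List every prefix that no slug of the form '<prefix>/...' starts with."""
    return [p for p in prefixes if not any(s.startswith(p + "/") for s in slugs)]


def check_strict_floor(slugs, environ=None):
    """Return 'pass' or 'skip'; raise AssertionError only in strict mode."""
    environ = os.environ if environ is None else environ
    missing = missing_prefixes(slugs)
    if not missing:
        return "pass"
    if environ.get("LOCKFREEQUEUES_BENCH_STRICT_FLOOR") == "1":
        raise AssertionError(f"missing comparison slugs: {missing}")
    # Default mode: report a skip so a stale fixture does not fail the suite.
    return "skip"
```

In a unittest-based suite the "skip" outcome would map onto `self.skipTest(...)`, which matches the skip-by-default behaviour until latest.json is regenerated.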
Adds 5 new steps to bench.yml mirroring the loony pattern, so the
Crossbeam (Rust) ArrayQueue + SegQueue adapters run inside the
regular bench matrix instead of in the separate
bench-comparison.yml workflow:
1. Install Rust toolchain (dtolnay/rust-toolchain@stable) +
Swatinem/rust-cache@v2 with the same workspaces key shape that
bench-comparison.yml used (so any pre-existing cache hits
carry over after the consolidation).
2. Build crossbeam cdylib (cargo build --release --manifest-path
benchmarks/rust/bench-ffi-crossbeam/Cargo.toml). The manifest
path is the current pre-Commit-5.2 location; Commit 5.2 will
rename benchmarks/rust/bench-ffi-crossbeam/ -> .../comparison/
and update both this manifest-path and the cache workspaces
key together.
3. Smoke crossbeam adapters (nim c with both
-d:adapter_crossbeam_*_available defines and a --passL rpath
so the smoke binary loads libbench_ffi_crossbeam.so without
LD_LIBRARY_PATH).
4. Set crossbeam adapter defines on smoke success — writes
ADAPTER_CROSSBEAM_ARRAY=1, ADAPTER_CROSSBEAM_SEG=1, and
CROSSBEAM_RPATH_DIR to GITHUB_ENV. The `Build adapter define
flags` aggregator picks these up and emits both the
-d:adapter_crossbeam_*_available defines and a single
--passL:-Wl,-rpath,... flag for the bench compile step.
5. Annotate crossbeam skipped on any failure in the toolchain /
install / smoke chain — yellow PR-check warning identical in
shape to the loony / boost / threading annotate-skip steps.
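The hand-off in steps 4 and 5 can be modelled in a few lines. This Python sketch is only a model of the aggregator, which in the workflow is a shell step in bench.yml. The environment variable names come from the commit message; the two concrete define names are assumptions expanded from the `-d:adapter_crossbeam_*_available` pattern, and `build_crossbeam_flags` is a hypothetical helper.

```python
def build_crossbeam_flags(env):
    """Model of the flag aggregator: env vars in, compile flags out."""
    flags = []
    # Define names below are assumed expansions of the wildcard pattern.
    if env.get("ADAPTER_CROSSBEAM_ARRAY") == "1":
        flags.append("-d:adapter_crossbeam_array_available")
    if env.get("ADAPTER_CROSSBEAM_SEG") == "1":
        flags.append("-d:adapter_crossbeam_seg_available")
    rpath_dir = env.get("CROSSBEAM_RPATH_DIR")
    # Emit the rpath flag once, and only when at least one adapter is enabled.
    if flags and rpath_dir:
        flags.append(f"--passL:-Wl,-rpath,{rpath_dir}")
    return flags
```

When the smoke chain fails, none of the variables are set, the function returns an empty list, and the bench compile proceeds without the crossbeam adapters, which is the fail-soft path the annotate step reports.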
Matrix gating: only bench_mpmc (uses crossbeam_array_queue) and
bench_unbounded (uses crossbeam_seg_queue) pay for the Rust
toolchain install. The other matrix legs short-circuit on the
`if:` clauses so the consolidation does not inflate their wall
clock. Adds a corresponding `force_skip_crossbeam`
workflow_dispatch input mirroring the existing
`force_skip_loony` / `force_skip_boost` / etc. inputs.
bench-comparison.yml is RETAINED at this commit and will be
retired in the next commit (`ci(bench): retire bench-comparison.yml
workflow split`); the prose-and-comment cleanup also lives in the
retirement commit so this commit is purely additive.
The previous commit consolidated Crossbeam (Rust) into the bench.yml
matrix, so the dedicated comparison workflow is now redundant.
Delete .github/workflows/bench-comparison.yml outright. Branch
protection on devel does NOT depend on a check from this workflow
(verified via `gh api repos/elijahr/lockfreequeues/branches/devel/
protection` which returns 404 — no required status checks
configured), so retirement is safe with no follow-up settings
change required.
Update prose / comment references that named the retired workflow
by name (4 references; located via grep before editing because
line numbers in the impl plan had drifted slightly):
- benchmarks/README.md (CI workflows section): merge the
crossbeam description into bench.yml's bullet, drop the
dedicated comparison-workflow bullet, and add force_skip_crossbeam
to the listed force-skip inputs.
- benchmarks/README.md (chart fixture section): rewrite "only run
on bench-comparison.yml (nightly cron)" to reflect that all
comparison adapters now run inside bench.yml.
- benchmarks/nim/bench_unbounded.nim (BenchSkipOversubscribed
comment): replace "the full grid still runs in
bench-comparison.yml nightly cron / on workflow_dispatch" with
a reference to bench.yml's workflow_dispatch path.
- benchmarks/nim/smoke/smoke_crossbeam.nim (header comment):
replace "Used by bench-comparison.yml" with "Used by bench.yml
(originally Track 3 §3.13 in the retired separate crossbeam
comparison workflow)".
The internal bench.yml comments that mentioned the retired
workflow by name were also rephrased to drop the literal string,
so a fresh `grep -rn 'bench-comparison' .github/ docs/ benchmarks/`
returns ZERO hits — confirming no broken references remain.
smoke_crossbeam.nim is RETAINED (still invoked by bench.yml's
consolidated `Smoke crossbeam adapters` step from the previous
commit). Deletion of the workflow file is the only structural
change; the smoke harness, the cdylib build, and the Nim adapter
sources all stay where they are. Commit 5.2's Cargo.toml rename is
explicitly out of scope for this work item.
Replace name-based `case variant of "X":` dispatch with a per-binary
seq[Adapter] registry where each adapter declares topologiesSupported
as a set[Topology]. The dispatcher loops over the registry and invokes
each adapter once per requested topology that the adapter supports.
This is a pure mechanical refactor: existing slug emissions are byte-
for-byte identical, all existing tests pass, and the bench harness's
public argv shape changes from `<variant_name> <shape>` to
`<topology> <shape>`. The matrix Run step in bench.yml is updated to
pass the topology argument instead of variant names.
Sipmuc adapters remain on tMpmc / tMpmcUnbounded at this commit;
promotion to tSpmc / tSpmcUnbounded with the corresponding chart-
panel and contract-test updates lands atomically in the next commit
(`feat(bench): add SPMC topology axis; reroute sipmuc to SPMC panel`).
Files touched:
- benchmarks/nim/bench_common.nim: + Adapter / AdapterRunProc /
parseTopology
- benchmarks/nim/bench_spsc.nim, bench_mpsc.nim, bench_mpmc.nim,
bench_unbounded.nim: lift case-arms into named procs; introduce
let adapters: seq[Adapter] registry; replace argv parse + dispatch.
- .github/workflows/bench.yml: matrix Run step invokes binaries by
topology argument.
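The registry-based dispatch can be illustrated with a small model. The real implementation is Nim (`Adapter`, `parseTopology`, `topologiesSupported` in bench_common.nim); the Python below is a sketch of the same shape under assumed names, with a toy `run` callback that only returns the slug it would emit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, FrozenSet, List


class Topology(Enum):
    SPSC = "spsc"
    MPSC = "mpsc"
    MPMC = "mpmc"


@dataclass(frozen=True)
class Adapter:
    name: str
    topologies_supported: FrozenSet[Topology]
    run: Callable[[str, Topology, str], str]


def parse_topology(arg: str) -> Topology:
    """Map the argv topology string to the enum; unknown values raise ValueError."""
    return Topology(arg)


def emit_slug(name: str, topology: Topology, shape: str) -> str:
    """Toy adapter body: return the slug instead of running a benchmark."""
    return f"{name}/{topology.value}/{shape}"


def dispatch(adapters: List[Adapter], topology: Topology, shape: str) -> List[str]:
    """Invoke every registered adapter that supports the requested topology."""
    return [a.run(a.name, topology, shape)
            for a in adapters if topology in a.topologies_supported]


# Hypothetical registry for one bench binary.
ADAPTERS = [
    Adapter("queue_a", frozenset({Topology.SPSC}), emit_slug),
    Adapter("queue_b", frozenset({Topology.SPSC, Topology.MPMC}), emit_slug),
]
```

Calling `dispatch(ADAPTERS, parse_topology("mpmc"), "2p2c")` runs only `queue_b`. Adding a library becomes a one-row registry change instead of a new `case` arm.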
Add tSpmc and tSpmcUnbounded values to the Topology enum (mid-enum
insertion: tSpmc after tMpmc, tSpmcUnbounded after tMpscUnbounded;
grep precondition for ordinal-dependent code returned zero hits, so
the insertion is safe). Sipmuc adapters in bench_mpmc and
bench_unbounded reroute from {tMpmc}/{tMpmcUnbounded} to
{tSpmc}/{tSpmcUnbounded}; their slug emissions reroute from
.../mpmc/... and .../mpmc_unbounded/... to .../spmc/... and
.../spmc_unbounded/... respectively.
This is an ATOMIC schema commit — production-side and contract-side
changes land together in a single commit so all coupled surfaces
(THROUGHPUT_PANELS, TOPOLOGY_LABELS, EXPECTED_ROUTING, DOM IDs
tuple, panel-count assertion, merge_bmf SLUG_RE characterization,
benchmarks.md DOM container, superset_check docstring, bench.yml
matrix) are coherent in the same working tree. Splitting the
commit would produce false-positive passes that mask broken
coupling.
Bencher.dev's sipmuc-on-mpmc threshold history will reset because
the slug roots change; this is operator-accepted (MVP item 2) and
will be documented in CHANGELOG in the release-prep commit.
…ro with ARIA companion

Throughput panels now render vertical bars over a categorical X axis via `uPlot.paths.bars` instead of line series. Each non-blocking library gets a filled bar (fill = stroke color); blocking libraries keep the dashed stroke and get a transparent fill so they read as outlined bars. Per-point dots are disabled because bars carry their own footprint.

The hero panel is rebuilt on a uPlot canvas using the same primitive. There is one series per library, each carrying a single non-null value at its own x slot, so each bar picks up its own stroke/fill color and, for blocking libraries, the dashed stroke. Bars are sorted descending by mean throughput; the legend is emitted under the canvas with the same blocking-vs-non-blocking distinction the panel legends use.

An offscreen `<table id="bench-hero-aria-table">` companion is appended next to the canvas. The canvas carries `aria-describedby="bench-hero-aria-table"` and an `aria-label` summary so screen readers can read the per-library throughput values without relying on the canvas pixel content.

The pre-canvas DOM-`<ol>`-bar renderer is preserved as a named fallback (`renderHeroDomBars`) and engages whenever `window.uPlot.paths.bars` is unavailable.

The latency panel keeps its stepped-path rendering; no change there.
Replace the throughput and latency Y-axis `values:` callbacks with a predicate that returns empty strings for non-finite values and for minor (non-power-of-10) ticks. uPlot occasionally hands back null-valued tick entries on log-scale axes when the data range crosses a decade boundary; the previous formatter rendered those as "null" or as cluttered minor labels (2e3, 5e3, etc.) between the major decades. After this change, log-scale axes show only the major tick labels (1e0, 1e1, 1e2, ...). Linear-scale axes are unaffected because the log-rounding check passes through every integer-valued tick. Applies to `makeThroughputOpts` (Y axis with `distr: logScale ? 3 : 1`) and `makeLatencyOpts` (Y axis hard-coded to log scale).
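The log-scale half of that predicate can be sketched as follows. The real callback is JavaScript inside `makeThroughputOpts` / `makeLatencyOpts`; this Python version only models the decision rule. The function name, the tolerance, and the treatment of non-positive values are assumptions.

```python
import math


def major_tick_label(value):
    """Return '1e<k>' for exact powers of ten, '' for anything else."""
    # Null or non-finite ticks (None, NaN, +/-inf) get no label.
    if value is None or not math.isfinite(value):
        return ""
    # log10 is undefined for non-positive values; treat them as unlabeled.
    if value <= 0:
        return ""
    exponent = round(math.log10(value))
    # Minor ticks such as 2e3 or 5e3 are not close to 10**exponent.
    if not math.isclose(value, 10.0 ** exponent, rel_tol=1e-9):
        return ""
    return f"1e{exponent}"


def format_ticks(values):
    """Shape of a `values:`-style callback: one label per tick, in order."""
    return [major_tick_label(v) for v in values]
```

Returning an empty string rather than dropping the entry keeps the label list the same length as the tick list, so tick marks stay in place and only the text is suppressed.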
…e toggle

uPlot canvases now respect Material's light/dark theme. Two helpers, `themeStroke()` and `themeFont()`, read the live computed text color and font family from the `.md-content` container (or `body` as a fallback). Material's theme picker swaps `data-md-color-scheme` on `<html>`, which cascades into content-area CSS custom properties, so `getComputedStyle` automatically picks up the current scheme.

Every uPlot axis in `makeThroughputOpts`, `makeLatencyOpts`, and the hero canvas now merges `themedAxisDefaults()` (axis stroke + font + neutral mid-gray grid stroke at `rgba(127,127,127,0.15)`) over the existing axis options via `Object.assign`. The neutral grid stroke reads in both schemes; axis text follows Material typography.

A module-scoped `panelRebuilds` array tracks each panel's `rebuild` closure. `installThemeObserver(rebuildAll)` watches `<html>` for `data-md-color-scheme` and `data-md-color-primary` attribute changes and invokes every registered rebuild. The observer is idempotent (guarded by `__themeObserverInstalled`) so re-renders don't multiply observers, and `panelRebuilds` resets at the top of `render()` so re-renders don't accumulate stale closures pointing at destroyed plots.

The hero canvas got a small refactor: its build step is hoisted into a `buildOpts()` factory so the same options object is rebuilt on theme toggle, picking up the new `themeStroke()`/`themeFont()` values each time.

The 7-step manual visual protocol (light/dark/reflow at 320/768/1280 px) is operator-side and not validated here.
…umer waits

Add HarnessBackoff in benchmarks/nim/bench_common.nim: a per-consumer backoff state machine that spins via cpuPause for the first HarnessSpinBudget iterations, then escalates to schedYield once the cumulative spin count crosses HarnessYieldThreshold. Both constants are intdefine (defaults 128 / 1024); tune at compile time via `-d:HarnessSpinBudget=N` / `-d:HarnessYieldThreshold=N`.

Update the four consumer-thread call sites in bench_unbounded.nim to declare a fresh HarnessBackoff at the top of each consume loop and call `hb.backoff()` instead of `backoffOnPeerWait()`. The harness wrapper is intentionally NOT named `backoffOnPeerWait` to avoid shadowing the queue-side helper for v4.3 import discipline.

Drop `-d:BenchSkipOversubscribed` from bench.yml's compile-flag list. The Nim-side `when not defined(BenchSkipOversubscribed):` guards in bench_unbounded.nim STAY; re-engaging the gate is a one-line YAML re-add if a regression demands it.

cpuPause and schedYield are imported directly from `debra/atomics/backoff` (NOT re-exported by lockfreequeues/backoff; verified zero hits via `grep -rn 'export cpuPause\|export schedYield' src/lockfreequeues/`). NO edits to src/lockfreequeues/backoff.nim (Constraint #7).

This is a stopgap. The canonical fix (schedYield in the queue-side backoffOnPeerWait plus relaxation of strict-FIFO consumer claim) is deferred to v4.3 to keep this PR's blast radius bounded.

Bencher threshold tightening to 5% on baseline-eligible 1p1c / 2p2c bounded slugs is a manual server-side step, run pre-merge by the operator with BENCHER_API_TOKEN exported.
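The spin-then-yield escalation can be traced with a small model. This Python sketch mirrors the field names of the Nim `backoff` proc quoted in the review thread but records events instead of calling cpuPause / schedYield. It assumes the spin budget is refilled unconditionally once it is exhausted, which the truncated review excerpt does not show unambiguously; `HarnessBackoffModel` is a hypothetical name.

```python
class HarnessBackoffModel:
    """Event-recording model of a spin-then-yield backoff state machine."""

    def __init__(self, spin_budget=128, yield_threshold=1024):
        self.spin_budget = spin_budget
        self.yield_threshold = yield_threshold
        self.spins_remaining = spin_budget
        self.spins_consumed = 0
        self.events = []

    def backoff(self):
        if self.spins_remaining > 0:
            # Cheap path: one pause instruction per call.
            self.events.append("spin")
            self.spins_remaining -= 1
            self.spins_consumed += 1
            return
        if self.spins_consumed >= self.yield_threshold:
            # Escalation: give the core back to the scheduler.
            self.events.append("yield")
            self.spins_consumed = 0
        # Assumed: the budget is refilled whether or not a yield happened.
        self.spins_remaining = self.spin_budget
```

With the defaults (128 / 1024) this model spins in eight bursts of 128 before the first yield. On an oversubscribed runner that yield is what lets a descheduled producer make progress.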
…(Tier 1, header-only)
Vendors three header-only C++ comparison libraries under
`benchmarks/vendor/` and wires them through thin extern "C" wrappers
into the bench harness (same shim pattern as concurrentqueue/MoodyCamel).
All three are MIT-licensed.
Pinned upstream commits:
  atomic_queue (max0x7ba):
    1a3774a89c86ecfdf08753dbd41018ace5a833a4
    -> benchmarks/vendor/atomic_queue/include/atomic_queue/{5 headers}
  rigtorp/SPSCQueue:
    1053918dbd251fbff69b24ef27fa5d51c29ec2af
    -> benchmarks/vendor/rigtorp_spsc/include/rigtorp/SPSCQueue.h
  rigtorp/MPMCQueue:
    b9808ede08f26fa9df4df4e081d19cace8f6c6ea
    -> benchmarks/vendor/rigtorp_mpmc/include/rigtorp/MPMCQueue.h
Each vendor dir ships a per-library README.md with the upgrade
procedure and an extern "C" `*_wrapper.cpp` shim that exposes a
non-template `uint64_t`-payload API (`{lib}_init` / `{lib}_push` /
`{lib}_pop` / `{lib}_destroy`).
Adapter rows added to the topology-based registries:
  bench_spsc:
    atomic_queue -> {tSpsc} -> "atomic_queue/spsc/1p1c"
    rigtorp_spsc -> {tSpsc} -> "rigtorp_spsc/spsc/1p1c"
  bench_mpmc:
    atomic_queue -> {tMpmc} -> "atomic_queue/mpmc/{1,2,4}p{1,2,4}c"
    rigtorp_mpmc -> {tMpmc} -> "rigtorp_mpmc/mpmc/{1,2,4}p{1,2,4}c"
bench.yml: adds force_skip_{atomic_queue,rigtorp_spsc,rigtorp_mpmc}
workflow_dispatch inputs, three install/smoke/set-define/annotate block
sets gated on the relevant matrix.binary, and three new flag-aggregator
branches that emit `-d:adapter_*_available` and force `mode=cpp` for
the bench compile step (each vendored shim is C++).
`.gitattributes` already covers `benchmarks/vendor/**` — no change
needed.
THIRD_PARTY_LICENSES.md: appends a new
"Tier 1 vendored comparison libraries (v4.2.0 Stage 5.1)" section with
one entry per library (source, pinned commit, license, vendored path,
upgrade-procedure pointer).
Verification (run locally on the worktree):
nim cpp -d:adapter_atomic_queue_available [...] # smoke ok
nim cpp -d:adapter_rigtorp_spsc_available
-d:adapter_rigtorp_mpmc_available [...] # smoke ok
bench_spsc + bench_mpmc compile clean with and without the new gates.
nimble test # 210/210 OK
test_bench_charts_contract + test_merge_bmf # 26/26 OK
YAML syntax check on bench.yml # ok
Renames `benchmarks/rust/bench-ffi-crossbeam/` to
`benchmarks/rust/comparison/` and folds two more Rust queue crates
(`flume`, `kanal`) into the same C-ABI cdylib so the bench binary
links a single shared object regardless of which Rust comparison
adapter is enabled.
Crate manifest:
package.name : bench_ffi_crossbeam -> bench_ffi_comparison
[lib].name : (default) -> bench_ffi_comparison
artifact (Linux) : libbench_ffi_crossbeam.so
-> libbench_ffi_comparison.so
Cargo dependencies added:
flume = "0.11" (resolved 0.11.1)
kanal = "0.1" (resolved 0.1.1)
Symbol prefix-per-crate (no collisions): each crate's shims live
under a unique prefix so a single cdylib carries all 24 entry points.
Pre-existing `cb_*` exports were renamed to `crossbeam_*` for parity
with the new prefixes; the integration test suite (rlib path) was
updated in lockstep.
Symbol catalog after consolidation (nm -gU on macOS dylib):
crossbeam_array_init / push / pop / destroy (4)
crossbeam_seg_init / push / pop / destroy (4)
flume_init / push / pop / destroy (4)
flume_unbounded_init / push / pop / destroy (4)
kanal_init / push / pop / destroy (4)
kanal_unbounded_init / push / pop / destroy (4)
Total: 24 (>= 8 verification gate satisfied; pre-rename baseline
was 8 cb_* symbols). cargo test --release on the rlib: 6/6 OK.
New Nim adapters:
flume_adapter.nim -> bench_mpmc ({tMpmc})
bench_unbounded ({tMpmcUnbounded})
slugs flume/mpmc/{1,2,4}p{1,2,4}c
and flume_unbounded/mpmc_unbounded/{1,2}p{1,2}c
kanal_adapter.nim -> bench_spsc ({tSpsc})
bench_mpmc ({tMpmc})
bench_unbounded ({tMpmcUnbounded})
slug kanal/spsc/1p1c
+ kanal/mpmc/{1,2,4}p{1,2,4}c
+ kanal_unbounded/mpmc_unbounded/{1,2}p{1,2}c
Both adapters reuse `crossbeam_link.nim` for `{.passL.}` emission so
the linker arg set fires exactly once per bench binary. The Nim-side
crossbeam adapters' `importc` symbol names were renamed `cb_*` ->
`crossbeam_*` to match the new Rust exports.
bench.yml: rename the manifest path / cargo cache workspace key from
`bench-ffi-crossbeam` to `comparison`; update library name from
`libbench_ffi_crossbeam` to `libbench_ffi_comparison` in the smoke +
bench-compile invocations; add `force_skip_flume` /
`force_skip_kanal` workflow_dispatch inputs; add a unified
`smoke_comparison.nim` step that exercises flume + kanal under the
shared cdylib (gated on the existing crossbeam install + smoke gates
plus the flume/kanal force-skip flags); update the flag aggregator
with `-d:adapter_flume_available` / `-d:adapter_kanal_available`
branches that share a single `--passL:-Wl,-rpath,...` flag with the
crossbeam adapters; promote the rpath capture to its own step so it
runs whenever the cdylib build succeeded (needed for `bench_spsc` —
kanal is the only Rust adapter there).
THIRD_PARTY_LICENSES.md: refresh the Crossbeam block to point at the
renamed crate path; add new flume (Apache-2.0 OR MIT) and kanal (MIT)
entries under a "Tier 2 Rust comparison libraries" section.
benchmarks/README.md: rename rust/ subdir entry to `comparison/` with
expanded description; refresh the comparison adapter table to add
flume + kanal rows; refresh the Crossbeam quick-start invocations.
Verification (run locally on the worktree):
cargo build --release --manifest-path \
benchmarks/rust/comparison/Cargo.toml # ok
cargo test --release --manifest-path \
benchmarks/rust/comparison/Cargo.toml # 6/6 OK
smoke_comparison.nim (flume + kanal) # all 4 round-trip ok
smoke_crossbeam.nim (renamed symbols) # both round-trip ok
bench_spsc + bench_mpmc + bench_unbounded + bench_mpsc compile
clean with the new gates set.
nimble test # 210/210 OK
test_bench_charts_contract + test_merge_bmf # 26/26 OK
YAML syntax check on bench.yml # ok
No symbol collisions: prefix-per-crate naming makes the consolidated
cdylib safe to extend; failure path R3 (drop kanal then flume) was
not exercised.
Vendors liblfds 7.1.1 — a portable, license-free, lock-free C data
structure library — under `benchmarks/vendor/liblfds/` and wires a
new bench adapter exposing both bounded SPSC and bounded MPMC slugs:
- `liblfds/spsc/1p1c` (lfds711_queue_bss_*)
- `liblfds/mpmc/{1,2,4}p{1,2,4}c` (lfds711_queue_bmm_*)
This is the v4.2.0 strict-floor 17-project Tier 3 entry. Unlike the
Tier 1 libraries (atomic_queue, rigtorp_*, concurrentqueue) which are
header-only, liblfds ships a C source tree with its own Makefile —
the install step in bench.yml runs `make ar_rel` on the vendored tree
to produce `liblfds711.a` and links the bench binary against it.
License verification protocol
-----------------------------
The upstream source tree ships NO LICENSE file. The canonical license
declaration is published only on the project homepage and was
cross-checked against three independent sources before vendoring:
1. liblfds.org (canonical homepage; quoted verbatim below).
2. repology.org/project/liblfds — listed as `custom:none`,
consistent with public-domain dedication.
3. github.com/darthcloud/liblfds7.1.1 (mirror used) and
github.com/topecongiro/liblfds7.1.1 (independent mirror) —
diff-checked byte-for-byte; identical source content.
All three sources agree. The github.com/liblfds/liblfds7.1.1 mirror
under the upstream's own GitHub org now contains only a one-line
README pointing at liblfds.org — that is why a third-party content
mirror was needed.
Verbatim license declaration from liblfds.org (retrieved 2026-05-06):
Welcome to liblfds, a portable, license-free, lock-free data
structure library written in C.
license
You are free to use this library in any way. Go forth and create
wealth!
If for legal reasons a custom licence is required, the license of
your choice will be granted, and license is hereby granted up front
for a range of popular licenses : the MIT license, the BSD license,
the Apache license, the GPL and LPGL (all versions thereof) and the
Creative Commons licenses (all of them). Additionally, everything is
also placed in the public domain.
Conclusion: public-domain dedication + permissive Apache-2.0 grant
satisfies lockfreequeues's own Apache-2.0 redistribution path.
The full quote is preserved both in `THIRD_PARTY_LICENSES.md` and in
`benchmarks/vendor/liblfds/LICENSE` so the audit trail is recoverable
from the vendored tree alone.
Adapter API choice deviation
----------------------------
The original v4.2.0 impl plan called for the `lfds711_ringbuffer_*`
API. That API silently overwrites the oldest element on full rather
than reporting back-pressure, which violates the bench harness's
"messages-produced equals messages-consumed" invariant. Using the
bounded queue APIs (`bss` for SPSC, `bmm` for MPMC) — both of which
return 0 from enqueue on full — preserves the invariant and lets
liblfds participate on the same back-pressure contract as every other
adapter. Slug shapes are unchanged from the impl plan.
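The difference between the two back-pressure behaviours can be seen in a single-threaded model. This is not liblfds code: `collections.deque(maxlen=...)` stands in for an overwrite-on-full ring buffer, and `BoundedQueue` is a hypothetical stand-in for a queue whose enqueue reports failure when full.

```python
from collections import deque


def run_overwriting(capacity, total):
    """Producer never sees 'full'; the oldest items are silently dropped."""
    ring = deque(maxlen=capacity)
    for item in range(total):
        ring.append(item)  # overwrites the oldest element when full
    consumed = 0
    while ring:
        ring.popleft()
        consumed += 1
    return total, consumed


class BoundedQueue:
    """Queue whose enqueue returns False on full instead of overwriting."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def enqueue(self, item):
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True

    def dequeue(self):
        return self.items.popleft() if self.items else None


def run_bounded(capacity, total):
    """Producer retries after a failed enqueue, so no message is lost."""
    queue = BoundedQueue(capacity)
    produced = consumed = 0
    while consumed < total:
        if produced < total and queue.enqueue(produced):
            produced += 1
        elif queue.dequeue() is not None:
            # Queue was full (or the producer is done): drain one item.
            consumed += 1
    return produced, consumed
```

With capacity 4 and 10 messages, the overwriting model produces 10 and consumes 4, while the bounded model produces 10 and consumes 10. The harness invariant checks the second equality.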
Files
-----
- benchmarks/vendor/liblfds/liblfds711/{inc,src,build}/ (vendored)
- benchmarks/vendor/liblfds/{LICENSE,README.md,.gitignore}
- benchmarks/vendor/liblfds/liblfds_wrapper.c (extern-C shim)
- benchmarks/nim/adapters/liblfds_adapter.nim (new)
- benchmarks/nim/smoke/smoke_liblfds.nim (new)
- bench_spsc.nim, bench_mpmc.nim: register liblfds in registry
- .github/workflows/bench.yml: install (make ar_rel) + smoke +
set-define + annotate; add force_skip_liblfds workflow_dispatch
input
- THIRD_PARTY_LICENSES.md: full audit trail + verbatim license
Verification (gate-pass path)
-----------------------------
- Vendor build: `make ar_rel` produces
benchmarks/vendor/liblfds/liblfds711/bin/liblfds711.a
(Linux native; macOS local-dev needs CFLAGS='-D__linux__ -fPIC'
DGFLAGS='-D__linux__' override; CI runs Ubuntu so no override).
- Smoke: `nim c` of smoke_liblfds.nim with the gate define and
--passL pointing at the produced archive; smoke binary exits 0
after pushing+popping 32 items through both BSS and BMM backends.
- Bench compile: `nim c` of bench_spsc.nim and bench_mpmc.nim with
and without the gate; both shapes compile clean.
- `nimble test`: 210 tests pass.
- `python3 -m unittest benchmarks.tests.test_bench_charts_contract
benchmarks.tests.test_merge_bmf`: 26 tests pass.
- YAML: `python3 -c 'import yaml; yaml.safe_load(...)'` parses ok.
Linguist-vendored marking is already covered by the existing global
`benchmarks/vendor/** linguist-vendored=true linguist-generated=true`
rule in `.gitattributes`, so no `.gitattributes` edit is required.
… note
Add three prose sections to docs/benchmarks.md:
- ## Glossary — 32 inline definitions (BMF, P×C, SPSC/MPSC/SPMC/MPMC,
bounded vs. unbounded, sipmuc/mupsic/sipsic/mupmuc, reclamation,
backoff, etc.) immediately after "How to read these numbers" so
readers don't have to context-switch to find shorthand definitions.
- ## Why MPMC is harder than SPSC — three subsections (cache-line
contention; ABA and reclamation; ordering and asymmetry) explaining
why the bench numbers diverge by orders of magnitude across topology
axes.
- ### Threshold history (in fairness caveats) — Bencher.dev sipmuc
threshold reset note: history reset starting v4.2.0 because sipmuc
moved from MPMC to SPMC; old slug history retained as record but
not carried into the new SPMC slug roots.
Plus a one-sentence pointer to the Glossary anchor in the existing
reading-order paragraph so readers find the definitions on first scroll.
Anchor sanity verified (no duplicate slugs); contract tests pass; no
other doc files touched.
Bump version 4.1.0 → 4.2.0 in lockfreequeues.nimble. Move the [Unreleased] CHANGELOG block (PR #29's bench-presentation work plus this PR's bench-tightening + depth-pass work) to [4.2.0] - 2026-05-06. Create a new empty [Unreleased] block above.
Highlights of v4.2.0 (full detail in CHANGELOG):
- 5 new comparison libraries (atomic_queue, rigtorp×2, flume, kanal, liblfds) wired through the bench matrix; folly_pcq dropped per transitive-include + C++20 audit (strict-floor 16/17, breach documented for v4.3.0 follow-up).
- First-class SPMC topology axis; sipmuc adapters rerouted from MPMC to SPMC. Bencher.dev threshold history for sipmuc slugs reset accordingly.
- Topology-based adapter dispatcher (Option C) replaces name-based variant dispatch.
- Harness-side schedYield-escalating backoff unblocks oversubscribed unbounded shapes on under-provisioned CI runners. Canonical queue-side fix deferred to v4.3.0 (Constraint #7).
- uPlot bars for throughput panels; dark-mode-aware canvas reflow.
- Inline glossary + "Why MPMC is harder than SPSC" prose in docs/benchmarks.md.
- bench-comparison.yml retired; Rust cdylib consolidated under benchmarks/rust/comparison/.
Release tag (v4.2.0) is a post-merge action handled separately.
The v4.2.0 matrix expansion in bench_unbounded — adding the flume_unbounded and kanal_unbounded adapter rows plus a fourth topology invocation (spmc_unbounded) — pushed the per-leg run past the original 10-minute ceiling on shared GitHub-hosted runners. One CI dispatch surfaced this concretely (Run bench_unbounded timed out after 10 minutes), with the post-split BMF deletion-safety check failing as a downstream consequence (mupsic/mpsc_unbounded slugs missing because the run was killed before reaching that topology). Raise the timeout-minutes from 10 to 20. R13 in the impl plan explicitly called the workflow timeout the authoritative completion-detector for the harness backoff fix; this gives the new wider matrix comfortable headroom without weakening that gate. No code changes; YAML only.
Phase 4.6 audit remediation. Three findings, one commit:
1. Green-mirage at suite level: `benchmarks/tests/*.py` was never run by any CI workflow despite shipping 36 tests covering BMF schema characterization, throughput-panel routing, and STRICT_FLOOR. Add a "Run Python bench tests" step to bench.yml that does `python3 -m unittest discover -s benchmarks/tests -v` so the contract surface is actually exercised on every push.
2. Orphan + bit-rotted Nim test: `tests/t_bench_*` were wired into `nimble benchtests` but `tests/t_topology_split.nim` was not, so the topology-split contract test silently rotted when v4.2.0 reclassified sipmuc onto the first-class SPMC topology axis (Decision A1). The test still asserted the pre-A1 slug shape (`lockfreequeues_sipmuc/mpmc/1pXc` and the `_unbounded` mirror). Fix the assertions to match the new emission (`.../spmc/1pXc`, `.../spmc_unbounded/1pXc`) and add the test to `nimble benchtests` so the same drift cannot happen again.
3. Fact-check finding: `docs/benchmarks.md` glossary defined BMF as "Benchmark Manifest Format" while every other in-tree usage and the upstream Bencher.dev convention says "Bencher Metric Format". One-token correction.
Local validation:
- `nim c -r tests/t_topology_split.nim`: 6/6 green, including the deletion-safety superset_check (post ⊇ pre-split fixture).
- `python3 -m unittest discover -s benchmarks/tests -v`: 36 tests, 1 skipped (STRICT_FLOOR red-phase guard, expected), 0 failed.
- `python3 -c "import yaml; yaml.safe_load(...)"`: OK.
✅ Momus review posted — verdict APPROVE, 1 finding
Code Review
This pull request upgrades the project to version 4.2.0, introducing several new comparison libraries and a first-class SPMC topology axis. Key changes include the consolidation of Rust comparison shims into a single cdylib, the implementation of a HarnessBackoff mechanism for better oversubscription handling, and significant enhancements to the benchmarking charts and documentation. Feedback from the review highlights a critical alignment issue in the liblfds wrapper that could cause crashes on x64 systems. Improvements were also suggested for the backoff logic to ensure consistent performance measurements and for the STRICT_FLOOR test to provide comprehensive coverage of all benchmark targets.
void *bench_liblfds_bmm_init(unsigned long long capacity) {
  size_t cap = bench_next_pow2((size_t)capacity);
  bench_liblfds_bmm_t *q = (bench_liblfds_bmm_t *)malloc(sizeof(*q));
  if (q == NULL) return NULL;
  q->elements = (struct lfds711_queue_bmm_element *)malloc(
      sizeof(struct lfds711_queue_bmm_element) * cap);
  if (q->elements == NULL) {
    free(q);
    return NULL;
  }
The bench_liblfds_bmm_t structure contains an over-aligned struct lfds711_queue_bmm_state (aligned to LFDS711_PAL_ATOMIC_ISOLATION_IN_BYTES, which is 128 bytes on x64). Standard malloc only guarantees 16-byte alignment on most 64-bit systems. This will cause a crash due to the alignment assertion in lfds711_misc_internal_backoff_init (called during queue initialization). Use an aligned allocation function like posix_memalign instead.
void *bench_liblfds_bmm_init(unsigned long long capacity) {
  size_t cap = bench_next_pow2((size_t)capacity);
  bench_liblfds_bmm_t *q;
  if (posix_memalign((void**)&q, LFDS711_PAL_ATOMIC_ISOLATION_IN_BYTES, sizeof(*q)) != 0)
    return NULL;
  q->elements = (struct lfds711_queue_bmm_element *)malloc(
      sizeof(struct lfds711_queue_bmm_element) * cap);
  if (q->elements == NULL) {
    free(q);
    return NULL;
  }
  q->capacity = cap;
  lfds711_queue_bmm_init_valid_on_current_logical_core(
      &q->state, q->elements, cap, NULL);
  LFDS711_MISC_MAKE_VALID_ON_CURRENT_LOGICAL_CORE_INITS_COMPLETED_BEFORE_NOW_ON_ANY_OTHER_LOGICAL_CORE;
  return q;
}
Fixed in a66db21: switched bench_liblfds_bmm_init to posix_memalign with LFDS711_PAL_ATOMIC_ISOLATION_IN_BYTES (128 B on x64) so the embedded lfds711_queue_bmm_state's over-aligned read/write indices are honored. The bss (SPSC) path was left on plain malloc since lfds711_queue_bss_state is not over-aligned.
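For readers who want the alignment arithmetic spelled out, here is a small Python sketch (not project code). It assumes the 128-byte isolation size and the 16-byte `malloc` guarantee quoted in the review; `is_aligned` and `align_up` are hypothetical helper names used only for illustration.

```python
ISOLATION_BYTES = 128  # value the review quotes for x64
MALLOC_ALIGN = 16      # typical malloc guarantee on 64-bit systems

def is_aligned(addr: int, alignment: int) -> bool:
    """True when addr is a multiple of alignment (alignment is a power of two)."""
    return addr & (alignment - 1) == 0

def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)

# Every 16-byte-aligned offset inside one 128-byte window.
candidates = list(range(0, ISOLATION_BYTES, MALLOC_ALIGN))
# Only the offsets that also satisfy the 128-byte requirement.
satisfying = [a for a in candidates if is_aligned(a, ISOLATION_BYTES)]
```

Only one of the eight 16-byte-aligned offsets in each window also meets the 128-byte requirement, so a plain `malloc` result cannot be relied on to satisfy the assertion.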
while local < ctx.count:
  let r = ctx.queue[].pop()
  if r.isSome:
    inc local
  else:
    backoffOnPeerWait()
    hb.backoff()
The HarnessBackoff state should be reset when a pop() operation succeeds. If the backoff is not reset, it remains "sticky" across successful operations, potentially leading to unnecessary yields and degraded performance measurements once the queue transitions from an empty to a busy state. This should be applied to all consumer loops in this file.
while local < ctx.count:
  let r = ctx.queue[].pop()
  if r.isSome:
    inc local
    hb = initHarnessBackoff()
  else:
    hb.backoff()
Fixed in a66db21: every consumer thread in bench_unbounded.nim (usipsicConsumerThread, usipmucConsumerThread, umupmucConsumerThread, plus the in-line runOneUMupsicRun consumer loop) now resets HarnessBackoff after each successful pop. The comparison-library MVP path uses runThroughputHarness's spin-only consumer in bench_common.nim and is unaffected.
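To make the "sticky" effect concrete, here is a Python model of the backoff counter (a sketch, not the Nim implementation). The constants are hypothetical stand-ins for `HarnessSpinBudget` and `HarnessYieldThreshold`, whose real values are not quoted in this thread, and the model follows the restructured `backoff` shape suggested in this review.

```python
from dataclasses import dataclass

SPIN_BUDGET = 4      # hypothetical stand-in for HarnessSpinBudget
YIELD_THRESHOLD = 8  # hypothetical stand-in for HarnessYieldThreshold

@dataclass
class Backoff:
    spins_remaining: int = SPIN_BUDGET
    spins_consumed: int = 0

    def backoff(self) -> str:
        """Return "pause" or "yield" instead of calling cpuPause/schedYield."""
        if self.spins_remaining <= 0:
            if self.spins_consumed >= YIELD_THRESHOLD:
                self.spins_consumed = 0
                self.spins_remaining = SPIN_BUDGET
                return "yield"
            self.spins_remaining = SPIN_BUDGET
        self.spins_remaining -= 1
        self.spins_consumed += 1
        return "pause"

def calls_until_yield(b: Backoff) -> int:
    """Count backoff() calls up to and including the first yield."""
    n = 0
    while True:
        n += 1
        if b.backoff() == "yield":
            return n

def second_streak(reset_on_success: bool) -> int:
    b = Backoff()
    for _ in range(7):   # first empty-pop streak, stopping short of a yield
        b.backoff()
    if reset_on_success:
        b = Backoff()    # models hb = initHarnessBackoff() after a successful pop
    return calls_until_yield(b)
```

Under these assumed constants, the second empty-pop streak yields on its 2nd call without the reset and on its 9th call with it, which is the early-yield bias the reset removes.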
proc backoff*(b: var HarnessBackoff) {.inline.} =
  if b.spinsRemaining > 0:
    cpuPause()
    dec b.spinsRemaining
    inc b.spinsConsumed
    return
  if b.spinsConsumed >= HarnessYieldThreshold:
    schedYield()
    b.spinsConsumed = 0
  b.spinsRemaining = HarnessSpinBudget
The current backoff implementation skips a cpuPause() call whenever the spin budget is reset or a yield is evaluated. This results in an inconsistent pause cadence. The logic should be restructured to ensure a pause occurs on every call that doesn't result in a scheduler yield.
proc backoff*(b: var HarnessBackoff) {.inline.} =
  if b.spinsRemaining <= 0:
    if b.spinsConsumed >= HarnessYieldThreshold:
      schedYield()
      b.spinsConsumed = 0
      b.spinsRemaining = HarnessSpinBudget
      return
    b.spinsRemaining = HarnessSpinBudget
  cpuPause()
  dec b.spinsRemaining
  inc b.spinsConsumed
Fixed in a66db21: applied your suggested restructure verbatim. Every backoff() call that does not escalate to schedYield now issues exactly one cpuPause; the previous shape skipped the pause whenever the spin budget was reset, producing an inconsistent pause cadence on the empty-pop path.
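The cadence difference can be checked with a short Python simulation (a sketch only). It assumes hypothetical values for the spin budget and yield threshold, and it assumes that in the pre-fix shape the budget reset ran on every call that fell through the spin branch, whether or not a yield happened; the indentation of the quoted diff is not fully recoverable, so that reading is an assumption.

```python
from collections import Counter

SPIN_BUDGET = 4      # hypothetical stand-in for HarnessSpinBudget
YIELD_THRESHOLD = 8  # hypothetical stand-in for HarnessYieldThreshold

def new_state() -> dict:
    return {"remaining": SPIN_BUDGET, "consumed": 0}

def backoff_old(s: dict) -> str:
    """Pre-fix shape (assumed): a budget reset consumes a call without pausing."""
    if s["remaining"] > 0:
        s["remaining"] -= 1
        s["consumed"] += 1
        return "pause"
    action = "none"                 # neither cpuPause nor schedYield ran
    if s["consumed"] >= YIELD_THRESHOLD:
        s["consumed"] = 0
        action = "yield"
    s["remaining"] = SPIN_BUDGET
    return action

def backoff_new(s: dict) -> str:
    """Post-fix shape: every non-yield call pauses exactly once."""
    if s["remaining"] <= 0:
        if s["consumed"] >= YIELD_THRESHOLD:
            s["consumed"] = 0
            s["remaining"] = SPIN_BUDGET
            return "yield"
        s["remaining"] = SPIN_BUDGET
    s["remaining"] -= 1
    s["consumed"] += 1
    return "pause"

def tally(step, calls: int = 20) -> Counter:
    s = new_state()
    return Counter(step(s) for _ in range(calls))
```

Over 20 calls with these assumed constants, the old shape produces 2 calls that do nothing, while the new shape produces none: every call either pauses or yields.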
STRICT_FLOOR: frozenset[str] = frozenset(
    {
        # Boost.LockFree — fixed via --path:src on the smoke step.
        "boost_lockfree_queue",
        "boost_lockfree_spsc",
        # loony — fixed via --path:$(nimble path loony) +
        # --path:$(nimble path arc) on smoke and bench compile.
        "loony",
        # threading.Chan — fixed via
        # --path:$(nimble path threading) on smoke and bench compile.
        "threading_channels",
    }
)
The STRICT_FLOOR set is incomplete and only includes the 4 projects that were restored in this PR. According to the PR description and CHANGELOG, the intended floor is 16 projects. This set should be expanded to include all expected comparison libraries and internal adapters to ensure the CI pipeline correctly detects silent omissions of any benchmark target.
STRICT_FLOOR: frozenset[str] = frozenset(
    {
        "lockfreequeues_sipsic", "lockfreequeues_sipmuc", "lockfreequeues_mupsic", "lockfreequeues_mupmuc",
        "lockfreequeues_unbounded_sipsic", "lockfreequeues_unbounded_sipmuc", "lockfreequeues_unbounded_mupsic", "lockfreequeues_unbounded_mupmuc",
        "boost_lockfree_queue", "boost_lockfree_spsc",
        "loony",
        "threading_channels",
        "nim_channel",
        "moodycamel",
        "crossbeam_array_queue", "crossbeam_seg_queue",
        "atomic_queue",
        "rigtorp_spsc", "rigtorp_mpmc",
        "flume",
        "kanal",
        "liblfds",
    }
)
Fixed in a66db21: STRICT_FLOOR expanded from 4 entries to 23 — the actual emitted slug-prefix closure across bench_{spsc,mpsc,mpmc,unbounded}.nim. Note: the CHANGELOG's "16 libraries" wording counts libraries-as-installed in bench.yml (post-folly_pcq drop); the slug-prefix count is higher because some libraries emit multiple distinct first-segment prefixes (Boost.LockFree shares a single boost_lockfree_queue prefix across the SPSC and MPMC bench files; system.Channel emits nim_channel for MPSC and nim_channels for MPMC; flume and kanal each emit a bounded prefix and a separate unbounded prefix). Test still skips locally per the LOCKFREEQUEUES_BENCH_STRICT_FLOOR env-var gate documented in CHANGELOG until the next post-merge bench.yml run regenerates latest.json.
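A minimal sketch of the kind of check such a floor enables, for readers unfamiliar with the pattern. The slug format is an assumption: this sketch treats the first `/`-separated segment as the prefix, and `slug_prefix`, `missing_from_floor`, and `strict_floor_enabled` are hypothetical names, not the functions in `test_smoke_compiles.py`. The floor here is abbreviated to three entries.

```python
import os

# Abbreviated floor for illustration; the real set is larger.
STRICT_FLOOR = frozenset({"boost_lockfree_queue", "loony", "threading_channels"})

def slug_prefix(slug: str) -> str:
    """First segment of a slug (assumes '/' separates segments)."""
    return slug.split("/", 1)[0]

def missing_from_floor(emitted_slugs, floor=STRICT_FLOOR) -> list:
    """Floor prefixes that no emitted slug starts with, sorted for stable output."""
    seen = {slug_prefix(s) for s in emitted_slugs}
    return sorted(floor - seen)

def strict_floor_enabled(env=os.environ) -> bool:
    """Gate on the env var named in the CHANGELOG; anything but "1" means skip."""
    return env.get("LOCKFREEQUEUES_BENCH_STRICT_FLOOR") == "1"
```

With the gate on, a test would fail whenever `missing_from_floor` returns a non-empty list, turning a silently omitted adapter into a hard failure.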
This is a large v4.2.0 release PR adding 6 new comparison adapters (atomic_queue, rigtorp/SPSCQueue, rigtorp/MPMCQueue, flume, kanal, liblfds), refactoring bench dispatch to a topology-based Adapter registry, introducing an SPMC topology axis, adding HarnessBackoff for oversubscription relief, consolidating the Rust cdylib, retiring bench-comparison.yml, and uplifting chart visualization. One issue found: an undocumented 2×2 grid truncation for flume/kanal unbounded adapters vs the 3×3 grid used by every other unbounded adapter.
Severity tally: 1 Low.
Low
- BOT-A2 (benchmarks/nim/bench_unbounded.nim:649): Flume and kanal unbounded adapters benchmark only a 2×2 grid while every other unbounded adapter uses a 3×3 grid
Noteworthy
- The topology-based Adapter registry refactoring is well-structured and eliminates the fragile name-based case-ladder dispatch.
- The dark-mode MutationObserver and themed axis defaults in bench-charts.js are a thoughtful UX addition for the mkdocs Material theme.
- The liblfds license audit trail (homepage + Repology + dual-mirror diff check) documented in THIRD_PARTY_LICENSES.md is thorough and defensible.
Verdict: APPROVE.
Commands
- Comment `/ai-review` or mention @axiomantic-momus[bot] to request a re-review of the latest changes.
- Reply to a finding with `won't fix`, `by design`, or `not a bug` to decline it.
- Reply with `instead, ...` to propose an alternative fix.
Cost: $0.91 - 2,028,449 in / 24,273 out tokens - deepseek/deepseek-v4-pro
Powered by Momus running deepseek/deepseek-v4-pro via openrouter.ai.
when declared(initFlumeUnboundedQ):
  proc runFlumeUnbounded(em: var BMFEmitter,
                         topology: Topology) {.nimcall.} =
    discard topology
    for p in [1, 2]:
      for c in [1, 2]:
        runMvpUnboundedShape[FlumeUnboundedAdapter[uint64]](
          em, "flume_unbounded", initFlumeUnboundedQ,
          p, c, UnboundedMupmucRuns, BenchUnboundedWarmup,
          UnboundedMupmucMessageCount)

when declared(initKanalUnboundedQ):
  proc runKanalUnbounded(em: var BMFEmitter,
                         topology: Topology) {.nimcall.} =
    discard topology
    for p in [1, 2]:
      for c in [1, 2]:
        runMvpUnboundedShape[KanalUnboundedAdapter[uint64]](
          em, "kanal_unbounded", initKanalUnboundedQ,
BOT-A2 — Low (quality)
Flume and kanal unbounded adapters benchmark only a 2×2 grid while every other unbounded adapter uses a 3×3 grid
runFlumeUnbounded and runKanalUnbounded use for p in [1, 2]: for c in [1, 2] (4 shapes: 1p1c, 1p2c, 2p1c, 2p2c), but every other unbounded MPMC adapter (loony, crossbeam_seg_queue, moodycamel, and lockfreequeues_unbounded_mupmuc under when not defined(BenchSkipOversubscribed)) uses for p in [1, 2, 4]: for c in [1, 2, 4] (9 shapes). The 5 missing shapes (1p4c, 2p4c, 4p1c, 4p2c, 4p4c) are not gated on BenchSkipOversubscribed — they are unconditionally omitted. This grid truncation is not documented in the CHANGELOG Known Limitations, the PR body, or code comments. Quoted from the file: for p in [1, 2]: / for c in [1, 2]: vs loony's for p in [1, 2, 4]: / for c in [1, 2, 4]:.
Fixed in a66db21: expanded runFlumeUnbounded and runKanalUnbounded from a 2x2 grid to 3x3 ([1, 2, 4] x [1, 2, 4]) so they match every other unbounded MPMC peer (loony, crossbeam_seg_queue, moodycamel, lockfreequeues_unbounded_mupmuc). Investigated the originating commit (40ef120 — flume+kanal wiring); the 2x2 was introduced with no documented rationale, so I expanded to the peer shape rather than carrying it as a known limitation. CHANGELOG ### Changed entry added under [Unreleased].
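The shape arithmetic behind this finding can be reproduced with a few lines of Python (illustrative only; `shapes` is a hypothetical helper, and the `NpMc` labels follow the naming used in the finding):

```python
from itertools import product

def shapes(counts):
    """All producer x consumer combinations, labelled like '2p4c'."""
    return [f"{p}p{c}c" for p, c in product(counts, repeat=2)]

old_grid = shapes([1, 2])      # the 2x2 grid flume/kanal used before the fix
new_grid = shapes([1, 2, 4])   # the 3x3 grid used by the other unbounded MPMC peers
added = [s for s in new_grid if s not in old_grid]
```

`added` holds the five shapes the review listed as missing: 1p4c, 2p4c, 4p1c, 4p2c, and 4p4c.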
The `Track base branch benchmarks with Bencher` step has been failing
on every push to devel since the latency_p99_ns + throughput_ops_ms
two-measure config landed, blocked by:
Failed to validate the model for the throughput_ops_ms Measure
Threshold: Invalid threshold model: Invalid model, no boundary
provided
Bencher CLI's `CliReportThresholds` parser zips the per-measure threshold
flags element-wise via `.next()` over each `Vec<...>` (see
`services/cli/src/bencher/sub/project/report/create/thresholds.rs`).
With one boundary value supplied per measure but two measures, both
boundaries get consumed by the FIRST measure and the second measure
ends up with no boundaries at all.
Fix: use the CLI's documented `_` (`ElidedOption`) convention to align
the boundary arrays with the measure array. For latency we want the
upper bound (regression = latency increase), elided lower; for
throughput we want the lower bound (regression = throughput drop),
elided upper.
Also remove the `continue-on-error: true` band-aid that was masking the
failure and update the long comment block to document the binding fix
and the elision convention so future readers know why the `_` is
there.
Net effect: the t_test 0.99 threshold gates on devel are now hard-
binding, not no-ops. Activation still requires Track 6 Task 6.4's
≥ 10 prior-run soak; until then Bencher dampens alerts on insufficient
sample history.
CHANGELOG: moved from "Changed" (release-day band-aid wording) to a
description that names the diagnosis + fix.
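The positional alignment that the `_` elision restores can be modelled in a few lines of Python. This is a sketch of the idea as described in the commit message, not Bencher's parser: `pair_boundaries` is a hypothetical function, and the 0.99 values simply reuse the t_test figure mentioned above.

```python
ELIDED = "_"  # placeholder meaning "no boundary in this slot"

def pair_boundaries(measures, lower, upper):
    """Pair each measure with one lower and one upper slot, by position."""
    if not (len(measures) == len(lower) == len(upper)):
        raise ValueError("each boundary list needs one slot per measure")
    paired = {}
    for measure, lo, up in zip(measures, lower, upper):
        paired[measure] = {
            "lower": None if lo == ELIDED else float(lo),
            "upper": None if up == ELIDED else float(up),
        }
    return paired

thresholds = pair_boundaries(
    ["latency_p99_ns", "throughput_ops_ms"],
    lower=[ELIDED, "0.99"],   # throughput regresses downward: lower bound only
    upper=["0.99", ELIDED],   # latency regresses upward: upper bound only
)
```

With one slot per measure in each list, latency ends up with only an upper boundary and throughput with only a lower one, so neither measure is left without a boundary.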
- liblfds bmm wrapper: posix_memalign for 128-byte over-aligned struct.
`bench_liblfds_bmm_t` embeds `lfds711_queue_bmm_state`, whose
read/write indices are declared with
`LFDS711_PAL_ALIGN(LFDS711_PAL_ATOMIC_ISOLATION_IN_BYTES)` (128 B on
x64); plain `malloc` only guarantees 16-byte alignment, tripping the
upstream alignment assertion in `lfds711_misc_internal_backoff_init`.
The bss (SPSC) variant is not over-aligned and does not need the fix.
- bench_common backoff: restructured so every call that does not
escalate to a scheduler yield issues exactly one `cpuPause`. The
prior shape skipped the pause whenever the spin budget was reset,
producing an inconsistent pause cadence on the empty-pop path.
- bench_unbounded consumer threads: reset HarnessBackoff after each
successful pop in usipsicConsumerThread, usipmucConsumerThread,
umupmucConsumerThread, and the in-line runOneUMupsicRun consumer
loop. Sticky `spinsConsumed` from a prior empty-pop streak otherwise
biased the next contention window into yielding too early. The
comparison-library MVP path uses runThroughputHarness's spin-only
consumer in bench_common and is unaffected.
- test_smoke_compiles STRICT_FLOOR: expanded from 4 to 23 slug-prefix
entries enumerating every prefix the bench harness emits. The
CHANGELOG's "16 libraries" wording counts libraries-as-installed in
bench.yml (post-folly_pcq drop); the slug-prefix count is higher
because some libraries emit multiple distinct first-segment prefixes
(boost emits one shared prefix, system.Channel emits both
`nim_channel` and `nim_channels`, flume/kanal each emit a bounded
and an unbounded prefix). The set is the actual emitted-prefix
closure across `bench_{spsc,mpsc,mpmc,unbounded}.nim`, so the guard
catches a drop from any installed adapter, not only the four
restored in the initial Stage 1 patch.
- bench_unbounded flume/kanal: expanded from a 2x2 grid to 3x3 to
match every other unbounded MPMC peer (loony, crossbeam_seg_queue,
moodycamel, lockfreequeues_unbounded_mupmuc). The 2x2 was introduced
in the flume+kanal wiring commit with no documented rationale.
CHANGELOG: entries for each of the five fixes added under
[Unreleased].
macOS smoke coverage runs only on push events (devel branches and tag pushes). Pull requests no longer queue a macOS runner — saves CI time and macOS-minutes for changes whose risk is overwhelmingly Linux-side. Darwin-only regressions surface on the next push to devel.
Closes out the v4.2.0 release. 17 commits, 144 files changed
(+13.7k / -948). Full per-area detail lives in
`CHANGELOG.md` under `[4.2.0] - 2026-05-06`; this description covers the release
shape and the operator-side work that has to land before merge.
Highlights
Comparison set expansion (Tier 1+2+3). Five new third-party
adapters reach the bench matrix:
`atomic_queue` (header-only, MIT), `rigtorp/SPSCQueue` (header-only, MIT), `rigtorp/MPMCQueue` (header-only, MIT),
`flume` (Rust, MPL-2.0), `kanal` (Rust, MPL-2.0), plus
`liblfds` (C ringbuffer, public-domain) wired through a thin C wrapper.
The Rust crates are consolidated into one cdylib at
`benchmarks/rust/comparison/` (`libbench_ffi_comparison`)
alongside the existing Crossbeam adapters; prefix-per-crate symbol
naming (`cb_*` / `flume_*` / `kanal_*`) prevents collision across
the consolidated FFI surface. C++ headers are vendored under
`benchmarks/vendor/` with pinned upstream SHAs and project-authored
upgrade READMEs.
Strict-floor breach (16/17, documented). `folly_pcq` was DROPPED
during Work Item G after audit: the transitive-include closure is
15 unique folly headers (>6 threshold) and folly main requires
C++20 vs the repo's C++17. Final floor is 16/17. Documented in
CHANGELOG `Known Limitations` with revisit conditions for v4.3.0+.
First-class SPMC topology axis. Sipmuc (bounded
`lockfreequeues_sipmuc` and unbounded `lockfreequeues_unbounded_sipmuc`)
moves off the MPMC panel onto a new dedicated SPMC panel
(`bench-throughput-spmc`). Slug roots change from `mpmc` / `mpmc_unbounded`
to `spmc` / `spmc_unbounded` for those adapters;
the Bencher.dev thresholds for the prior MPMC-rooted slugs are
intentionally reset (old history retained as a record, not carried
forward into the new SPMC roots).
Topology-based adapter dispatcher (Option C). Each bench binary
now owns a `seq[Adapter]` registry; per-adapter
`topologiesSupported: set[Topology]` declares which topologies that
adapter participates in. The harness iterates and dispatches by
topology rather than the prior name-based `case variant of "X":`
ladder. CLI takes `<topology>` as `argv[1]`
(e.g. `./bench_mpmc spmc 1p2c`); `bench.yml`'s Matrix Run step
threads the topology argv through.
HarnessBackoff (cpuPause -> schedYield escalation). Harness-side
schedYield-escalating backoff for unbounded consumer waits, breaking
consumer-livelock on oversubscribed runners (e.g.
`mpmc_unbounded 4p4c` on 4-vCPU GitHub-hosted Linux). Initial spins
use `cpuPause`; once the spin budget is exhausted, the backoff escalates
to `schedYield`. Tunable via `-d:HarnessSpinBudget=N` /
`-d:HarnessYieldThreshold=N`. The queue-side analogue (relaxing
the strict-FIFO consumer-claim path so `backoffOnPeerWait` itself
escalates) is intentionally deferred to v4.3.0; this release's
`src/lockfreequeues/backoff.nim` is read-only (impl-plan Constraint 7).
Visualization uplift. Throughput panels switch from the prior
pseudo-continuous numeric x-axis to uPlot bars on a categorical
x-axis. The hero panel renders to a canvas with an offscreen
`<table>` ARIA companion so screen readers see the same data.
Dark-mode-aware: a `MutationObserver` on `<html data-md-color-scheme>`
reflows every chart on theme toggle. Log-scale axes suppress null
and minor-tick labels - only major (power-of-10) ticks render.
Prose layer. Inline `## Glossary` (32 entries) and
`## Why MPMC is harder than SPSC` (cache-line contention; ABA and
reclamation; ordering and asymmetry) sections in
`docs/benchmarks.md`, surfacing methodology vocabulary above the
chart panels. Bencher CLI threshold-reset note added.
CI tightening. Per-library smoke-step `--path:` flag propagation
so libraries needing an installed-package path on the bench compile
no longer fail silently. `bench-comparison.yml` retired (Crossbeam
folded into `bench.yml`'s regular matrix). The Phase 4.6 audit
remediation commit (`8e8d78e`) wires the orphan
`tests/t_topology_split.nim` into `nimble benchtests` and adds a
"Run Python bench tests" step to `bench.yml` so 36 previously
uncovered Python contract tests run on every push.
Dependency bump. Minimum `debra` raised from `>= 0.7.0` to
`>= 0.7.1` to pull in the upstream signal-handler stride fix in
cross-slot reclamation.
Operator-side work required before merge
dark / light / responsive sweep at 320 / 768 / 1280 px against
the rendered docs site (or a local `mike serve` of the `dev`
alias) to confirm the canvas hero, the per-topology bar panels,
the latency ladder, and the legend stay legible across themes
and breakpoints. The docs CI does not exercise the visual
surface, so this is a human-loop step.
thresholds on baseline-eligible 1p1c / 2p2c bounded slugs to
the agreed 5% bound. This requires `BENCHER_API_TOKEN` and
runs against `bencher.dev`, not in repo CI.
Post-merge
`v4.2.0` on `devel`; `release.yml` publishes; `docs.yml`
deploys the versioned site and the `latest` mike alias.
`bench.yml` run regenerates
`docs/assets/bench-results/latest.json` with the restored
comparison-library slugs (boost / loony / threading_channels /
crossbeam plus the new tier 1 + 2 entries), at which point the
`LOCKFREEQUEUES_BENCH_STRICT_FLOOR=1` env-var gate on
`benchmarks/tests/test_smoke_compiles.py` can flip to default-on
in v4.3.0 (tracked under Known Limitations).