
Comparison MVP: Loony, Boost.LockFree, Crossbeam adapters + cache-line padding fix #22

Closed
elijahr wants to merge 22 commits into feat/bench-rollup-pr2-topology-split from feat/bench-rollup-pr3-comparison-mvp

Conversation

@elijahr
Owner

@elijahr elijahr commented May 1, 2026

Depends on PR #19, PR #20, PR #21.

Summary

Track 3 (Comparison MVP) of the bench-rollup feature. Adds five third-party adapters so each topology has at least three distinct libraries plotted on the same Bencher dashboard, plus the cache-line padding fix flagged by design §4.2.

What's in this PR

Comparison MVP adapters (Track 3 §3.5–3.10)

| Library | Variant | Topology | Compile gate |
| --- | --- | --- | --- |
| Loony | `LoonyQueue` | mpmc_unbounded | `-d:adapter_loony_available` |
| Boost.LockFree | `boost::lockfree::queue` | mpmc (bounded) | `-d:adapter_boost_lockfree_queue_available` (requires `nim cpp`) |
| Boost.LockFree | `boost::lockfree::spsc_queue` | spsc (bounded) | `-d:adapter_boost_lockfree_spsc_available` (requires `nim cpp`) |
| Crossbeam | `crossbeam_queue::ArrayQueue` | mpmc (bounded) | `-d:adapter_crossbeam_array_queue_available` |
| Crossbeam | `crossbeam_queue::SegQueue` | mpmc_unbounded | `-d:adapter_crossbeam_seg_queue_available` |

Each adapter is wrapped in when defined(adapter_<lib>_available): so absent gates produce zero symbol references and production builds are unchanged. Round-trip (1000-item set equality) tests are wired into tests/t_bench_adapters.nim, gated by the same defines.

New Rust crate benchmarks/rust/bench-ffi-crossbeam/

A cdylib exposing 8 extern "C" fns wrapping crossbeam_queue::ArrayQueue<u64> and crossbeam_queue::SegQueue<u64>. rust-toolchain.toml pins stable. Six integration tests cover round-trip set equality for both queues, capacity edges, empty-pop, and null-pointer tolerance.
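The crate's actual export list isn't reproduced in this description. As a rough sketch of the C-ABI shim pattern it describes (function and type names are hypothetical, and a `Mutex<VecDeque>` stands in for `crossbeam_queue::ArrayQueue` so the snippet compiles with std alone), the surface looks like:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Stand-in for the real crossbeam-backed queue; illustrates the handle,
// null-tolerance, capacity-edge, and empty-pop behaviors the tests cover.
pub struct QueueHandle {
    inner: Mutex<VecDeque<u64>>,
    capacity: usize,
}

#[no_mangle]
pub extern "C" fn queue_new(capacity: usize) -> *mut QueueHandle {
    Box::into_raw(Box::new(QueueHandle {
        inner: Mutex::new(VecDeque::new()),
        capacity,
    }))
}

#[no_mangle]
pub extern "C" fn queue_push(q: *mut QueueHandle, item: u64) -> bool {
    if q.is_null() {
        return false; // null-pointer tolerance
    }
    let q = unsafe { &*q };
    let mut inner = q.inner.lock().unwrap();
    if inner.len() >= q.capacity {
        return false; // bounded-queue capacity edge
    }
    inner.push_back(item);
    true
}

#[no_mangle]
pub extern "C" fn queue_pop(q: *mut QueueHandle, out: *mut u64) -> bool {
    if q.is_null() || out.is_null() {
        return false;
    }
    let q = unsafe { &*q };
    match q.inner.lock().unwrap().pop_front() {
        Some(v) => {
            unsafe { *out = v };
            true
        }
        None => false, // empty-pop reports failure instead of a sentinel
    }
}

#[no_mangle]
pub extern "C" fn queue_free(q: *mut QueueHandle) {
    if !q.is_null() {
        unsafe { drop(Box::from_raw(q)) };
    }
}
```

The Nim adapter only ever sees opaque `*mut QueueHandle` pointers and booleans, which keeps the FFI boundary free of Rust-specific types.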

CI integration

  • bench.yml now installs Loony + Boost.LockFree on every PR via the design §2.6 soft-skip pattern (install -> smoke -> set-flag-on-success -> annotate-on-failure). workflow_dispatch accepts force_skip_boost / force_skip_loony boolean inputs to exercise the skip path manually.
  • New bench-comparison.yml runs Crossbeam on a nightly cron (0 4 * * *), workflow_dispatch, and targeted path pushes to devel. It owns its own Bencher Report. Crossbeam runs in this nightly workflow only — not in bench.yml — to keep PR critical-path time unchanged.

Cache-line padding audit (Track 3 §3.1–3.3)

The four unbounded queue variants previously allocated segments via c_calloc (16-byte ABI alignment). Now allocated via posix_memalign through the new src/lockfreequeues/internal/aligned_alloc.nim helper, with {.align: CacheLineBytes.} on per-segment Atomic fields. Verified by tests/t_unbounded_padding.nim (8 assertions across 4 variants, green under c/cpp/arc/refc).
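The Nim helper itself isn't shown here; the same idea sketched in Rust with `std::alloc` (the 64-byte constant is an assumption, and taking the larger of the cache-line size and the type's own alignment keeps over-aligned types valid):

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::mem::{align_of, size_of};

// Assumed cache-line size; the Nim code uses a CacheLineBytes constant.
const CACHE_LINE_BYTES: usize = 64;

/// Allocate one zero-initialized T on a cache-line boundary, mirroring
/// what a posix_memalign-based helper does.
fn alloc_aligned<T>() -> *mut T {
    let align = CACHE_LINE_BYTES.max(align_of::<T>());
    let layout = Layout::from_size_align(size_of::<T>().max(1), align).unwrap();
    unsafe { alloc_zeroed(layout) as *mut T }
}

/// Release a pointer obtained from alloc_aligned; the layout must match.
fn free_aligned<T>(p: *mut T) {
    let align = CACHE_LINE_BYTES.max(align_of::<T>());
    let layout = Layout::from_size_align(size_of::<T>().max(1), align).unwrap();
    unsafe { dealloc(p as *mut u8, layout) }
}
```

Cache-line alignment of per-segment atomics matters because two atomics sharing a 64-byte line force producers and consumers to ping-pong that line between cores (false sharing), even though they touch logically independent fields.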

Documentation

  • New THIRD_PARTY_LICENSES.md records license obligations for the comparison MVP libraries (Loony MIT, Boost BSL-1.0, Crossbeam Apache-2.0 OR MIT) with placeholder slots for concurrentqueue (PR 4) and uPlot (PR 5).
  • benchmarks/README.md gains a "Comparison MVP" subsection with library/topology/install table and per-adapter local-run snippets.
  • CHANGELOG.md [Unreleased] updated with Added (5 adapters, Rust crate, smoke binaries, bench-comparison.yml, soft-skip flow, THIRD_PARTY_LICENSES, aligned_alloc) and Fixed (cache-line padding for unbounded segments).

Verification

  • nimble test -> 200 OK / 0 FAILED / 0 SKIPPED.
  • cargo test --release on the Rust crate -> 6 passed.
  • actionlint clean on both bench.yml and bench-comparison.yml.
  • t_bench_adapters.nim exercises Crossbeam ArrayQueue + SegQueue locally (Cargo + cdylib available); Loony and Boost are gated and skip vacuously without their respective installs.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request integrates third-party comparison adapters for Loony, Boost, and Crossbeam queues into the benchmark suite, including a Rust-based FFI shim for Crossbeam. It also implements cache-line alignment for unbounded queue segments using a new allocAligned utility and field alignment pragmas to mitigate false sharing. Feedback points out a race condition in the Loony adapter's pop logic and potential overflow issues in its item encoding scheme.

Comment on lines +65 to +69
if a.queue.isEmpty():
  PopResult[T](success: false)
else:
  let raw = a.queue.pop()
  PopResult[T](success: true, value: T((raw shr LoonyTagShift) - 1'u64))

high

The current pop implementation contains a race condition. In an MPMC context, isEmpty() followed by pop() is not atomic. If another consumer pops the last item between these two calls, a.queue.pop() will return the default value (0), which the adapter then incorrectly processes as a valid value (resulting in an underflow to high(uint64)). Since your encoding ensures that valid items are never 0, you should check the result of pop() directly and remove the isEmpty() check.

    let raw = a.queue.pop()
    if raw == 0:
      PopResult[T](success: false)
    else:
      PopResult[T](success: true, value: T((raw shr LoonyTagShift) - 1'u64))

discard a

proc push*[T](a: var LoonyAdapter[T], item: T): PushResult =
  a.queue.push((uint64(item) + 1'u64) shl LoonyTagShift)

medium

The encoding logic (uint64(item) + 1'u64) shl LoonyTagShift will lose information for values greater than or equal to 2^61 due to the 3-bit left shift. Additionally, if item == high(uint64), the result becomes 0, which is the 'empty' sentinel, causing the push to be effectively ignored. While this is likely safe for the current benchmark range, it should be documented as a limitation of the adapter.
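The failure modes described above can be checked directly. A minimal sketch (the shift amount of 3 is assumed from the `LoonyTagShift` usage in the snippet; `encode`/`decode` are illustrative names, not adapter procs):

```rust
const TAG_SHIFT: u32 = 3; // assumed value of LoonyTagShift

// Mirror of the adapter's encoding: never yields 0 for small values,
// but wraps to 0 for u64::MAX and drops high bits at or above 2^61.
fn encode(item: u64) -> u64 {
    item.wrapping_add(1).wrapping_shl(TAG_SHIFT)
}

// Inverse of encode, valid only while no bits were shifted out.
fn decode(raw: u64) -> u64 {
    (raw >> TAG_SHIFT).wrapping_sub(1)
}
```

For benchmark payloads (sequential counters far below 2^61) the round-trip is exact; the boundary cases are what the review asks to document.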

elijahr added a commit that referenced this pull request May 1, 2026
merge_bmf.py rejects slugs that aren't <library>/<topology>/<P>p<C>c
(exactly 3 segments). PR #23's CI failed on the moodycamel slug:

  invalid slug shape 'moodycamel/ConcurrentQueue/mpmc_unbounded/1p1c'

The same 4-segment violations exist for loony and crossbeam_seg_queue
in bench_unbounded.nim, and the crossbeam_array_queue dispatch was
emitting a colliding "crossbeam_queue" lib name. Apply the canonical
lib slugs from the design doc so all dispatch sites yield 3 segments
after the runner appends "/<topology>/<shape>":

  loony/LoonyQueue        -> loony
  crossbeam_queue/SegQueue -> crossbeam_seg_queue
  crossbeam_queue (Array)  -> crossbeam_array_queue
  moodycamel/ConcurrentQueue -> moodycamel

The PR #22 worktree applies the loony and crossbeam fixes too;
git rebase will harmonize the duplicate edits.

The adapter `name` constants (e.g. "moodycamel/ConcurrentQueue[uint64]")
are human-readable echo strings, not BMF slugs, and stay as-is.
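merge_bmf.py itself isn't shown in this thread; a sketch of the shape rule it enforces, per the commit message above (helper names and the exact `<P>p<C>c` parsing are illustrative):

```rust
// Accepts only <library>/<topology>/<P>p<C>c: exactly 3 non-empty
// segments, with the last segment shaped like "2p4c".
fn valid_slug(slug: &str) -> bool {
    let parts: Vec<&str> = slug.split('/').collect();
    if parts.len() != 3 || parts.iter().any(|p| p.is_empty()) {
        return false;
    }
    valid_shape(parts[2])
}

// Expects digits, then 'p', then digits, then a trailing 'c'.
fn valid_shape(shape: &str) -> bool {
    let Some(rest) = shape.strip_suffix('c') else {
        return false;
    };
    let mut halves = rest.splitn(2, 'p');
    match (halves.next(), halves.next()) {
        (Some(p), Some(c)) => {
            !p.is_empty()
                && !c.is_empty()
                && p.chars().all(|ch| ch.is_ascii_digit())
                && c.chars().all(|ch| ch.is_ascii_digit())
        }
        _ => false,
    }
}
```

Under this rule, `loony/mpmc_unbounded/1p1c` passes while the 4-segment `moodycamel/ConcurrentQueue/mpmc_unbounded/1p1c` is rejected, which matches the CI failure quoted above.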
@elijahr elijahr force-pushed the feat/bench-rollup-pr2-topology-split branch from 4615926 to 7aed4e1 Compare May 1, 2026 18:12
@elijahr elijahr force-pushed the feat/bench-rollup-pr3-comparison-mvp branch from 8b7a87f to eb67e2b Compare May 1, 2026 18:13
elijahr added a commit that referenced this pull request May 1, 2026
@github-actions
Contributor

github-actions Bot commented May 1, 2026

🐰 Bencher Report

Branch: feat/bench-rollup-pr3-comparison-mvp
Testbed: ubuntu-latest

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated.


All benchmark results (every series is ⚠️ NO THRESHOLD; per-cell 📈 view-plot links omitted):

| Benchmark | latency_p50_ns | latency_p95_ns | latency_p99_ns | throughput_ops_ms |
| --- | --- | --- | --- | --- |
| lockfreequeues_mupmuc/mpmc/1p1c | 280.82 | 317.00 | 402.82 | 12,178.36 |
| lockfreequeues_mupmuc/mpmc/1p2c | | | | 19,419.99 |
| lockfreequeues_mupmuc/mpmc/1p4c | | | | 15,407.78 |
| lockfreequeues_mupmuc/mpmc/2p1c | | | | 19,428.28 |
| lockfreequeues_mupmuc/mpmc/2p2c | | | | 16,699.13 |
| lockfreequeues_mupmuc/mpmc/2p4c | | | | 16,013.81 |
| lockfreequeues_mupmuc/mpmc/4p1c | | | | 16,774.94 |
| lockfreequeues_mupmuc/mpmc/4p2c | | | | 11,913.26 |
| lockfreequeues_mupmuc/mpmc/4p4c | | | | 9,718.54 |
| lockfreequeues_mupmuc/mpmc/8p8c | | | | 16,723.96 |
| lockfreequeues_mupsic/mpsc/1p1c | 284.18 | 321.82 | 449.73 | 31,949.25 |
| lockfreequeues_mupsic/mpsc/2p1c | | | | 17,146.61 |
| lockfreequeues_mupsic/mpsc/4p1c | | | | 12,744.16 |
| lockfreequeues_sipmuc/mpmc/1p1c | 265.91 | 315.36 | 410.73 | 12,664.55 |
| lockfreequeues_sipmuc/mpmc/1p2c | | | | 18,599.41 |
| lockfreequeues_sipmuc/mpmc/1p4c | | | | 13,601.47 |
| lockfreequeues_sipsic/spsc/1p1c | 386.00 | 455.45 | 583.00 | 9,760.42 |
| lockfreequeues_unbounded_mupmuc/mpmc_unbounded/1p1c | | | | 10,197.96 |
| lockfreequeues_unbounded_mupmuc/mpmc_unbounded/1p2c | | | | 4,683.84 |
| lockfreequeues_unbounded_mupmuc/mpmc_unbounded/2p1c | | | | 9,673.90 |
| lockfreequeues_unbounded_mupmuc/mpmc_unbounded/2p2c | | | | 8,660.52 |
| lockfreequeues_unbounded_mupsic/mpsc_unbounded/1p1c | | | | 9,925.23 |
| lockfreequeues_unbounded_mupsic/mpsc_unbounded/2p1c | | | | 9,564.59 |
| lockfreequeues_unbounded_mupsic/mpsc_unbounded/4p1c | | | | 9,604.96 |
| lockfreequeues_unbounded_sipmuc/mpmc_unbounded/1p1c | | | | 15,730.01 |
| lockfreequeues_unbounded_sipmuc/mpmc_unbounded/1p2c | | | | 10,247.69 |
| lockfreequeues_unbounded_sipsic/spsc_unbounded/1p1c | | | | 31,452.98 |
| nim_channels/mpmc/1p1c | | | | 1,894.99 |
| nim_channels/mpmc/1p2c | | | | 787.50 |
| nim_channels/mpmc/1p4c | | | | 294.01 |
| nim_channels/mpmc/2p1c | | | | 4,975.00 |
| nim_channels/mpmc/2p2c | | | | 1,175.53 |
| nim_channels/mpmc/2p4c | | | | 546.04 |
| nim_channels/mpmc/4p1c | | | | 3,802.81 |
| nim_channels/mpmc/4p2c | | | | 1,673.80 |
| nim_channels/mpmc/4p4c | | | | 1,322.83 |
🐰 View full continuous benchmarking report in Bencher

@elijahr elijahr force-pushed the feat/bench-rollup-pr3-comparison-mvp branch from eb67e2b to daf38c5 Compare May 2, 2026 02:32
elijahr added a commit that referenced this pull request May 2, 2026
@elijahr elijahr force-pushed the feat/bench-rollup-pr2-topology-split branch from 466d152 to 6386498 Compare May 2, 2026 03:50
@elijahr elijahr force-pushed the feat/bench-rollup-pr3-comparison-mvp branch from daf38c5 to ac54727 Compare May 2, 2026 03:51
elijahr added a commit that referenced this pull request May 2, 2026
elijahr added a commit that referenced this pull request May 2, 2026
@elijahr
Owner Author

elijahr commented May 2, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a comprehensive benchmarking comparison suite (Comparison MVP) by adding adapters for third-party libraries including Loony, Boost.LockFree, and Crossbeam. It features a new Rust crate to expose Crossbeam via a C-ABI and integrates these libraries into the existing benchmark harness and CI workflows. Additionally, the PR implements significant performance optimizations by ensuring cache-line alignment for unbounded queue segments using a new allocAligned utility and {.align.} pragmas to eliminate false sharing. Feedback identifies a race condition in the Loony adapter's pop logic and a potential overflow issue in its item encoding scheme.

Comment on lines +64 to +69
proc pop*[T](a: var LoonyAdapter[T]): PopResult[T] =
  if a.queue.isEmpty():
    PopResult[T](success: false)
  else:
    let raw = a.queue.pop()
    PopResult[T](success: true, value: T((raw shr LoonyTagShift) - 1'u64))

high

The pop implementation contains a race condition in multi-consumer scenarios. Checking isEmpty() before calling pop() is not atomic; another consumer thread could drain the queue between the check and the call, causing a.queue.pop() to return the default value (0). Because the adapter uses encoding to ensure valid items are never 0, it should check the result of pop() directly to determine if the queue was empty, rather than relying on a separate isEmpty() check.

  proc pop*[T](a: var LoonyAdapter[T]): PopResult[T] = 
    let raw = a.queue.pop()
    if raw == 0:
      return PopResult[T](success: false)
    PopResult[T](success: true, value: T((raw shr LoonyTagShift) - 1'u64))

discard a

proc push*[T](a: var LoonyAdapter[T], item: T): PushResult =
  a.queue.push((uint64(item) + 1'u64) shl LoonyTagShift)

medium

The encoding logic (uint64(item) + 1'u64) shl LoonyTagShift will produce 0 if item is uint64.high, because item + 1 wraps to 0 before the shift. Since 0 is used by Loony to indicate an empty slot, pushing such a value will break the queue's internal state or cause it to be misread as empty. While the current benchmark range (up to 1M) avoids this, the adapter should ideally handle the full range of uint64 or explicitly document this limitation.

@elijahr elijahr force-pushed the feat/bench-rollup-pr2-topology-split branch from 6386498 to 5012581 Compare May 2, 2026 06:00
elijahr added a commit that referenced this pull request May 2, 2026
…range

Per Gemini review on PR #22.

1. Multi-consumer race in pop(): isEmpty() followed by pop() is two
   distinct calls. Between them, another consumer can drain the queue,
   leaving the second consumer to pop a default uint64 (0) and emit a
   spurious success with garbage payload. Switch to popping
   unconditionally and reading raw == 0 as the empty sentinel — the
   adapter's own (v+1) shl 3 encoding guarantees pushed values are
   never 0, so raw == 0 unambiguously means 'queue was empty when pop
   ran' even under concurrent draining.

2. Encoding range note in the module docstring: (uint64(item) + 1) shl 3
   wraps to 0 for item == uint64.high and would be misread as an empty
   slot. The harness pushes uint64(i) for i in 0 ..< messageCount so
   the bound is academic at current tuning, but contractually the
   adapter accepts uint64.high - 1 as the inclusive maximum.
@elijahr elijahr force-pushed the feat/bench-rollup-pr3-comparison-mvp branch from aa6dd1f to fd5b154 Compare May 2, 2026 06:02
elijahr added a commit that referenced this pull request May 2, 2026
@elijahr
Owner Author

elijahr commented May 2, 2026

/gemini review

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@elijahr
Owner Author

elijahr commented May 2, 2026

/review

@qodo-code-review

qodo-code-review Bot commented May 2, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0)



Remediation recommended

1. Missing workflow token permissions 🐞 Bug ☼ Reliability
Description
bench-comparison.yml runs bencher run --github-actions but does not declare job permissions, so
repositories with default read-only GITHUB_TOKEN permissions can fail to publish Bencher’s GitHub
checks/annotations and break the workflow’s upload stage.
Code

.github/workflows/bench-comparison.yml[R164-203]

+      - name: Install Bencher CLI
+        uses: bencherdev/bencher@main
+
+      - name: Bencher token preflight
+        run: |
+          if [ -z "$BENCHER_API_TOKEN" ]; then
+            echo "::warning title=Bencher upload skipped::BENCHER_API_TOKEN is not set; bench-comparison ran but did not upload."
+          else
+            echo "Bencher token present; proceeding with upload."
+          fi
+
+      - name: Track scheduled benchmarks with Bencher
+        if: github.event_name == 'schedule' && env.BENCHER_API_TOKEN != ''
+        run: |
+          bencher run \
+            --project lockfreequeues \
+            --token '${{ secrets.BENCHER_API_TOKEN }}' \
+            --branch devel \
+            --testbed ubuntu-latest \
+            --threshold-measure throughput \
+            --threshold-test t_test \
+            --threshold-max-sample-size 64 \
+            --threshold-lower-boundary 0.99 \
+            --thresholds-reset \
+            --adapter json \
+            --file merged.json \
+            --github-actions '${{ secrets.GITHUB_TOKEN }}' \
+            --err
+
+      - name: Track devel-push benchmarks with Bencher
+        if: github.event_name == 'push' && env.BENCHER_API_TOKEN != ''
+        run: |
+          bencher run \
+            --project lockfreequeues \
+            --token '${{ secrets.BENCHER_API_TOKEN }}' \
+            --branch devel \
+            --testbed ubuntu-latest \
+            --adapter json \
+            --file merged.json \
+            --github-actions '${{ secrets.GITHUB_TOKEN }}'
Evidence
bench-comparison.yml invokes Bencher with the GitHub Actions token, which typically requires
checks: write (and sometimes pull-requests: write) permissions. In contrast, the existing
bench.yml workflow explicitly sets those permissions for its Bencher upload job, but the new
workflow does not.

.github/workflows/bench-comparison.yml[164-214]
.github/workflows/bench.yml[331-343]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`bench-comparison.yml` calls `bencher run --github-actions ...` without setting explicit `permissions:`. On repos/orgs with default read-only token permissions, this commonly prevents check-run / PR annotation publishing and can fail the Bencher step.

### Issue Context
`bench.yml` already sets explicit permissions for its Bencher upload job; the new workflow should mirror that pattern to avoid relying on repo defaults.

### Fix Focus Areas
- .github/workflows/bench-comparison.yml[25-38]
- .github/workflows/bench-comparison.yml[164-214]
- .github/workflows/bench.yml[331-343]

### Suggested change
Add a `permissions:` block to the `bench-crossbeam` job (at minimum `checks: write`; if Bencher comments/PR annotations are desired, also `pull-requests: write`).



2. Aligned alloc ignores alignof 🐞 Bug ≡ Correctness
Description
allocAligned[T] always requests CacheLineBytes alignment, which can be smaller than
alignof(T); if a future caller uses an over-aligned T, dereferencing the returned pointer
becomes undefined behavior.
Code

src/lockfreequeues/internal/aligned_alloc.nim[R35-48]

+proc allocAligned*[T](): ptr T =
+  ## Allocate one zero-initialized ``T`` on a ``CacheLineBytes`` boundary.
+  ##
+  ## Raises ``OutOfMemDefect`` on allocation failure (matches the existing
+  ## ``c_calloc`` failure path in unbounded queue ``newSegment`` procs).
+  ##
+  ## The caller owns the returned pointer; release with ``freeAligned`` or
+  ## the standard ``c_free`` (``posix_memalign`` blocks are compatible with
+  ## ``free`` per POSIX).
+  var p: pointer
+  if posix_memalign(addr p, csize_t(CacheLineBytes), csize_t(sizeof(T))) != 0:
+    raise newException(OutOfMemDefect, "posix_memalign failed for " & $T)
+  zeroMem(p, sizeof(T))
+  result = cast[ptr T](p)
Evidence
The allocator hard-codes the requested alignment to CacheLineBytes and does not account for the
actual alignment requirement of T. While current usage appears to target cache-line alignment, the
proc is exported and generic, so it should respect alignof(T) to remain correct as usage evolves.

src/lockfreequeues/internal/aligned_alloc.nim[28-48]
src/lockfreequeues/unbounded_sipsic.nim[23-47]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`allocAligned[T]` calls `posix_memalign(..., CacheLineBytes, ...)` for every `T`. If `alignof(T) > CacheLineBytes`, the returned pointer does not satisfy `T`’s alignment contract, which is undefined behavior when the pointer is dereferenced.

### Issue Context
Even if all current callers happen to have `alignof(T) <= CacheLineBytes`, this is a generic exported allocator and is easy to reuse elsewhere.

### Fix Focus Areas
- src/lockfreequeues/internal/aligned_alloc.nim[35-48]

### Suggested change
Compute the requested alignment as `max(CacheLineBytes, alignof(T))` (and ensure it’s a valid `posix_memalign` alignment: power-of-two and multiple of `sizeof(pointer)`). Then pass that computed alignment into `posix_memalign`.




Advisory comments

3. Loony range comment incorrect 🐞 Bug ⚙ Maintainability
Description
The Loony adapter claims it supports values up to uint64.high - 1, but the (v + 1) shl 3
encoding loses the top 3 bits and is only invertible for much smaller values; this mismatch can
mislead future callers and cause silent corruption for large payloads.
Code

benchmarks/nim/adapters/loony_adapter.nim[R32-37]

+## **Encoding range.** ``push`` accepts ``T`` values in
+## ``0 .. uint64.high - 1`` (after the ``+1`` step the upper bound saturates
+## the ``uint64`` width before the ``shl 3``). ``uint64.high`` would wrap
+## to ``0`` and be misread as an empty slot. The harness pushes
+## ``uint64(i)`` for ``i in 0 ..< messageCount`` so this bound is academic
+## at current message-count tuning, but it is the contractual range.
Evidence
The comment states a near-full uint64 range, but the actual encoding multiplies by 8 via a left
shift, which discards high bits on overflow; the push/pop encoding is therefore not a bijection
across that claimed range.

benchmarks/nim/adapters/loony_adapter.nim[32-37]
benchmarks/nim/adapters/loony_adapter.nim[72-88]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The adapter documentation overstates the safe encoding domain. The current `(v + 1) shl 3` scheme is only invertible when the upper 3 bits of `(v + 1)` are zero.

### Issue Context
Today’s bench payloads are small, but the adapter is generic and the comment describes a broader contract than the code actually provides.

### Fix Focus Areas
- benchmarks/nim/adapters/loony_adapter.nim[15-37]
- benchmarks/nim/adapters/loony_adapter.nim[72-88]

### Suggested change
Update the “Encoding range” comment to the true safe range, and (optionally) add a debug-time assert in `push` that rejects values outside the safe domain (guarded so it doesn’t impact `-d:danger` bench builds).




@elijahr elijahr force-pushed the feat/bench-rollup-pr2-topology-split branch from 5012581 to 9585d15 Compare May 2, 2026 10:30
elijahr added a commit that referenced this pull request May 2, 2026
@elijahr elijahr force-pushed the feat/bench-rollup-pr3-comparison-mvp branch from fd5b154 to 4ce6826 Compare May 2, 2026 10:33
elijahr added a commit that referenced this pull request May 2, 2026
@elijahr
Owner Author

elijahr commented May 2, 2026

/review

@elijahr
Owner Author

elijahr commented May 4, 2026

/ai-review


@axiomantic-momus axiomantic-momus Bot left a comment


PR adds five third-party comparison adapters (Loony, Boost.LockFree, Crossbeam), a new Rust FFI crate, a dedicated Crossbeam benchmark workflow, cache-line alignment fixes for unbounded queue segments, and supporting tests/docs. The code is well-structured, the soft-skip CI pattern is well-designed, and the alignment fix is correct. Audit demoted the unpinned-deps finding from medium to low: the affected workflow is nightly-only so it does not block PRs.

Severity tally: 3 Lows.

Low

  • BOT-B1 (.github/workflows/bench-comparison.yml:70): bench-comparison.yml uses main branch for sibling deps instead of tagged versions
  • BOT-B2 (benchmarks/rust/bench-ffi-crossbeam/Cargo.toml:1): Rust FFI crate does not set panic = "abort"; panic-unwind across FFI boundary is UB
  • BOT-B3 (benchmarks/nim/adapters/loony_adapter.nim:39): Loony adapter docstring incorrectly claims int.high is 'far below' the encoding-safe limit

Noteworthy

  • The try/finally replacement for try/except/raise in the four unbounded queue variants correctly handles Defect-class exceptions (not just CatchableError), closing a memory leak in the allocation failure path.
  • The crossbeam_link.nim shared module elegantly avoids duplicate {.passL.} emission when both crossbeam adapters are enabled in the same compilation unit.

Verdict: APPROVE.

Commands
  • Comment /ai-review or mention @axiomantic-momus[bot] to request a re-review of the latest changes.
  • Reply to a finding with won't fix, by design, or not a bug to decline it.
  • Reply with instead, ... to propose an alternative fix.

Powered by Momus running deepseek/deepseek-v4-pro via openrouter.ai.

Comment thread .github/workflows/bench-comparison.yml Outdated
Comment on lines +70 to +71
git clone --depth 1 --branch main https://github.com/elijahr/nim-debra.git
git clone --depth 1 --branch main https://github.com/elijahr/nim-typestates.git


BOT-B1 — Low (standards)
bench-comparison.yml uses main branch for sibling deps instead of tagged versions
bench-comparison.yml clones nim-debra and nim-typestates from --branch main, but bench.yml and build.yml both pin to --branch v0.6.0 and --branch v0.7.0 respectively. This means the crossbeam comparison workflow can silently pick up breaking changes from the dependency main branches, producing compilation failures or benchmark results that are not comparable to those from bench.yml. The PR description states this workflow "mirrors bench.yml" — the dependency pinning should mirror as well. Severity demoted from medium to low: bench-comparison.yml runs on nightly cron and devel-push only, not on PR — it does not gate merges.

Suggested change
git clone --depth 1 --branch main https://github.com/elijahr/nim-debra.git
git clone --depth 1 --branch main https://github.com/elijahr/nim-typestates.git
Replace `--branch main` with `--branch v0.6.0` for nim-debra and `--branch v0.7.0` for nim-typestates on lines 70-71 of bench-comparison.yml.

Comment on lines +1 to +15
[package]
name = "bench_ffi_crossbeam"
version = "0.1.0"
edition = "2021"
publish = false
description = "C-ABI cdylib wrapping crossbeam-queue::ArrayQueue and SegQueue for the lockfreequeues bench harness"
license = "Apache-2.0 OR MIT"

[lib]
crate-type = ["cdylib", "rlib"]
# rlib is included so `cargo test` can link the integration test binary
# against the same code path that the cdylib exposes.

[dependencies]
crossbeam-queue = "0.3"


BOT-B2 — Low (quality)
Rust FFI crate does not set panic = "abort"; panic-unwind across FFI boundary is UB
The cdylib exposes extern "C" functions called from Nim. Neither panic = "abort" nor catch_unwind wrappers are present. In particular, cb_seg_push calls SegQueue::push which allocates via the global allocator; if the allocator panics (e.g. OOM), the unwind crosses the C FFI boundary, which is undefined behavior per Rust's FFI rules. While OOM is extremely unlikely in CI for a bench crate, this is standard FFI hygiene and low-cost to fix.

Suggested change
[package]
name = "bench_ffi_crossbeam"
version = "0.1.0"
edition = "2021"
publish = false
description = "C-ABI cdylib wrapping crossbeam-queue::ArrayQueue and SegQueue for the lockfreequeues bench harness"
license = "Apache-2.0 OR MIT"
[lib]
crate-type = ["cdylib", "rlib"]
# rlib is included so `cargo test` can link the integration test binary
# against the same code path that the cdylib exposes.
[dependencies]
crossbeam-queue = "0.3"
Add `[profile.release]` with `panic = "abort"` to Cargo.toml, or wrap the crossbeam push/pop calls in `std::panic::catch_unwind`.
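The catch_unwind alternative mentioned in the suggestion can be sketched like this (hypothetical `cb_try_push` name; the panic here is a stand-in for an allocator failure inside the real crossbeam call):

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

#[no_mangle]
pub extern "C" fn cb_try_push(do_panic: bool) -> bool {
    // Any panic is caught here and mapped to a failure return code,
    // so the unwind never crosses the C ABI boundary (which would be UB).
    catch_unwind(AssertUnwindSafe(|| {
        if do_panic {
            panic!("allocator failure stand-in");
        }
        true
    }))
    .unwrap_or(false)
}

fn main() {
    // Silence the default panic hook so the caught panic prints nothing.
    std::panic::set_hook(Box::new(|_| {}));
    assert!(cb_try_push(false));
    assert!(!cb_try_push(true));
}
```

`panic = "abort"` in `[profile.release]` is the cheaper fix when no unwinding is ever wanted; catch_unwind is the option when the library must stay unwind-compatible for its own tests.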

Comment on lines +39 to +41
## ``messageCount`` is bounded by ``int.high`` ≈ ``2^63``, far below the
## safe limit at any tuning we'll ever ship, so this is academic — but
## it is the contractual range and a debug-mode assert below makes a


BOT-B3 — Low (quality)
Loony adapter docstring incorrectly claims int.high is 'far below' the encoding-safe limit
The docstring states: "messageCount is bounded by int.high ≈ 2^63, far below the safe limit". However, the safe limit is (1'u64 shl 61) - 2 ≈ 2.3×10^18, while int.high = 2^63 - 1 ≈ 9.2×10^18 — roughly 4× larger than the limit. The actual benched message counts (500K–1M) are well within range, but the comment is factually wrong and could mislead someone tuning message counts in the future.

Suggested change
## ``messageCount`` is bounded by ``int.high`` ≈ ``2^63``, far below the
## safe limit at any tuning we'll ever ship, so this is academic — but
## it is the contractual range and a debug-mode assert below makes a
Correct the comment to reflect that actual bench message counts (e.g., millions) are far below the limit, not `int.high`.
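The arithmetic behind this finding checks out directly (a quick verification sketch, not project code):

```rust
fn main() {
    let safe_limit: u64 = (1u64 << 61) - 2; // encoding-safe max, ~2.3e18
    let int_high: u64 = (1u64 << 63) - 1;   // Nim int.high on 64-bit, ~9.2e18

    // int.high is ABOVE the safe limit, roughly 4x larger -- not "far below" it.
    assert!(int_high > safe_limit);
    assert_eq!(int_high / safe_limit, 4);

    // The actual benched message counts are what sit far below the limit.
    let bench_max: u64 = 1_000_000;
    assert!(bench_max < safe_limit);
}
```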

@elijahr elijahr force-pushed the feat/bench-rollup-pr2-topology-split branch from 2aa47ce to ab19733 Compare May 5, 2026 03:49
elijahr added 21 commits May 4, 2026 22:49
…pects 64B

PR 3 Task 3.1 (cache-line padding audit, step 1 of 3).

Verifies bounded variants already conform; introduces the failing test that
will drive the unbounded fix in 3.2 and 3.3.

Bounded MPMC audit (read-only verification, design doc §4.2):
- src/lockfreequeues/sipsic.nim:19-20    head/tail Atomic[int]    {.align: CacheLineBytes.} ✓
- src/lockfreequeues/sipmuc.nim:42-43    head/tail Atomic[uint64] {.align: CacheLineBytes.} ✓
- src/lockfreequeues/mupsic.nim:42-43    head/tail Atomic[uint64] {.align: CacheLineBytes.} ✓
- src/lockfreequeues/mupmuc.nim:43-44    head/tail Atomic[uint64] {.align: CacheLineBytes.} ✓
The bounded MPMC cell array also uses the documented array-of-cells trick
(typestates/mpmc_cell.nim:11) so adjacent slots never share a cache line.
No bounded change required.

Unbounded variants currently FAIL both audit conditions:
- Segment fields (head, tail, prevConsumerIdx, committed) lack {.align: CacheLineBytes.}
  pragmas, so they share cache lines with neighbouring data.
- newSegment() uses c_calloc, which on x86_64 Linux glibc returns 16-byte
  aligned memory (MALLOC_ALIGNMENT = 2*sizeof(size_t)). The {.align.} pragma
  controls only the intra-struct offset, not the base.

Test additions:
- tests/t_unbounded_padding.nim — 8 assertions across 4 unbounded variants:
  (a) offsetOf(SegmentField) mod CacheLineBytes == 0 for the producer/consumer
      coordination atomics.
  (b) cast[uint](segPtr) mod CacheLineBytes == 0 for a freshly-allocated
      Segment via the queue's allocator.
- Test-only accessors guarded by `when defined(testing):` in each unbounded
  module: headSegmentForTest (returns pointer) and segmentHeadOffsetForTest
  (returns offsets of audit-relevant fields). Compiled away in release.
- Wired into tests/test.nim aggregator.

RED state confirmed:
  Unbounded queue Segment cache-line padding [Summary] 8 tests run: 0 OK, 8 FAILED
  off.tail mod 64 was 8 (expected 0)
  off.prevConsumerIdx mod 64 was 16
  off.committed mod 64 was 24
  cast[uint](segPtr) mod 64 was 16

Existing test suite continues to compile (verified `nim c tests/test.nim`).

Refs: design doc §4.2; impl plan Task 3.1.
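The two audit conditions above (field offsets and base address both multiples of 64) can be sketched in Rust with a cache-line-padded wrapper (stand-in types; the real audit is the Nim test listed in this commit):

```rust
// Mirrors the {.align: CacheLineBytes.} pragma: each wrapped field
// starts on its own 64-byte cache line.
#[repr(align(64))]
struct CachePadded<T>(T);

#[repr(C)]
struct Segment {
    head: CachePadded<u64>,
    tail: CachePadded<u64>,
    committed: CachePadded<u64>,
}

fn main() {
    let seg = Segment {
        head: CachePadded(0),
        tail: CachePadded(0),
        committed: CachePadded(0),
    };
    let base = &seg as *const Segment as usize;

    // Condition (a): coordination fields land at 64-byte-multiple offsets.
    assert_eq!((&seg.tail as *const _ as usize - base) % 64, 0);
    assert_eq!((&seg.committed as *const _ as usize - base) % 64, 0);

    // Condition (b): the struct's own alignment is a full cache line, so an
    // allocator that honors it yields a 64-byte-aligned base address.
    assert_eq!(std::mem::align_of::<Segment>(), 64);
    assert_eq!(base % 64, 0);
}
```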
PR 3 Task 3.2 (cache-line padding audit, step 2 of 3).

Adds the cache-line-aligned heap allocator that replaces c_calloc in the
unbounded queue Segment allocation path (call sites swap in Task 3.3).

Compile probe (impl plan §3.2.0):
  $ nim c -r tmp_probe.nim
  rc=0
  align=0
verifying that std/posix.posix_memalign is available on macOS (libSystem)
and Linux glibc with the same C signature, returning 64-byte aligned memory
on success. Probe file deleted post-verification per impl plan.

Implementation (src/lockfreequeues/internal/aligned_alloc.nim):
- ``proc allocAligned*[T](): ptr T`` — calls posix_memalign with
  CacheLineBytes (64) alignment, zero-inits, raises OutOfMemDefect on
  rc != 0. Caller releases via the standard c_free per POSIX guarantee
  that posix_memalign blocks are free-compatible.
- Re-exports CacheLineBytes from atomic_dsl for callers in 3.3.
- isMainModule smoke confirms tail-call alignment.

Tests (tests/t_aligned_alloc.nim, wired into tests/test.nim):
- Tiny payload (sizeof < 64).
- Line-sized payload (sizeof == 64).
- Big payload (sizeof > 64).
- 64 sequential allocations all 64B-aligned.
All 4 cases assert pointer alignment AND zero-initialization. Result:
  internal/aligned_alloc.allocAligned ....
  [Summary] 4 tests run: 4 OK, 0 FAILED.

The unbounded padding audit test (tests/t_unbounded_padding.nim) is still
RED at this commit; Task 3.3 swaps the Segment allocation call sites and
adds {.align: CacheLineBytes.} pragmas, taking it GREEN.

Refs: design doc §4.2; impl plan Task 3.2.
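The allocator contract described above (64-byte alignment, zero-init, null/error on OOM) looks like this in a Rust std::alloc sketch (an analogue of the posix_memalign-based allocAligned, not a translation of it):

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

const CACHE_LINE_BYTES: usize = 64;

// Cache-line-aligned, zero-initialized allocation; returns null on OOM,
// analogous to posix_memalign's rc != 0 path.
unsafe fn alloc_aligned(size: usize) -> *mut u8 {
    let layout = Layout::from_size_align(size, CACHE_LINE_BYTES).expect("bad layout");
    alloc_zeroed(layout)
}

fn main() {
    unsafe {
        let size = 200; // payload larger than one cache line
        let p = alloc_aligned(size);
        assert!(!p.is_null());
        assert_eq!(p as usize % CACHE_LINE_BYTES, 0);  // base aligned
        assert!((0..size).all(|i| *p.add(i) == 0));    // zero-initialized
        // Unlike posix_memalign blocks (free-compatible per POSIX), Rust
        // requires the matching Layout on release.
        dealloc(p, Layout::from_size_align(size, CACHE_LINE_BYTES).unwrap());
    }
}
```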
PR 3 Task 3.3 (cache-line padding audit, step 3 of 3 — GREEN).

Applies the cache-line padding fix to all four unbounded queue variants and
their per-segment producer/consumer atomics. Resolves the asymmetry called
out in design doc §4.2: bounded variants were already padded; unbounded
Segments were not, so cross-library comparison numbers in PR 3 would have
been biased against lockfreequeues' unbounded types.

Production changes (4 files):
- src/lockfreequeues/unbounded_sipsic.nim
  Segment.{next, head, tail} + UnboundedSipsic.{headSegment, tailSegment}
  gain {.align: CacheLineBytes.}. newSegment swaps c_calloc -> allocAligned.
- src/lockfreequeues/unbounded_sipmuc.nim
  Segment.{next, tail, prevConsumerIdx} + UnboundedSipmuc.{headSegment,
  tailSegment} gain alignment. newSegment swaps to allocAligned.
- src/lockfreequeues/unbounded_mupsic.nim
  Segment.{next, tail, committed} + UnboundedMupsic.{headSegment,
  tailSegment} gain alignment. newSegment swaps to allocAligned.
  ``head: int`` (single consumer, non-atomic) is intentionally left
  unpadded — only the consumer touches it.
- src/lockfreequeues/unbounded_mupmuc.nim
  Segment.{next, tail, prevConsumerIdx, committed} + UnboundedMupmuc.{
  headSegment, tailSegment} gain alignment. newSegment swaps to
  allocAligned.

The ``c_calloc`` import is dropped; ``c_free`` retained for segment
release (POSIX guarantees posix_memalign blocks are free-compatible).

Cross-platform fix in aligned_alloc.nim:
- ``std/posix.posix_memalign`` triggers a clang error under ``nim cpp`` on
  macOS because the SDK declares the first parameter ``__unsafe_indexable``
  (incomplete-type ``void *`` vs the wrapper's ``void **``).
- Replaced the ``import std/posix`` with a local ``importc, header:
  "<stdlib.h>"`` shim using the canonical C signature, which clang accepts
  in both C and C++ modes. Documented in aligned_alloc.nim header comment.

Verification:
- tests/t_unbounded_padding.nim: 8/8 GREEN under both nim c and nim cpp.
- Full suite under nim c (orc), nim cpp, --mm:arc, --mm:refc:
    [Summary] 200 tests run: 200 OK, 0 FAILED, 0 SKIPPED  (all four variants)
- t_unbounded_*_threaded suites pass — no false-sharing-induced timing
  regression in the threaded code paths.

Refs: design doc §4.2; impl plan Task 3.3.
PR 3 Task 3.5 (comparison MVP, first FFI adapter).

Adds the Loony unbounded MPMC adapter and a parametric round-trip test in
tests/t_bench_adapters.nim, both gated on -d:adapter_loony_available so
the in-tree test suite passes vacuously when the package is absent.

Files:
- benchmarks/nim/adapters/loony_adapter.nim — adapter, topologiesSupported
  = {tMpmcUnbounded}, makeLoonyAdapter / push / pop / cleanup / name.
- tests/t_bench_adapters.nim — push/pop 1000 uint64 round-trip preserves
  set; parametric template extends to subsequent adapters in 3.6–3.10.

Loony storage-encoding fix (NOT a hack — required by Loony's design):
Loony reserves the low 3 bits of each slot for RESUME/WRITER/READER state
flags (loony/spec.nim: SLOTMASK = ~0x7). Loony's empty-slot detection
reads ``(prev and SLOTMASK) == 0``, treating a zero "pointer" payload as
"slot empty". Loony was designed for ref/ptr types whose addresses are
8-byte-aligned and never null. Pushing the literal value 0 (or any value
whose low 3 bits are non-zero) breaks Loony's pop path.

Adapter encodes the harness's ``uint64(i)`` payload as ``(v + 1) shl 3``
on push and ``(raw shr 3) - 1`` on pop:
- ``shl 3`` clears the low flag bits.
- ``+ 1`` ensures the encoded value is never zero.
This is documentation-equivalent to how Loony users push 8-byte-aligned,
non-null ref addresses; it does NOT change Loony's algorithm or affect
the comparison fairness — every push/pop still goes through Loony's full
fast/slow paths.

First-iteration RED state (recorded for the audit):
- Plain push of ``uint64(i)`` returned only 125 of 1000 items, all
  multiples of 8 (i.e. the values whose ``i and 7 == 0``).
- ``shl 3`` alone returned 999 of 1000 items (everything except 0,
  which collided with Loony's empty-slot sentinel).
- ``(v + 1) shl 3`` returned all 1000 items, set-equal to the input.

Verification: tests/t_bench_adapters.nim with -d:adapter_loony_available
+ --path:loony-0.3.1 --path:arc-0.0.4 -> [Summary] 1 tests run: 1 OK.
Full in-tree suite (-d:testing without the loony gate) -> 200/200 OK.

Refs: design doc §3 PR 3; impl plan Task 3.5; loony/spec.nim SLOTMASK.
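The SLOTMASK interaction recorded above can be reproduced in a small Rust sketch (illustrative of the bit behavior only; Loony's real pop path is more involved):

```rust
// Loony reserves the low 3 bits of each slot for state flags.
const SLOT_MASK: u64 = !0x7;

fn encode(v: u64) -> u64 { (v + 1) << 3 }
fn decode(raw: u64) -> u64 { (raw >> 3) - 1 }

fn main() {
    // Plain payload 5 has non-zero low bits; masked, it reads as an empty slot.
    assert_eq!(5u64 & SLOT_MASK, 0);

    // Encoded payloads keep the flag bits clear, are never zero, and round-trip.
    for v in [0u64, 1, 7, 1000] {
        let raw = encode(v);
        assert_eq!(raw & 0x7, 0);       // flag bits untouched
        assert_ne!(raw & SLOT_MASK, 0); // never mistaken for the empty sentinel
        assert_eq!(decode(raw), v);
    }
}
```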
Boost.LockFree is C++ header-only (BSL-1.0). Adapter wraps
boost::lockfree::queue<unsigned long long> with capacity-bounded
push (bounded_push) and pop. Topology: tMpmc.

Compile gates:
  -d:adapter_boost_lockfree_queue_available  -- enable
  nim cpp                                     -- required (else hard error)

Pulls headers from the conventional install paths (apt /usr/include,
brew /opt/homebrew/opt/boost/include, /usr/local/include) and accepts
-d:boostIncludeDir=<path> for non-standard locations.

Adds smoke_boost.nim (compile-and-run sanity for CI) and a parametric
round-trip test in tests/t_bench_adapters.nim gated by both the define
and 'cpp' build mode.

Adapter capacity argument is the fixed node-pool size; pushes that
exceed it return prFull. Default 1024 mirrors other bounded adapters.
Boost.LockFree's spsc_queue is a wait-free single-producer single-consumer
ring buffer. The pop overload used here -- pop(out, max_size) -- returns
the count of items popped (csize_t); 1 means success, 0 means empty.

Compile gates:
  -d:adapter_boost_lockfree_spsc_available  -- enable
  nim cpp                                    -- required

Topology: tSpsc. Capacity is the fixed ring size (default 1024). Header
search paths and indirection rationale identical to boost_lockfree_queue
adapter; smoke_boost.nim covers both adapters under one binary.
C-ABI shim around crossbeam_queue::ArrayQueue<u64> (bounded MPMC) and
crossbeam_queue::SegQueue<u64> (unbounded MPMC). Exposes 8 extern "C"
fns (cb_array_init/push/pop/destroy + cb_seg_init/push/pop/destroy)
that the Nim adapters in tasks 3.9 / 3.10 will importc.

- Cargo.toml: crate-type = [cdylib, rlib]; rlib so cargo test can link
  the integration test against the same code path the cdylib exports.
- rust-toolchain.toml pins stable channel for reproducibility.
- src/lib.rs: 8 extern fns; null-pointer safe; ArrayQueue::new(0) panic
  is intercepted (returns null) so we never unwind across FFI.
- tests/integration_test.rs: 6 tests covering round-trip set equality
  for both queues, full/empty edges, and null-pointer tolerance.

Build: cargo build --release  -> target/release/libbench_ffi_crossbeam.{so,dylib}
.gitignore extended for benchmarks/rust/*/target/ and Cargo.lock.
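The handle-based C-ABI pattern used by the shim (opaque pointer from Box::into_raw, null-safe push/pop, explicit destroy) can be sketched as follows. This stand-in uses a Mutex<VecDeque> instead of crossbeam's lock-free queues, and the q_* names are hypothetical; it illustrates the FFI surface shape, not the real cb_* functions:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

type Handle = Mutex<VecDeque<u64>>;

#[no_mangle]
pub extern "C" fn q_init() -> *mut Handle {
    Box::into_raw(Box::new(Mutex::new(VecDeque::new())))
}

#[no_mangle]
pub extern "C" fn q_push(h: *mut Handle, v: u64) -> bool {
    if h.is_null() { return false; } // null-pointer tolerance
    unsafe { &*h }.lock().unwrap().push_back(v);
    true
}

#[no_mangle]
pub extern "C" fn q_pop(h: *mut Handle, out: *mut u64) -> bool {
    if h.is_null() || out.is_null() { return false; }
    match unsafe { &*h }.lock().unwrap().pop_front() {
        Some(v) => { unsafe { *out = v }; true } // false == queue was empty
        None => false,
    }
}

#[no_mangle]
pub extern "C" fn q_destroy(h: *mut Handle) {
    if !h.is_null() { drop(unsafe { Box::from_raw(h) }); }
}

fn main() {
    let q = q_init();
    for v in 0..1000u64 { assert!(q_push(q, v)); }
    let mut out = 0u64;
    let mut got = Vec::new();
    while q_pop(q, &mut out) { got.push(out); }
    assert_eq!(got, (0..1000u64).collect::<Vec<_>>()); // round-trip preserved
    assert!(!q_push(std::ptr::null_mut(), 1));         // null tolerance
    q_destroy(q);
}
```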
…lib)

importc the cb_array_init/push/pop/destroy fns from
benchmarks/rust/bench-ffi-crossbeam (Task 3.8) and surface them as a
typed Nim adapter with topologiesSupported = {tMpmc}.

The adapter avoids emitting -L/-lbench_ffi_crossbeam when its sibling
crossbeam_seg_queue_adapter gate is also active, so combined builds
do not produce 'ignoring duplicate libraries' warnings on Apple ld.

Adds smoke_crossbeam.nim covering both crossbeam adapters under one
binary (Task 3.10 lands the seg adapter); tests/t_bench_adapters.nim
extended with the gated parametric round-trip suite.

Compile gates:
  -d:adapter_crossbeam_array_queue_available
  -d:crossbeamLibDir=<path>   (override search path for non-default builds)
…lib)

importc the cb_seg_init/push/pop/destroy fns from the bench-ffi-crossbeam
cdylib. SegQueue is unbounded -- push always succeeds (returns prSuccess
when the handle is non-null; the underlying cb_seg_push only reports
failure on null pointers).

topologiesSupported = {tMpmcUnbounded}. capacity arg accepted-but-ignored
so the wire-up sites (Task 3.11) can pass capacity uniformly across all
adapters without a per-adapter conditional.

Compile gate: -d:adapter_crossbeam_seg_queue_available
Optional override: -d:crossbeamLibDir=<path>
Each MVP adapter is pulled in only when its compile-time gate is set;
absent gates leave production builds unchanged. Variant lists are
populated at compile time via 'when declared(...)' so the CLI's
--list / unknown-variant error messages reflect the actual compiled-in
set.

Wiring map (per design 2.4 / Track 3 §3.11):

  bench_spsc:
    - boost_lockfree_spsc -> spsc/1p1c (gated by
      adapter_boost_lockfree_spsc_available)

  bench_mpmc:
    - boost_lockfree_queue   -> mpmc/{1,2,4}p{1,2,4}c
    - crossbeam_array_queue  -> mpmc/{1,2,4}p{1,2,4}c
    (each gated by its respective adapter_*_available define)

  bench_unbounded:
    - loony                  -> mpmc_unbounded/{1,2,4}p{1,2,4}c
    - crossbeam_seg_queue    -> mpmc_unbounded/{1,2,4}p{1,2,4}c

bench_mpsc has no MVP coverage today: none of the MVP libraries provide
a dedicated MPSC API at the natural shape; design 2.4 is silent on MPSC
parity for Track 3 and the existing in-tree mupsic baseline stays the
sole MPSC slug.

Verified locally: bench_mpmc + bench_unbounded produce expected
crossbeam_* slugs at the small fixture (10K msg, 2 runs).
workflow_dispatch inputs force_skip_boost / force_skip_loony surface
the override hook; FORCE_SKIP_BOOST / FORCE_SKIP_LOONY env vars relay
them into step-level if: gates.

Per library: install -> smoke -> set ADAPTER_*=1 on success -> annotate
on any failure. The combined defines + compile-mode (c vs cpp) flow
into the per-binary compile step via a final 'Build adapter define
flags' step that populates step.outputs.{flags,mode}.

Boost (libboost-dev) is wired into bench_spsc + bench_mpmc only.
Loony (nimble registry) is wired into bench_unbounded only. Other
binaries skip both install steps entirely (matrix-binary if-guards),
so the per-binary critical path is unchanged when no MVP adapter
applies.

Both install steps use continue-on-error: true; failure flips the
binary's compile flags so MVP slugs are simply omitted from the BMF
instead of failing the workflow (design Risk 4 soft-skip pattern).
Annotate-skipped emits ::warning title=Adapter skipped which surfaces
on the PR check summary.
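The install -> smoke -> set-flag -> annotate chain can be sketched in workflow YAML (hypothetical step ids and env names; the real steps live in bench.yml):

```yaml
# Hedged sketch of the design §2.6 soft-skip pattern, Boost leg.
- name: Install Boost
  id: boost-install
  if: env.FORCE_SKIP_BOOST != 'true'
  continue-on-error: true
  run: sudo apt-get install -y libboost-dev

- name: Smoke-test Boost adapters
  id: boost-smoke
  if: steps.boost-install.outcome == 'success'
  continue-on-error: true
  run: nim cpp -d:adapter_boost_lockfree_queue_available -r benchmarks/nim/smoke/smoke_boost.nim

- name: Enable Boost compile flags
  if: steps.boost-smoke.outcome == 'success'
  run: echo "ADAPTER_BOOST=1" >> "$GITHUB_ENV"

- name: Annotate skip
  if: steps.boost-install.outcome == 'failure' || steps.boost-smoke.outcome == 'failure'
  run: echo "::warning title=Adapter skipped::Boost.LockFree unavailable; MVP slugs omitted"
```

Because both fallible steps carry continue-on-error, a failure downgrades to the warning annotation instead of failing the workflow.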
Crossbeam adds a Rust toolchain dependency (~5 min cold install) which
would inflate the bench.yml critical path on every PR. Per design 2.6
and impl plan §3.13, Crossbeam runs in this dedicated workflow instead:

  triggers:
    - schedule: nightly 04:00 UTC
    - workflow_dispatch
    - push to devel that touches benchmarks/rust/** or
      benchmarks/nim/adapters/crossbeam_*

  steps:
    - dtolnay/rust-toolchain@stable + Swatinem/rust-cache@v2
    - cargo build --release of bench-ffi-crossbeam
    - cargo test of the cdylib (sanity)
    - smoke_crossbeam.nim compile-and-run
    - bench_mpmc with -d:adapter_crossbeam_array_queue_available
    - bench_unbounded with -d:adapter_crossbeam_seg_queue_available
    - merge_bmf.py + Bencher upload (separate Report from bench.yml)

Linker resolution: the {.passL.} flags in the Nim adapters add the
default -L; we additionally pass -Wl,-rpath,$(pwd)/.../target/release
so the dylib resolves at runtime without LD_LIBRARY_PATH.

actionlint clean.
benchmarks/README.md:
- New 'Comparison MVP' subsection: table of libraries x topologies x
  compile gates x install commands.
- Documents the bench.yml soft-skip flow (Loony + Boost) vs
  bench-comparison.yml nightly cron (Crossbeam) split.
- Local-run snippets for each adapter.
- Mentions nim/smoke/ and rust/bench-ffi-crossbeam/ in the structure
  list.

THIRD_PARTY_LICENSES.md (new):
- Per-vendor block schema (matches design doc §4.5).
- Records Loony (MIT, nimble-managed), Boost.LockFree (BSL-1.0,
  system package), and Crossbeam (Apache-2.0 OR MIT, Cargo-managed
  via the in-tree cdylib shim).
- Reserves entries for concurrentqueue (PR 4) and uPlot (PR 5).
Records: five third-party adapters (Loony, Boost x2, Crossbeam x2),
the bench-ffi-crossbeam Rust cdylib, smoke binaries under
benchmarks/nim/smoke/, the new bench-comparison.yml workflow, the
soft-skip flow added to bench.yml, the new THIRD_PARTY_LICENSES.md,
the cache-line padding fix for unbounded queue segments via the new
internal/aligned_alloc.nim helper, and the supporting test
infrastructure (t_unbounded_padding.nim, t_aligned_alloc.nim,
the t_bench_adapters.nim parametric round-trip suite).
…xonomy

merge_bmf.py enforces <library>/<topology>/<P>p<C>c (exactly 3 segments).
Three call sites violated this:

- "loony/LoonyQueue" produced 4-segment slugs
- "crossbeam_queue/SegQueue" produced 4-segment slugs
- "crossbeam_queue" for ArrayQueue collided with the SegQueue lib name

Use the canonical lib slugs from the design doc ("loony",
"crossbeam_array_queue", "crossbeam_seg_queue") so merge passes.

Loony is soft-skipped in PR CI today and Crossbeam runs only in nightly
cron, so this is preventative for downstream PRs and post-merge runs.
The cache-line padding fix swapped c_calloc -> allocAligned in newSegment
but missed the manager allocations in the newUnboundedSipmuc/Mupsic/Mupmuc
auto-create constructors that landed via #18 (e8ee15b). The import audit had
already dropped c_calloc as unused, so those constructors broke at compile
time. Manager state (epoch counter, limbo bag pointers, registration table)
benefits from the same 64-byte base alignment as Segment anyway.

Same allocAligned helper, same OutOfMemDefect failure mode, same c_free
release path (POSIX guarantees posix_memalign blocks are free-compatible).
…range

Per Gemini review on PR #22.

1. Multi-consumer race in pop(): isEmpty() followed by pop() is two
   distinct calls. Between them, another consumer can drain the queue,
   leaving the second consumer to pop a default uint64 (0) and emit a
   spurious success with garbage payload. Switch to popping
   unconditionally and reading raw == 0 as the empty sentinel — the
   adapter's own (v+1) shl 3 encoding guarantees pushed values are
   never 0, so raw == 0 unambiguously means 'queue was empty when pop
   ran' even under concurrent draining.

2. Encoding range note in the module docstring: (uint64(item) + 1) shl 3
   wraps to 0 for item == uint64.high and would be misread as an empty
   slot. The harness pushes uint64(i) for i in 0 ..< messageCount so
   the bound is academic at current tuning, but contractually the
   adapter accepts uint64.high - 1 as the inclusive maximum.
Three code-review fixes on PR 3:

1. allocAligned[T] always passed CacheLineBytes to posix_memalign
   regardless of alignof(T). For an over-aligned T (e.g. SSE/AVX
   vectors or {.align: 128.} pragmas), the returned pointer would
   not satisfy T's alignment contract — UB on dereference. Use
   max(CacheLineBytes, alignof(T)) so the cache-line invariant is
   preserved while honoring T's own alignment. Computed at compile
   time; runtime alignment argument stays constant per instantiation.

2. bench-comparison.yml runs bencher run --github-actions but did not
   declare job-level permissions. On repos with default read-only
   GITHUB_TOKEN, this would prevent check-run / PR-annotation
   publishing. Mirror bench.yml's bench-upload permissions block:
   contents:read, actions:read, pull-requests:write, checks:write.

3. The loony_adapter Encoding range comment claimed
   '0 .. uint64.high - 1' but the actual encoding (v + 1) shl 3
   discards the top 3 bits, so the round-trip is only invertible for
   v <= (1 shl 61) - 2. Document the true safe range and add a
   debug-mode assert that catches misuse in non-release builds. The
   assert is stripped under -d:release / -d:danger so bench
   measurement runs see zero overhead.
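Fix 1's max(CacheLineBytes, alignof(T)) rule can be sketched in Rust with an over-aligned payload (illustrative stand-in; the real change is in aligned_alloc.nim):

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::mem::{align_of, size_of};

const CACHE_LINE_BYTES: usize = 64;

// Over-aligned payload: its own alignment (128) exceeds the cache line.
#[repr(align(128))]
struct Wide([u8; 128]);

unsafe fn alloc_aligned<T>() -> *mut T {
    // max() keeps the cache-line invariant while honoring T's own alignment;
    // a constant 64 here would violate Wide's alignment contract.
    let align = CACHE_LINE_BYTES.max(align_of::<T>());
    let layout = Layout::from_size_align(size_of::<T>(), align).unwrap();
    alloc_zeroed(layout) as *mut T
}

fn main() {
    unsafe {
        let p = alloc_aligned::<Wide>();
        assert!(!p.is_null());
        assert_eq!(p as usize % 128, 0); // satisfies T's alignment, not just 64
        dealloc(
            p as *mut u8,
            Layout::from_size_align(size_of::<Wide>(), 128).unwrap(),
        );
    }
}
```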
aligned_alloc:
  - Add Windows path via _aligned_malloc / _aligned_free (POSIX path
    keeps posix_memalign + free). Library is no longer Linux/macOS-only.
  - Add freeAligned(pointer) plus a typed ptr T overload so callers in
    untyped destructor hooks share the same call site as typed callers.
  - Migrate all c_free callsites that pair with allocAligned across the
    four unbounded queues (segments, manager auto-create slot, retired-
    segment destructor) to freeAligned. Drop the now-unused c_free
    imports from each unbounded source.

unbounded_mupsic Segment.head:
  - Add {.align: CacheLineBytes.} to the consumer head field. Without it,
    head sat in the same 64-byte chunk as the producer-shared tail and
    every consumer write invalidated producers' tail line. Extend the
    cache-line padding audit (segmentHeadOffsetForTest +
    t_unbounded_padding) to assert head's alignment so the regression
    cannot recur.

manager-init failure path (mupmuc + mupsic + sipmuc auto-create overload):
  - Replace `try ... except: cleanup; raise` with `try ... finally: if
    not ok: cleanup`. Under Nim 2.0, bare `except:` matches only
    CatchableError -- an OutOfMemDefect from inside initDebraManager
    would skip cleanup and leak the heap-allocated DebraManager slot.
    `finally` runs unconditionally on Defect-class raises too. Refresh
    the comment that referred to a c_calloc zeroing path that no longer
    exists (allocAligned does the zeroing now).

boost adapters (queue + spsc_queue):
  - Switch backing storage from alloc0 to allocAligned[T] / freeAligned
    so the placement-constructed C++ object gets cache-line alignment,
    not just alignof(max_align_t).
Previous design gated the array adapter's link emission on the seg
adapter's gate being unset, so the seg adapter "owns" the flags when
both are enabled. This breaks the realistic CI configuration where
both adapter_crossbeam_*_available defines are set globally but a
given bench binary imports only one adapter. With both gates set and
only the array adapter imported, the array adapter's "skip" branch
fires and the seg adapter's emission block is never compiled in -
linker fails on undefined cb_array_* symbols.

Move the {.passL.} pragmas into crossbeam_link.nim. Both adapters
import it from inside their availability gate, so the link flags
emit exactly once per bench binary that uses ANY crossbeam adapter,
independent of which gates are set elsewhere. Nim's per-module
processing dedups the pragmas when both adapters are imported.
…ect loony encoding-range docstring

- bench-comparison.yml: pin nim-debra v0.6.0 / nim-typestates v0.7.0
  (was tracking 'main', non-deterministic).
- bench-ffi-crossbeam: add [profile.release] panic = 'abort'. Unwinding
  a Rust panic across the C-ABI into Nim is undefined behavior.
- loony_adapter docstring: int.high (2^63 - 1) is ABOVE the 2^61 - 2
  encoding ceiling, not 'far below' it. Type bound alone does not
  guarantee safety; reframe around the shipped tuning range and keep
  the debug-mode assert as the actual safeguard.
@elijahr elijahr force-pushed the feat/bench-rollup-pr3-comparison-mvp branch from b233388 to 41a08f3 Compare May 5, 2026 03:50
elijahr added a commit that referenced this pull request May 5, 2026
merge_bmf.py rejects slugs that aren't <library>/<topology>/<P>p<C>c
(exactly 3 segments). PR #23's CI failed on the moodycamel slug:

  invalid slug shape 'moodycamel/ConcurrentQueue/mpmc_unbounded/1p1c'

The same 4-segment violations exist for loony and crossbeam_seg_queue
in bench_unbounded.nim, and the crossbeam_array_queue dispatch was
emitting a colliding "crossbeam_queue" lib name. Apply the canonical
lib slugs from the design doc so all dispatch sites yield 3 segments
after the runner appends "/<topology>/<shape>":

  loony/LoonyQueue        -> loony
  crossbeam_queue/SegQueue -> crossbeam_seg_queue
  crossbeam_queue (Array)  -> crossbeam_array_queue
  moodycamel/ConcurrentQueue -> moodycamel

The PR #22 worktree applies the loony and crossbeam fixes too;
git rebase will harmonize the duplicate edits.

The adapter `name` constants (e.g. "moodycamel/ConcurrentQueue[uint64]")
are human-readable echo strings, not BMF slugs, and stay as-is.
@elijahr
Owner Author

elijahr commented May 5, 2026

Consolidated into #19 — fast-forwarded feat/bench-rollup-pr0-bench-common to this branch's tip; all commits preserved in #19. Closing this stacked PR as redundant.

@elijahr elijahr closed this May 5, 2026
@elijahr elijahr deleted the feat/bench-rollup-pr3-comparison-mvp branch May 5, 2026 04:19