elijahr · May 5, 2026 · May 1, 2026
diff --git a/.github/workflows/bench.yml b/.github/workflows/bench.yml
@@ -76,11 +76,6 @@ jobs:
       - name: Vendor unittest2
         run: git clone --depth 1 https://github.com/status-im/nim-unittest2.git deps/unittest2
 
-      - name: Run adapter unit tests
-        # Cheap pre-flight: fail fast if the parser itself is broken before
-        # spending minutes compiling and running the bench binary.
-        run: python3 benchmarks/test_bmf_adapter.py -v
-
       - name: Compile bench_throughput (CI-tuned run shape)
         # Cloud runs use a tighter wall-clock budget than the local default
         # (1M messages × 33 runs). Bounded variants (sipsic, channels) use
@@ -113,14 +108,23 @@ jobs:
         # Step-level timeout: if any single variant hangs, fail fast inside
         # the 30-min job budget so Bencher upload steps still run (or are
         # visibly skipped) instead of burning the entire budget on one variant.
+        # `--bmf-out=throughput.json` emits Bencher Metric Format JSON
+        # natively (Task 0.10); the legacy stdout->bmf_adapter.py regex
+        # parser is gone (Task 0.12).
         timeout-minutes: 20
-        run: ./.tmp/bench_throughput sipsic mupmuc unbounded_mupsic channels | tee bench_output.txt
+        run: |
+          ./.tmp/bench_throughput --bmf-out=throughput.json \
+            sipsic mupmuc unbounded_mupsic channels | tee bench_output.txt
 
-      - name: Convert to BMF JSON
-        run: python3 benchmarks/bmf_adapter.py bench_output.txt bench_results.json
+      - name: Merge BMF JSON
+        # Single-input merge today, but the bench-rollup feature splits
+        # bench_throughput into per-topology binaries in PRs 2-4, each
+        # producing its own BMF JSON fragment. Calling merge_bmf.py now
+        # means only this step's input list grows when those fragments arrive.
+        run: python3 benchmarks/merge_bmf.py merged.json throughput.json
 
       - name: Show BMF JSON (debug)
-        run: cat bench_results.json
+        run: cat merged.json
 
       - name: Install Bencher CLI
         uses: bencherdev/bencher@main
@@ -132,7 +136,7 @@ jobs:
         # silently appearing to "do nothing" with the data.
         run: |
           if [ -z "$BENCHER_API_TOKEN" ]; then
-            echo "::warning title=Bencher upload skipped::BENCHER_API_TOKEN secret is not set on this repo. The bench ran successfully and produced bench_results.json (visible in the 'Show BMF JSON (debug)' step), but no data is being uploaded to Bencher.dev. Set up the project at bencher.dev with slug 'lockfreequeues' and add BENCHER_API_TOKEN as a repo secret to enable upload."
+            echo "::warning title=Bencher upload skipped::BENCHER_API_TOKEN secret is not set on this repo. The bench ran successfully and produced merged.json (visible in the 'Show BMF JSON (debug)' step), but no data is being uploaded to Bencher.dev. Set up the project at bencher.dev with slug 'lockfreequeues' and add BENCHER_API_TOKEN as a repo secret to enable upload."
           else
             echo "Bencher token present; proceeding with upload."
           fi
@@ -151,7 +155,7 @@ jobs:
             --start-point-reset \
             --testbed ubuntu-latest \
             --adapter json \
-            --file bench_results.json \
+            --file merged.json \
             --github-actions '${{ secrets.GITHUB_TOKEN }}' \
             --err
 
@@ -170,7 +174,7 @@ jobs:
             --threshold-lower-boundary 0.99 \
             --thresholds-reset \
             --adapter json \
-            --file bench_results.json \
+            --file merged.json \
             --github-actions '${{ secrets.GITHUB_TOKEN }}' \
             --err
 
@@ -184,4 +188,4 @@ jobs:
             --branch "${GITHUB_REF##*/}" \
             --testbed ubuntu-latest \
             --adapter json \
-            --file bench_results.json
+            --file merged.json
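
The `Merge BMF JSON` step above depends on the merge semantics that the CHANGELOG and README describe for `merge_bmf.py`: a stateless union, exit 1 on a `(slug, measure)` collision that names both inputs, and alpha-sorted output. The following is a minimal Python sketch of those semantics. It is not the `benchmarks/merge_bmf.py` shipped in this PR. The names `merge_fragments`, `MergeCollision`, and `main`, as well as the usage exit code 2, are hypothetical, and the slug/measure regex checks are omitted.

```python
import json
import sys


class MergeCollision(Exception):
    """Raised when two fragments define the same (slug, measure) pair."""


def merge_fragments(fragments):
    """Union BMF fragments given as (source_name, dict) pairs.

    Hypothetical helper mirroring the documented behaviour: plain union,
    collision error naming both sources, alpha-sorted output.
    """
    merged = {}  # slug -> {measure -> value dict}
    owner = {}   # (slug, measure) -> source that first defined it
    for source, fragment in fragments:
        for slug, measures in fragment.items():
            for measure, value in measures.items():
                key = (slug, measure)
                if key in owner:
                    raise MergeCollision(
                        f"collision on {slug!r} / {measure!r}: "
                        f"{owner[key]} and {source}"
                    )
                owner[key] = source
                merged.setdefault(slug, {})[measure] = value
    # Alpha-sort slugs at the top level and measures within each slug.
    return {slug: dict(sorted(merged[slug].items())) for slug in sorted(merged)}


def main(argv):
    """Hypothetical CLI shape: OUT.json IN1.json [IN2.json ...]."""
    if len(argv) < 2:
        print("usage: merge OUT.json IN.json [IN.json ...]", file=sys.stderr)
        return 2
    out_path, *in_paths = argv
    fragments = []
    for path in in_paths:
        with open(path, encoding="utf-8") as fh:
            fragments.append((path, json.load(fh)))
    try:
        merged = merge_fragments(fragments)
    except MergeCollision as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(merged, fh, indent=2)
        fh.write("\n")
    return 0
```

Wiring `sys.exit(main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard would produce the `merge_bmf.py merged.json throughput.json` call shape used in the workflow. Because the union is stateless, adding per-topology fragments later only extends the list of inputs.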
diff --git a/.gitignore b/.gitignore
@@ -32,7 +32,6 @@ nim-unittest2/
 logs/
 test_typed_introspection*
 benchmarks/nim/bench_latency
-benchmarks/nim/bench_main
 benchmarks/nim/bench_throughput
 
 # Compiled benchmark test binaries (extensionless executables)

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,83 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+
+- New `benchmarks/nim/bench_common.nim` shared harness module exporting:
+  `Topology` enum, `BMFEmitter` (alpha-sorted Bencher Metric Format JSON
+  emission), `Histogram` (min-heap top-K + Algorithm R reservoir for
+  stratified-percentile estimation; p99 is within 1% of the exact
+  sort-based percentile on 100k log-normal samples), generic
+  `runThroughputHarness` and `runLatencyHarness` (1P/1C ping-pong RTT with
+  monotonic-ns timing and per-run percentile aggregation), and Stats
+  helpers (mean / stddev / minVal / maxVal / linear-interpolation percentile).
+- Five new lockfreequeues adapters in `benchmarks/nim/adapters/`:
+  `lockfreequeues_sipmuc_adapter.nim`, `lockfreequeues_mupsic_adapter.nim`,
+  `lockfreequeues_unbounded_sipsic_adapter.nim`,
+  `lockfreequeues_unbounded_sipmuc_adapter.nim`,
+  `lockfreequeues_unbounded_mupmuc_adapter.nim`. Each exposes
+  `topologiesSupported: set[Topology]` and the standard `push`/`pop`
+  shape consumed by the shared harness. The unbounded adapters store
+  the queue inline (not via `ptr`) to dodge a Nim 2.2.6 codegen bug
+  triggered by generic-pointer destructor calls when bench_common is
+  imported.
+- New `benchmarks/merge_bmf.py` CLI: stateless union of per-binary BMF
+  JSON fragments into a single output file. Exits 1 on a `(slug, measure)`
+  collision, naming both colliding inputs on stderr. Output slugs and
+  measures are alpha-sorted. Pure stdlib (no third-party deps); covered by
+  `benchmarks/tests/test_merge_bmf.py` (10 tests).
+- `bench_throughput` `--bmf-out=<path>` flag emits Bencher Metric Format
+  JSON natively. The flag is purely additive: with the flag absent, the
+  binary behaves exactly as in the prior release (same stdout text, same
+  positional CLI: `bench_throughput sipsic mupmuc
+  unbounded_mupsic channels`). Emitted slugs:
+  `lockfreequeues_sipsic/spsc/1p1c`,
+  `lockfreequeues_mupmuc/mpmc/{1,2,4,8}p{1,2,4,8}c`,
+  `lockfreequeues_unbounded_mupsic/mpsc_unbounded/{1,2,4}p1c`,
+  `nim_channels/mpmc/{1,2,4}p{1,2,4}c`. Each carries a
+  `throughput_ops_ms` measure with `value=mean`, `lower_value=mean-stddev`,
+  `upper_value=mean+stddev`.
+- Per-variant compile-time run-count overrides:
+  `-d:BenchSipsicRuns=N`, `-d:BenchSipsicWarmup=N`,
+  `-d:BenchMupmucRuns=N`, `-d:BenchMupmucWarmup=N`,
+  `-d:BenchChannelsRuns=N`, `-d:BenchChannelsWarmup=N`. Defaults match
+  the prior hard-coded `runs = 10`, so production runs are unchanged.
+
+### Changed
+
+- `bench_throughput.nim` now natively emits Bencher Metric Format JSON
+  via `--bmf-out=<path>`. The CI workflow (`.github/workflows/bench.yml`)
+  was rewired to consume the native output and feed it through
+  `merge_bmf.py` before uploading to Bencher.dev; the previous Python
+  regex parser (`bmf_adapter.py`) has been removed.
+- The three existing lockfreequeues adapter files
+  (`lockfreequeues_sipsic.nim`, `lockfreequeues_mupmuc.nim`,
+  `lockfreequeues_unbounded_mupsic.nim`) were renamed with `git mv`
+  (history preserved) to the canonical `<library_slug>_adapter.nim`
+  convention. Each gained a `topologiesSupported: set[Topology]`
+  constant for the upcoming PR 3 binary-split.
+- `benchmarks/render_readme.nim` rewritten to consume the new BMF JSON
+  shape directly (`{slug: {measure: MeasureValue}}`) instead of the
+  legacy `bench_main` aggregator output. The slug walk decomposes
+  `<lib>/<topology>/<P>p<C>c` back into the (impl, thread_config) pair
+  the table renders.
+- `benchmarks/runner.py` and the `lockfreequeues.nimble` `benchmarks`
+  task now invoke `bench_throughput --bmf-out=<path>` instead of `bench_main`.
+- `benchmarks/README.md` rewritten to document the new flow
+  (bench_common module, adapter convention, `--bmf-out` flag,
+  merge_bmf.py, expected slug set).
+
+### Removed
+
+- `benchmarks/bmf_adapter.py` — Python regex parser that converted
+  `bench_throughput` stdout text into BMF JSON. Replaced by native BMF
+  emission via `--bmf-out=`.
+- `benchmarks/test_bmf_adapter.py` — unit tests for the parser.
+  Replaced by `benchmarks/tests/test_merge_bmf.py`.
+- `benchmarks/nim/bench_main.nim` — aggregator binary that wrapped
+  bench_throughput + bench_latency and produced a custom JSON shape.
+  `bench_throughput` is now the canonical entry point.
+
 ## [4.1.0] - 2026-05-01
 
 ### Added
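
The `Histogram` entry in the CHANGELOG above combines two techniques in a single line. The following is a minimal Python sketch of one way a min-heap top-K and an Algorithm R reservoir can work together for percentiles. Upper-tail percentiles whose interpolation neighbours both fall inside the K largest samples are answered exactly, and everything else is estimated from the uniform reservoir. This reflects an assumed design, not a transcription of `bench_common.nim`. `TailHistogram`, `k`, `reservoir_size`, and the tail-routing rule are hypothetical, and the linear-interpolation `percentile` helper only approximates the Stats helper the entry mentions.

```python
import heapq
import math
import random


def _interp(sorted_vals, pos):
    """Linearly interpolate sorted_vals at fractional index pos."""
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in [0, 100]) of sorted data."""
    if not sorted_vals:
        raise ValueError("percentile of empty data")
    return _interp(sorted_vals, (len(sorted_vals) - 1) * p / 100.0)


class TailHistogram:
    """Hypothetical sketch: exact top-K tail plus an Algorithm R reservoir."""

    def __init__(self, k=1000, reservoir_size=10_000, seed=0):
        self.k = k
        self.reservoir_size = reservoir_size
        self.top = []        # min-heap of the k largest samples seen so far
        self.reservoir = []  # uniform random sample of everything seen
        self.count = 0
        self.rng = random.Random(seed)

    def add(self, x):
        self.count += 1
        # Top-K: the heap root is the smallest of the current k largest.
        if len(self.top) < self.k:
            heapq.heappush(self.top, x)
        elif self.top and x > self.top[0]:
            heapq.heapreplace(self.top, x)
        # Algorithm R: fill the reservoir, then keep the count-th sample
        # with probability reservoir_size / count.
        if len(self.reservoir) < self.reservoir_size:
            self.reservoir.append(x)
        else:
            j = self.rng.randrange(self.count)
            if j < self.reservoir_size:
                self.reservoir[j] = x

    def quantile(self, p):
        if self.count == 0:
            raise ValueError("no samples recorded")
        pos = (self.count - 1) * p / 100.0
        # Global sorted index of the smallest sample kept in the top-K heap.
        offset = self.count - len(self.top)
        if math.floor(pos) >= offset:
            # Both interpolation neighbours are exact top-K samples.
            return _interp(sorted(self.top), pos - offset)
        # Below the exact tail: estimate from the uniform reservoir.
        return percentile(sorted(self.reservoir), p)
```

In this sketch, memory stays bounded at `k + reservoir_size` samples regardless of run length. Tail percentiles such as p99 remain exact as long as K covers that much of the upper tail.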

diff --git a/benchmarks/README.md b/benchmarks/README.md
@@ -6,26 +6,36 @@ regression gate via [Bencher.dev](https://bencher.dev).
 ## Structure
 
 - `nim/` - Nim benchmarks (lockfreequeues, Loony, Nim channels)
+- `nim/bench_common.nim` - Shared bench harness (BMF emission, stats,
+  Histogram with top-K + reservoir percentiles, throughput / latency
+  runners). One module, intended for reuse by every per-topology bench binary.
+- `nim/bench_throughput.nim` - Throughput driver. Emits Bencher Metric
+  Format JSON natively via `--bmf-out=<path>`.
+- `nim/adapters/` - One file per benchmarked queue implementation
+  (`<library_slug>_adapter.nim`). Adapters expose a
+  `push(value) -> PushResult` / `pop() -> PopResult[T]` shape consumed
+  by the shared harness.
+- `merge_bmf.py` - Stateless union of per-binary BMF JSON fragments
+  into a single `merged.json` for `bencher run`. Exits 1 on
+  `(slug, measure)` collisions naming the colliding inputs.
 - `results/` - JSON output from local benchmark runs
 - `runner.py` - Orchestrates local benchmark execution
-- `bmf_adapter.py` - Converts `bench_throughput` stdout to Bencher Metric Format
-- `test_bmf_adapter.py` - Unit tests for the BMF adapter
 
 ## Quick Start (local)
 
 ```bash
-# Run all Nim throughput benchmarks (1M messages × 33 runs - takes a while).
+# Run all Nim throughput benchmarks (1M messages x 33 runs - takes a while).
 nim c -r -d:release -d:danger --threads:on benchmarks/nim/bench_throughput.nim
 
-# Same, but the CI wall-clock budget (100k × 5).
+# Same, but the CI wall-clock budget (100k x 5).
 nim c -r -d:release -d:danger --threads:on \
   -d:MessageCount=100000 -d:DefaultRuns=5 -d:WarmupRuns=2 \
   -d:UnboundedMupsicRuns=5 \
   benchmarks/nim/bench_throughput.nim
 
-# Convert the captured stdout to BMF JSON for upload / inspection.
-./.tmp/bench_throughput > bench_output.txt
-python3 benchmarks/bmf_adapter.py bench_output.txt bench_results.json
+# Emit BMF JSON natively (no Python parser; see merge step below).
+./.tmp/bench_throughput --bmf-out=throughput.json
+python3 benchmarks/merge_bmf.py merged.json throughput.json
 ```
 
 ## Metrics
@@ -40,13 +50,16 @@ python3 benchmarks/bmf_adapter.py bench_output.txt bench_results.json
 for every PR and every push to `main`/`devel`. The workflow:
 
 1. Compiles `bench_throughput` with the CI run shape
-   (`-d:MessageCount=100000 -d:DefaultRuns=5 -d:WarmupRuns=2
-   -d:UnboundedMupsicRuns=5`).
-2. Captures stdout to `bench_output.txt`.
-3. Runs `bmf_adapter.py` to emit `bench_results.json` in
-   [Bencher Metric Format](https://bencher.dev/docs/reference/bencher-metric-format/).
-4. Uploads the JSON to the `lockfreequeues` Bencher project via the
-   `bencherdev/bencher@main` action.
+   (`-d:MessageCount=1000000 -d:DefaultRuns=5 -d:WarmupRuns=2
+   -d:UnboundedMupsicRuns=3 -d:UnboundedMupsicMessageCount=500000`).
+2. Runs `bench_throughput --bmf-out=throughput.json`, which writes
+   Bencher Metric Format JSON natively.
+3. Runs `python3 benchmarks/merge_bmf.py merged.json throughput.json`
+   to produce a single `merged.json` for upload. With one input the merge
+   is effectively a pass-through today, but it stays in place for the
+   per-topology binary split landing in PRs 2-4.
+4. Uploads `merged.json` to the `lockfreequeues` Bencher project via
+   the `bencherdev/bencher@main` action.
 
 On pull requests, Bencher posts a comparison comment against the base
 branch using `--start-point-clone-thresholds` and `--start-point-reset`,
@@ -64,35 +77,40 @@ The cloud workflow requires:
    with write access to the project.
 
 Until those exist the `bench` workflow will fail on the upload step;
-PR / push events still produce the `bench_results.json` artifact in the
+PR / push events still print `merged.json` via the `Show BMF JSON (debug)` step in the
 job log so local debugging is possible without the upload.
 
 ### BMF schema emitted
 
 ```json
 {
-  "<variant>/<P>p<C>c": {
-    "throughput": {
+  "<library_slug>/<topology>/<P>p<C>c": {
+    "throughput_ops_ms": {
       "value": <mean ops/ms>,
-      "lower_value": <min ops/ms, optional>,
-      "upper_value": <max ops/ms, optional>
+      "lower_value": <mean - stddev>,
+      "upper_value": <mean + stddev>
     }
   }
 }
 ```
 
-`<variant>` is one of `sipsic`, `mupmuc`, `unbounded_mupsic`, `channels`.
-`lower_value` / `upper_value` are populated only for blocks that print a
-`min: ... max: ...` line (currently only the `unbounded_mupsic` group).
-Non-finite samples (`inf`, `nan`) are dropped with a stderr warning so
-spurious cold-cache outliers do not poison the upload.
+Slugs are alpha-sorted at the top level and measures are alpha-sorted
+within each slug. `lower_value` / `upper_value` are omitted when the
+emitter receives `NaN` sentinels for the bounds. Current slug set
+emitted by `bench_throughput`:
 
-## Running adapter tests
+- `lockfreequeues_sipsic/spsc/1p1c`
+- `lockfreequeues_mupmuc/mpmc/{1,2,4,8}p{1,2,4,8}c`
+- `lockfreequeues_unbounded_mupsic/mpsc_unbounded/{1,2,4}p1c`
+- `nim_channels/mpmc/{1,2,4}p{1,2,4}c`
+
+## Running merge_bmf tests
 
 ```bash
-python3 benchmarks/test_bmf_adapter.py -v
+python3 -m unittest benchmarks.tests.test_merge_bmf -v
 ```
 
 The tests use only the Python standard library (`unittest`) and run in
-< 0.1s. They cover full and partial bench output, unknown variants, and
-the CLI's exit codes.
+< 0.1s. They cover slug regex enforcement, measure regex enforcement,
+collision detection (with both colliding files named in stderr), and
+alpha-sorted output.
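
The BMF schema above (`value` = mean, bounds = mean ± stddev, bounds omitted for `NaN` sentinels, alpha-sorted keys) and the `<library_slug>/<topology>/<P>p<C>c` slug shape can be illustrated with a short Python sketch. `throughput_entry`, `parse_slug`, `to_bmf`, and `SLUG_RE` are hypothetical names. The use of the sample (n-1) standard deviation and the exact slug regex are assumptions, because the diff does not specify which ones the Nim emitter or `merge_bmf.py` use.

```python
import json
import math
import re
import statistics

# Hypothetical regex for "<library_slug>/<topology>/<P>p<C>c" slugs.
SLUG_RE = re.compile(r"^([a-z0-9_]+)/([a-z0-9_]+)/(\d+)p(\d+)c$")


def throughput_entry(runs_ops_ms):
    """Build one throughput_ops_ms value from per-run ops/ms samples."""
    mean = statistics.fmean(runs_ops_ms)
    # Sample stddev needs >= 2 runs; otherwise use a NaN sentinel.
    sd = statistics.stdev(runs_ops_ms) if len(runs_ops_ms) > 1 else math.nan
    entry = {"value": mean}
    lower, upper = mean - sd, mean + sd
    # Omit the bounds rather than emit non-finite JSON numbers.
    if math.isfinite(lower) and math.isfinite(upper):
        entry["lower_value"] = lower
        entry["upper_value"] = upper
    return entry


def parse_slug(slug):
    """Split a slug into (library, topology, producers, consumers)."""
    m = SLUG_RE.match(slug)
    if m is None:
        raise ValueError(f"unexpected slug: {slug!r}")
    lib, topology, producers, consumers = m.groups()
    return lib, topology, int(producers), int(consumers)


def to_bmf(results):
    """results: {slug: [ops/ms per run]} -> alpha-sorted BMF JSON text."""
    doc = {slug: {"throughput_ops_ms": throughput_entry(runs)}
           for slug, runs in results.items()}
    # sort_keys orders slugs, measures, and value keys alphabetically.
    return json.dumps(doc, sort_keys=True, indent=2)
```

For example, `parse_slug("nim_channels/mpmc/2p4c")` yields `("nim_channels", "mpmc", 2, 4)`. This is the kind of (impl, thread_config) split that the CHANGELOG says `render_readme.nim` performs.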