Commit dac7377

Matt Davis committed
[bench/cli/benchmarks/reference] Wire bench infra + real CLI + reference data
bench/ infrastructure:
- src/bionpu/bench/{harness,runner,units,__init__}.py: measurement harness, generic units conversions, top-level runner.
- src/bionpu/bench/POWER_DOMAINS.md: per-device power-domain spec (CPU RAPL, GPU nvidia-smi, NPU xrt-smi). Includes/excludes, sampling rates, sources, known issues, cross-compare caveats. Mechanically lint-checked front-matter.
- src/bionpu/bench/UNITS.md: units convention (J vs Wh vs J/Mbp), measurement passport schema.
- src/bionpu/bench/energy/{rapl,nvsmi,xrt,__init__}.py: three per-device energy readers.
- src/bionpu/bench/energy/SANITY-LOG.md: calibration evidence log.

docs/ENERGY_METHODOLOGY.md:
Real public-facing methodology document (was a placeholder). Covers:
- TL;DR table of per-device counter sources, includes, excludes, sampling rates.
- Three-phase sustained-load measurement shape (pre-warmup, measurement window, drift-detection window).
- Spec-bracketing assumptions (TDP envelope as a sanity gate, not a target).
- UNAVAILABLE-counter rules (the harness MUST NEVER fabricate a reading).
- Reproducibility envelope.
- Pointers into the deeper docs in src/bionpu/bench/.

CLI:
- src/bionpu/cli.py: real argparse with working 'verify {crispr, basecalling}' subcommands wired through to bionpu.verify. Exits 0 on byte-equality, 1 on divergence.
- 'scan', 'basecall', 'bench' subcommands stubbed as v0.2 scope with a clear stderr message pointing at the kernel sources.
- Smoke-tested end-to-end: 'bionpu verify crispr ref.tsv ref.tsv' on the 422-row Cas-OFFinder canonical reference returns EQUAL with matching SHA-256s.

benchmarks/:
- benchmarks/crispr/run_chr.sh: skeleton with the v0.2 driver shape documented; v0.1 instructs running kernels manually + calling 'bionpu verify crispr'.
- benchmarks/basecalling/run_pod5.sh: same shape for basecalling.

reference/:
- reference/crispr/casoffinder-canonical.tsv: 422-row canonical reference (small enough to ship; cas-offinder is BSD-2-Clause).
- reference/crispr/casoffinder-chr22-10guides.tsv: chr22 fixture.
- reference/basecalling/bionpu-reference.fastq: small smoke FASTQ.

All 18 verify-harness tests still pass.
1 parent 4c3b391 commit dac7377

19 files changed

Lines changed: 35755 additions & 42 deletions

benchmarks/basecalling/run_pod5.sh

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
#!/usr/bin/env bash
2+
# bionpu — AIE2P-accelerated genomics with reference-equivalence verification.
3+
# Copyright (C) 2026 OpenSensor / Matt Davis <matt@opensensor.io>
4+
# SPDX-License-Identifier: GPL-3.0-only
5+
#
6+
# Basecall a pod5 read set on the NPU and verify byte-equality against
7+
# a Dorado reference FASTQ.
8+
#
9+
# Usage:
10+
# benchmarks/basecalling/run_pod5.sh <pod5_path>
11+
#
12+
# Status: v0.1 ships a skeleton — the NPU basecalling pipeline lives in
13+
# the bionpu source tree (src/bionpu/kernels/basecalling/) but the CLI
14+
# wrapper that drives the full streaming pipeline is v0.2 scope. For
15+
# v0.1 the supported flow is to run the per-kernel make targets
16+
# manually and then verify byte-equality with `bionpu verify
17+
# basecalling`.
18+
19+
set -euo pipefail
20+
21+
POD5="${1:-}"
22+
OUT_DIR="benchmarks/results/basecalling/$(basename "${POD5%.pod5}" 2>/dev/null || echo unknown)"
23+
24+
if [[ -z "${POD5}" ]]; then
25+
cat <<EOF
26+
Usage: $0 <pod5_path>
27+
28+
pod5_path Path to a pod5 read set (Nanopore raw signal).
29+
30+
Pre-computed reference FASTQs are at reference/basecalling/.
31+
EOF
32+
exit 1
33+
fi
34+
35+
REF_FASTQ="reference/basecalling/dorado-reference.fastq"
36+
NPU_FASTQ="${OUT_DIR}/npu.fastq"
37+
38+
mkdir -p "${OUT_DIR}"
39+
40+
echo "==> $0 ${POD5}"
41+
echo " output dir: ${OUT_DIR}"
42+
echo " reference: ${REF_FASTQ}"
43+
44+
cat <<EOF
45+
46+
[v0.1 placeholder]
47+
The end-to-end driver is v0.2 scope. For v0.1, run the kernels
48+
manually and then call:
49+
50+
bionpu verify basecalling "${NPU_FASTQ}" "${REF_FASTQ}"
51+
52+
The kernels live at:
53+
src/bionpu/kernels/basecalling/{conv_stem,lstm_cell_*,linear_projection,...}
54+
55+
The Dorado reference FASTQ is committed at reference/basecalling/ when
56+
it has been generated on a host with Dorado available (the build is
57+
not redistributable; see Dorado's license).
58+
EOF

benchmarks/crispr/run_chr.sh

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
1+
#!/usr/bin/env bash
2+
# bionpu — AIE2P-accelerated genomics with reference-equivalence verification.
3+
# Copyright (C) 2026 OpenSensor / Matt Davis <matt@opensensor.io>
4+
# SPDX-License-Identifier: GPL-3.0-only
5+
#
6+
# Run a CRISPR off-target scan against a target chromosome and verify
7+
# byte-equality against a Cas-OFFinder reference.
8+
#
9+
# Usage:
10+
# benchmarks/crispr/run_chr.sh <chr> [<guides_file>]
11+
#
12+
# Example:
13+
# benchmarks/crispr/run_chr.sh chr22 reference/crispr/guides_chr22.txt
14+
#
15+
# Status: v0.1 ships a skeleton — the NPU scan invocation lives in the
16+
# bionpu source tree (src/bionpu/kernels/crispr/) but the CLI wrapper
17+
# that drives a full chromosome scan from the command line is v0.2
18+
# scope. For v0.1 the supported flow is to run the per-kernel make
19+
# target manually and then verify byte-equality with `bionpu verify
20+
# crispr`.
21+
22+
set -euo pipefail
23+
24+
CHR="${1:-}"
25+
GUIDES="${2:-reference/crispr/guides_${CHR}.txt}"
26+
OUT_DIR="benchmarks/results/crispr/${CHR}"
27+
28+
if [[ -z "${CHR}" ]]; then
29+
cat <<EOF
30+
Usage: $0 <chr> [<guides_file>]
31+
32+
chr Target chromosome (chr1 ... chr22, chrX, chrY).
33+
guides_file Newline-separated list of 20-nt guide spacers.
34+
Defaults to reference/crispr/guides_<chr>.txt.
35+
36+
Example:
37+
$0 chr22
38+
39+
End-to-end pipeline (v0.2 scope; v0.1 ships a manual workflow):
40+
1. Scan <chr> with the NPU PAM filter + match kernel.
41+
2. Run cas-offinder on the same input as the CPU reference.
42+
3. bionpu verify crispr <npu.tsv> <ref.tsv>
43+
EOF
44+
exit 1
45+
fi
46+
47+
REF_TSV="reference/crispr/casoffinder-${CHR}-canonical.tsv"
48+
NPU_TSV="${OUT_DIR}/npu.tsv"
49+
50+
mkdir -p "${OUT_DIR}"
51+
52+
echo "==> $0 ${CHR}"
53+
echo " guides: ${GUIDES}"
54+
echo " reference TSV: ${REF_TSV}"
55+
echo " output dir: ${OUT_DIR}"
56+
57+
cat <<EOF
58+
59+
[v0.1 placeholder]
60+
The end-to-end driver is v0.2 scope. For v0.1, run the kernels
61+
manually (one-time, ~30 s build) and then call:
62+
63+
bionpu verify crispr "${NPU_TSV}" "${REF_TSV}"
64+
65+
The verify command exits 0 on byte-equality and 1 on divergence.
66+
67+
The kernels live at:
68+
src/bionpu/kernels/crispr/{pam_filter,match_multitile_memtile,...}
69+
70+
The CPU reference is built from cas-offinder; pre-computed canonical
71+
TSVs are at reference/crispr/.
72+
73+
When the v0.2 driver lands, this script will:
74+
1. Build (or use cached) NPU artifacts for the kernels.
75+
2. Dispatch the scan against \${CHR} via bionpu.dispatch.
76+
3. Run cas-offinder on the same input.
77+
4. Call bionpu verify crispr ... and exit with its return code.
78+
EOF
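
The exit-code contract described in the placeholder text above (0 on byte-equality, 1 on divergence) can be sketched in a few lines. This is a minimal illustration, assuming the comparison is a plain SHA-256 match over the two files. The function names `sha256_of` and `verify_byte_equal` and the `DIVERGENT` label are hypothetical, and this is not the actual `bionpu.verify` implementation, which this commit view does not show.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large outputs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns the empty bytes sentinel.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_byte_equal(candidate: Path, reference: Path) -> int:
    """Return 0 when both files hash identically, 1 otherwise (hypothetical helper)."""
    candidate_hash = sha256_of(candidate)
    reference_hash = sha256_of(reference)
    equal = candidate_hash == reference_hash
    # "EQUAL" mirrors the commit message; "DIVERGENT" is an illustrative label.
    print(f"{'EQUAL' if equal else 'DIVERGENT'} "
          f"candidate={candidate_hash} reference={reference_hash}")
    return 0 if equal else 1
```

A wrapper script could pass the return value to `sys.exit()` so that a shell caller such as `run_chr.sh` can propagate it as its own exit status.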

docs/ENERGY_METHODOLOGY.md

Lines changed: 108 additions & 20 deletions
@@ -1,22 +1,110 @@
11
# Energy methodology
22

3-
> Status: shell — populated from `bionpu/bench/POWER_DOMAINS.md` and
4-
> `bionpu/bench/energy/SANITY-LOG.md` during the v0.1 extraction. Until
5-
> filled, the v0.1 `bench` numbers in this repo should be treated as
6-
> wall-clock only.
7-
8-
This document will cover:
9-
10-
- AMD RAPL counter access path (`/sys/class/powercap/intel-rapl:*` on
11-
Ryzen-AI HX systems; the AMD-specific `package-0` / `package-1`
12-
domain layout).
13-
- Sustained-load measurement — pre-warmup window, measurement window,
14-
drift-detection window — the three-phase shape that distinguishes
15-
steady-state energy from cold-start spikes.
16-
- Spec-bracketing assumptions — what TDP range we assume the package
17-
is in, how we cross-check against the documented Ryzen-AI 9 HX SKU
18-
TDP envelope, where the assumption fails.
19-
- NPU-specific power accounting — what's measurable today vs what is
20-
inferred from the package counter delta with NPU idle vs NPU active.
21-
- Reproducibility envelope — what hardware revisions / firmware
22-
versions / governor settings the documented numbers are valid for.
3+
This document is the public-facing methodology for the energy figures
4+
reported in `benchmarks/results/`. It exists because cross-device
5+
energy comparisons (CPU vs GPU vs NPU joules-per-Mbp / joules-per-scan)
6+
are easy to misuse — every device's "energy" is a different rail with
7+
different includes, different sampling rates, and different known
8+
instrumentation gaps. We document those explicitly so a reader can
9+
decide whether the comparison is honest.
10+
11+
## TL;DR
12+
13+
| Device | Counter source | Includes | Excludes | Sampling |
14+
|---|---|---|---|---|
15+
| **CPU** | `/sys/class/powercap/{intel-rapl,amd-rapl-msr}:0/energy_uj` (RAPL) | All CPU cores + L3 / uncore on the Zen 5 package | DRAM (no separate AMD RAPL DRAM domain), discrete GPU, NPU subdomain, platform IO | ≥10 Hz, monotonic counter integrated start-to-end |
16+
| **GPU** | `nvidia-smi --query-gpu=power.draw,total_energy_consumption` | Compute cores + GDDR/HBM memory + PCIe interface (board side) + VRMs | Host CPU, host DRAM, NPU | ~1 Hz (driver-reported); prefer driver-integrated `total_energy_consumption` over trapezoidal-integrated `power.draw` |
17+
| **NPU** | `xrt-smi examine -r platform` (firmware-internal estimate) | AIE compute tiles in the active hardware-context partition | Host SoC package (CPU; on the RAPL rail), host DRAM, Radeon iGPU on the same package, platform IO outside the AIE partition | 10 Hz (capped by ~40 ms `xrt-smi` invocation cost); trapezoidal-integrated to joules |
18+
19+
A figure caption that compares any two of these without listing the
20+
includes / excludes is not honest enough to publish.
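
The table mentions trapezoidal integration of sampled power into joules for the `power.draw` and `xrt-smi` paths. A minimal sketch of that arithmetic follows, assuming samples arrive as `(timestamp_seconds, watts)` pairs in ascending time order. The function name `trapezoid_joules` is hypothetical and is not taken from the harness.

```python
from typing import Sequence, Tuple


def trapezoid_joules(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (t_seconds, watts) samples into joules with the trapezoid rule."""
    if len(samples) < 2:
        raise ValueError("need at least two power samples to integrate")
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of one trapezoid: mean of the two power readings times the interval.
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total
```

A cumulative counter (RAPL `energy_uj`, or a driver-integrated total) needs no such integration; the energy is simply the end reading minus the start reading, which is one reason the table prefers the driver-integrated figure where it exists.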
21+
22+
## Reference documents
23+
24+
The full methodology lives in three places in this repo:
25+
26+
1. **[`src/bionpu/bench/POWER_DOMAINS.md`](../src/bionpu/bench/POWER_DOMAINS.md)**
27+
— exhaustive per-device specification: rail name, target hardware,
28+
includes / excludes, sampling rate, source path, fallback source,
29+
known issues, cross-compare caveats. Front-matter is mechanically
30+
lint-checked so every device entry is fully populated.
31+
32+
2. **[`src/bionpu/bench/energy/SANITY-LOG.md`](../src/bionpu/bench/energy/SANITY-LOG.md)**
33+
— the calibration log. Records the host system the numbers were
34+
measured on, the kernel + module versions, the probe results
35+
(which counters are AVAILABLE / UNAVAILABLE on this host), and
36+
the resolution paths for the UNAVAILABLE cases. Future calibration
37+
runs append; never overwrite.
38+
39+
3. **[`src/bionpu/bench/UNITS.md`](../src/bionpu/bench/UNITS.md)**
40+
units convention (J vs Wh vs J/Mbp), measurement passport schema,
41+
and the rules for combining same-rail / cross-rail figures.
42+
43+
## Sustained-load measurement
44+
45+
Every benchmark in `benchmarks/` measures energy across three
46+
windows, in this order:
47+
48+
1. **Pre-warmup** (`pre_warmup_seconds`, default 10 s): host runs the
49+
workload at full duty cycle to bring caches, governors, NPU
50+
firmware, and GPU clocks to their steady-state. Energy in this
51+
window is **not** counted.
52+
53+
2. **Measurement** (`measurement_seconds`, default 30 s): the actual
54+
integration window. The energy counter is sampled at the start
55+
boundary, sampled again at the end boundary, and sampled at
56+
least once mid-window to detect counter wraparound.
57+
58+
3. **Drift-detection** (`drift_seconds`, default 5 s): a final
59+
counter sample taken `drift_seconds` after the measurement window
60+
ends. If the mean power over the drift window deviates from the
61+
measurement-window mean by more than the drift threshold (default
62+
5 %), the measurement passport flags the run as `drift_detected: true`
63+
and the published number is the measurement-window value with a
64+
drift-warning annotation.
65+
66+
This three-phase shape distinguishes steady-state energy from
67+
cold-start spikes; almost all "the NPU uses X joules" claims that
68+
report a single-shot wall-time figure are conflating the warmup
69+
transient with the steady-state, sometimes by a factor of 2-3×.
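
The measurement and drift-detection windows can be sketched as follows. This is an illustration under stated assumptions, not the harness code: the counter is taken to be a cumulative microjoule value that wraps at most once between consecutive samples, `max_range_uj` stands for the wrap range (on Linux powercap this is exposed as `max_energy_range_uj`), and the function names are hypothetical.

```python
from typing import Sequence


def window_joules(samples_uj: Sequence[int], max_range_uj: int) -> float:
    """Integrate a cumulative microjoule counter across one window.

    Consecutive samples are differenced; a negative difference is treated as
    a single wraparound and corrected by adding max_range_uj.
    """
    if len(samples_uj) < 2:
        raise ValueError("need at least a start and an end sample")
    total_uj = 0
    for previous, current in zip(samples_uj, samples_uj[1:]):
        delta = current - previous
        if delta < 0:  # counter wrapped between these two samples
            delta += max_range_uj
        total_uj += delta
    return total_uj / 1_000_000


def mean_watts(joules: float, seconds: float) -> float:
    """Mean power over a window of known duration."""
    if seconds <= 0:
        raise ValueError("window duration must be positive")
    return joules / seconds


def drift_detected(measurement_watts: float, drift_watts: float,
                   threshold: float = 0.05) -> bool:
    """True when drift-window power deviates by more than threshold (a fraction)."""
    return abs(drift_watts - measurement_watts) > threshold * measurement_watts
```

The mid-window sample matters because a start/end pair alone cannot distinguish a wrapped counter from a small energy delta; differencing consecutive samples makes each wrap visible as a negative step.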
70+
71+
## Spec-bracketing assumptions
72+
73+
The published energy numbers are reported alongside the
74+
manufacturer-spec TDP envelope of the device's silicon, so a reader
75+
can check whether the measurement falls in a plausible range:
76+
77+
- **CPU** — Ryzen AI 9 HX nominal 28-54 W TDP envelope.
78+
- **GPU** — per-board TGP from the OEM, recorded per run in the
79+
measurement passport.
80+
- **NPU** — AIE2P partition at sustained load typically falls in
81+
the 1.5-3.5 W range; published measurements outside this band
82+
are flagged as out-of-spec and require a calibration entry in
83+
`SANITY-LOG.md` before publication.
84+
85+
The spec envelope is a sanity gate, not a target. A measurement that
86+
falls in-band is not automatically valid; a measurement that falls
87+
out-of-band is not automatically wrong (silicon binning,
88+
firmware-state, or governor changes can move the steady-state
89+
envelope by ±20 %). The bracket is published so readers can
90+
challenge the number.
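
The sanity gate described above amounts to a band check that annotates a measurement without rejecting it. A minimal sketch, assuming the NPU band of 1.5 to 3.5 W quoted in this section; the `SpecBracket` type, the `gate` function, and the output field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecBracket:
    device: str
    low_watts: float
    high_watts: float


# Band quoted in the text for an AIE2P partition at sustained load.
NPU_BRACKET = SpecBracket(device="npu", low_watts=1.5, high_watts=3.5)


def gate(mean_watts: float, bracket: SpecBracket) -> dict:
    """Annotate a measurement with its spec bracket; never reject or alter it."""
    in_band = bracket.low_watts <= mean_watts <= bracket.high_watts
    return {
        "device": bracket.device,
        "mean_watts": mean_watts,
        "bracket_watts": [bracket.low_watts, bracket.high_watts],
        "in_band": in_band,
        # Out-of-band values need a calibration entry before publication.
        "needs_sanity_log_entry": not in_band,
    }
```

Returning an annotation instead of raising keeps the gate advisory, which matches the point that an out-of-band value is not automatically wrong.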
91+
92+
## When a counter is UNAVAILABLE
93+
94+
Per the rules in `POWER_DOMAINS.md`, if any counter probe fails
95+
(permission denied, sysfs path missing, driver too old) the harness
96+
emits a measurement record with that device's energy field set to
97+
`null` and a `reason_unavailable` string. The harness MUST NEVER
98+
fabricate a reading. A run with an UNAVAILABLE counter still records
99+
wall-time; the published comparison just drops that device from the
100+
energy column with a footnote pointing at the sanity-log entry that
101+
explains why.
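
One way to honour the no-fabrication rule is to make the probe return either a reading or a reason, never a default value. The sketch below assumes a sysfs-style counter file. Only `reason_unavailable` comes from the text above; the other field names (`device`, `wall_seconds`, `energy_joules`) and both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def probe_counter_uj(path: Path) -> Tuple[Optional[int], Optional[str]]:
    """Return (reading_uj, None) on success or (None, reason) on failure."""
    try:
        return int(path.read_text().strip()), None
    except FileNotFoundError:
        return None, "sysfs path missing"
    except PermissionError:
        return None, "permission denied"
    except (OSError, ValueError) as exc:
        return None, f"counter unreadable: {exc}"


def build_record(device: str, wall_seconds: float,
                 energy_joules: Optional[float],
                 reason_unavailable: Optional[str]) -> dict:
    """Wall-time is always recorded; energy stays None when the probe failed."""
    return {
        "device": device,
        "wall_seconds": wall_seconds,
        "energy_joules": energy_joules,
        "reason_unavailable": reason_unavailable,
    }


reading, reason = probe_counter_uj(Path("/nonexistent/energy_uj"))
record = build_record("cpu", 30.0, None if reading is None else 0.0, reason)
print(json.dumps(record))  # None serialises as JSON null
```

Because `None` serialises to JSON `null`, a downstream table generator can tell an unavailable counter apart from a genuine zero reading.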
102+
103+
## Reproducibility envelope
104+
105+
The numbers in `benchmarks/results/` are valid for the host
106+
configuration recorded at the head of `SANITY-LOG.md`. A different
107+
host (different kernel, different driver, different governor) is a
108+
different measurement. We do not ship "expected energy" thresholds
109+
that other hosts must hit; we ship the reproducible measurement
110+
**method** so other hosts can produce their own numbers.
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
@8a391d95-9203-5f59-b83f-220eed61908d
2+
GA
3+
+
4+
!!
5+
@17548974-2486-5717-afde-9739ce6fa468
6+
AAAAA
7+
+
8+
!!!!!
9+
@61b9e826-e529-527f-97f1-0dcbb7ba1a8f
10+
GA
11+
+
12+
!!
13+
@dc04946a-59c2-508d-8f73-f5d2b460b2eb
14+
GAT
15+
+
16+
!!!
17+
@7308d02e-b0ba-5be8-9bcc-7bf8c91b5593
18+
A
19+
+
20+
!
21+
@3ecff3e5-b271-5cc3-ad06-a11158b4da57
22+
AAAA
23+
+
24+
!!!!
25+
@4a3ed9d0-0ddf-5842-9912-633ef6d3f640
26+
GA
27+
+
28+
!!
29+
@106871d5-c50a-5c28-9979-801d7caed7b1
30+
A
31+
+
32+
!
33+
@e63163dc-c8b3-5574-afb5-a3922bc64074
34+
G
35+
+
36+
!
37+
@2c8cc84d-40b6-5540-9a6a-089660505fd7
38+
GA
39+
+
40+
!!
