Niek-Kamer/poseidon2-harness

poseidon2-harness

Dual-tree differential + microbench harness for Plonky3's Poseidon2 over Goldilocks on aarch64 NEON.

Imports p3-goldilocks from two Plonky3 checkouts side-by-side — a baseline tree (audited, read-only) and an h1 tree (experimental, under test) — runs the same permute through both, and asserts bitwise + canonical equality. Originally built to validate experimental edits to the NEON ASM atoms in aarch64_neon/utils.rs and aarch64_neon/poseidon2_asm.rs without rebuilding the surrounding crate graph each time.
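
The difference between bitwise and canonical equality can be sketched as follows. This is a minimal illustration, assuming field elements are held as raw u64 words that may exceed the Goldilocks modulus; `canonical`, `compare`, and `Verdict` are hypothetical names and not the harness's own helpers.

```rust
/// Goldilocks prime: p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Reduce a raw 64-bit representative to its canonical value in [0, P).
/// Every u64 is below 2*P, so one conditional subtraction is enough.
fn canonical(x: u64) -> u64 {
    if x >= P { x - P } else { x }
}

/// Outcome of comparing two output states lane by lane (hypothetical type).
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Same raw words in every position.
    BitwiseEqual,
    /// Same field elements, but at least one raw word differs.
    CanonicalOnly,
    /// First index at which the field elements themselves differ.
    Mismatch(usize),
}

fn compare(a: &[u64], b: &[u64]) -> Verdict {
    assert_eq!(a.len(), b.len(), "states must have the same width");
    let mut bitwise = true;
    for (i, (&x, &y)) in a.iter().zip(b).enumerate() {
        if canonical(x) != canonical(y) {
            return Verdict::Mismatch(i);
        }
        bitwise &= x == y;
    }
    if bitwise { Verdict::BitwiseEqual } else { Verdict::CanonicalOnly }
}

fn main() {
    let baseline = [1u64, 5, 7];
    // Same field elements, but index 1 uses the non-canonical word 5 + P.
    let h1 = [1u64, 5 + P, 7];
    println!("{:?}", compare(&baseline, &baseline));
    println!("{:?}", compare(&baseline, &h1));
}
```

A `CanonicalOnly` verdict means the two trees agree as field elements but not as raw words. A bitwise assertion reports that as a failure, while a canonical-only check would let it pass.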

Methodology

Four oracles, layered cheapest-to-strongest:

  1. Cheap properties (smoke.rs) — determinism, lane independence, lane symmetry, composition, distinct-input/distinct-output. Catches catastrophic collapses (e.g. lane aliasing) the moment cargo test runs.
  2. Per-layer oracle (layer_oracles.rs) — dual asm vs scalar asm vs scalar generic, each of the three layers in isolation. When the full-permute oracle fires, this says which layer broke without bisecting.
  3. Full-permute oracles (permute_oracle.rs, asm_vs_generic.rs) — bitwise across trees and three-way differential within a tree, W=8.
  4. Wider widths (wider_widths.rs) — cross-tree bitwise + h1 asm-vs-generic at W ∈ {12, 16, 20}.
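
To make the cheap properties in item 1 concrete, here is a rough sketch against a toy stand-in. `toy_permute` is an invertible mixing function invented for this example (it is not Poseidon2), and `dual`, `aliased_dual`, and `check_properties` are hypothetical names; the real checks live in smoke.rs and are not reproduced here.

```rust
const W: usize = 8;
type State = [u64; W];

/// Toy invertible mixing function standing in for the permutation (NOT Poseidon2).
fn toy_permute(state: &mut State) {
    for round in 0..4u64 {
        for x in state.iter_mut() {
            *x = x.wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(17) ^ round;
        }
        let sum = state.iter().fold(0u64, |acc, &x| acc.wrapping_add(x));
        for x in state.iter_mut() {
            *x = x.wrapping_add(sum);
        }
    }
}

/// Correct two-lane wrapper: each lane is permuted on its own.
fn dual(a: State, b: State) -> (State, State) {
    let (mut a, mut b) = (a, b);
    toy_permute(&mut a);
    toy_permute(&mut b);
    (a, b)
}

/// Deliberately broken wrapper: lane B aliases lane A's result.
fn aliased_dual(a: State, b: State) -> (State, State) {
    let (x, _) = dual(a, b);
    (x, x)
}

/// Run the cheap properties against any two-lane function `f`.
/// Returns the name of the first property that fails.
fn check_properties(
    f: impl Fn(State, State) -> (State, State),
    a: State,
    b: State,
    c: State,
) -> Result<(), &'static str> {
    // Determinism: the same inputs give the same outputs.
    if f(a, b) != f(a, b) {
        return Err("determinism");
    }
    // Lane independence: lane A's output ignores lane B's input.
    if f(a, b).0 != f(a, c).0 {
        return Err("lane independence");
    }
    // Lane symmetry: swapping the inputs swaps the outputs.
    let (x, y) = f(a, b);
    let (y2, x2) = f(b, a);
    if (x, y) != (x2, y2) {
        return Err("lane symmetry");
    }
    // Distinct inputs must not collapse to the same output.
    if a != b && x == y {
        return Err("distinct-input/distinct-output");
    }
    Ok(())
}

fn main() {
    let (a, b, c) = ([1u64; W], [2u64; W], [3u64; W]);
    println!("dual:    {:?}", check_properties(dual, a, b, c));
    println!("aliased: {:?}", check_properties(aliased_dual, a, b, c));
}
```

The aliased wrapper passes determinism and lane independence but fails lane symmetry, which is the kind of catastrophic collapse these checks are meant to catch before the stronger oracles run.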

Cycle/instruction measurement uses perf_event_open with grouped counters (cycles + instructions scheduled together) so the IPC ratio doesn't drift under PMU multiplexing. Two bench tools share that counter module:

  • src/bin/cycle_bench.rs — lightweight A/B microbench, JSON-line output suitable for CI regression checks.
  • benches/poseidon2_layers.rs — criterion target with Tukey filtering + confidence intervals for serious before/after runs.
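
As an illustration of what Tukey filtering does to a set of cycle samples, the sketch below drops everything outside the fences [Q1 - 1.5·IQR, Q3 + 1.5·IQR]. It is a generic version written for this README; `quantile` and `tukey_filter` are hypothetical names, and criterion's own quantile method and thresholds may differ.

```rust
/// Linearly interpolated quantile of a sorted, non-empty slice; q in [0, 1].
fn quantile(sorted: &[u64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let (a, b) = (sorted[lo] as f64, sorted[hi] as f64);
    a + (b - a) * (pos - lo as f64)
}

/// Keep only the samples inside the Tukey fences, preserving input order.
fn tukey_filter(samples: &[u64]) -> Vec<u64> {
    if samples.is_empty() {
        return Vec::new();
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();

    let q1 = quantile(&sorted, 0.25);
    let q3 = quantile(&sorted, 0.75);
    let iqr = q3 - q1;
    let (low, high) = (q1 - 1.5 * iqr, q3 + 1.5 * iqr);

    samples
        .iter()
        .copied()
        .filter(|&s| {
            let s = s as f64;
            s >= low && s <= high
        })
        .collect()
}

fn main() {
    // Eight tight cycle counts plus one sample disturbed by, say, an interrupt.
    let cycles = [100, 101, 102, 103, 104, 105, 106, 107, 5000];
    println!("{:?}", tukey_filter(&cycles));
}
```

For this input the quartiles are 102 and 106, so the fences are 96 and 112 and only the 5000-cycle sample is discarded.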

Layout

src/
  lib.rs                 re-exports the two trees as `baseline` and `h1`
  cycle_counter.rs       perf_event_open wrapper (cycles + instructions, grouped)
  bin/cycle_bench.rs     lightweight A/B microbench, JSON output
benches/
  poseidon2_layers.rs    criterion benches in CPU cycles (3 fns × 2 trees)
tests/
  permute_oracle.rs      full-permute bitwise oracle across trees, W=8
  layer_oracles.rs       per-layer oracle (dual asm vs scalar asm vs generic)
  asm_vs_generic.rs      three-way differential per tree, W=8
  canonicality.rs        round-constant canonicality check
  smoke.rs               cheap properties: determinism, lane indep/symmetry, ...
  wider_widths.rs        cross-tree + asm-vs-generic at W ∈ {12, 16, 20}
  dual_tree_compiles.rs  sanity: both trees link into the same binary
  common/mod.rs          shared scaffolding (per-tree helper modules)
scripts/
  bench-pi.sh            Pi 5 bench wrapper; pins core, checks governor + paranoid

Setup

Both Plonky3 trees are pulled as git deps pinned by rev in Cargo.toml. No local checkouts required — cargo build resolves everything from upstream.

Default pins:

tree      commit     meaning
baseline  b6380137…  parent of PR #1619 (pre-edit reference)
h1        af65376f…  merge of PR #1623 (both edits landed)

To A/B different commits, edit the rev = "..." values in Cargo.toml. Both trees can point at the same repo (default — upstream Plonky3) or at forks. The two revs only need to expose p3-goldilocks, p3-field, p3-symmetric, and p3-poseidon2 as workspace members.

Running

# Full oracle + property suite (aarch64 host required)
cargo test

# Criterion benches in CPU cycles (Linux + aarch64 + PMU access)
cargo bench --bench poseidon2_layers

# Lightweight A/B bench, one fn at a time
cargo run --release --bin cycle_bench -- --fn internal --tree both
cargo run --release --bin cycle_bench -- --fn external_initial --tree both --json

# Pi 5 wrapper: pins to core 3, checks governor + perf_event_paranoid
./scripts/bench-pi.sh quick      # cycle_bench, both trees, 3 fns
./scripts/bench-pi.sh criterion  # full criterion run
./scripts/bench-pi.sh both

PMU access requires kernel.perf_event_paranoid <= 1:

echo 1 | sudo tee /proc/sys/kernel/perf_event_paranoid

For stable cycle numbers, set the cpufreq governor to performance:

echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Limitations

  • aarch64 NEON only. The dual-w8 ASM the harness exercises is aarch64-specific. On x86 / macOS dev hosts the bench targets compile a stub main() that exits cleanly, so cargo bench doesn't break the workflow.
  • Linux for cycle measurement. PMU access goes through perf_event_open; the cycle_counter module is #[cfg(target_os = "linux")]. The test suite still runs on non-Linux aarch64 hosts, just without cycle counts.
  • Pinned deps. The two Plonky3 trees are wired as git dependencies pinned by rev (see Setup), so the harness does not follow upstream automatically. Edit the rev values in Cargo.toml to test other commits or forks.
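
The stub-main pattern from the first bullet might look roughly like this. It is a sketch only: the exact cfg predicate the harness uses is an assumption, and `bench_supported` and `run` are hypothetical names.

```rust
/// True only where both the NEON ASM and perf_event_open are assumed available.
const fn bench_supported() -> bool {
    cfg!(all(target_arch = "aarch64", target_os = "linux"))
}

/// Real entry point: compiled only on Linux/aarch64.
#[cfg(all(target_arch = "aarch64", target_os = "linux"))]
fn run() -> &'static str {
    // The actual benches would be set up and executed here.
    "running benches"
}

/// Stub entry point for every other target: do nothing and exit cleanly.
#[cfg(not(all(target_arch = "aarch64", target_os = "linux")))]
fn run() -> &'static str {
    "unsupported target: skipping benches"
}

fn main() {
    println!("{} (supported: {})", run(), bench_supported());
}
```

Because exactly one `run` is compiled per target, the binary builds everywhere and the stub returns normally, so a workspace-wide cargo bench does not fail on a dev host.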

License

Licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
