feat: asyncAggregateWithRandomness #353
Conversation
This PR reintroduces a pippenger-based `asyncAggregateWithRandomness`, heavily modelled after the [rust bindings](https://github.com/supranational/blst/blob/dece82ea537b422890888bacde4034ca5b5a44d8/bindings/rust/src/pippenger.rs). We use a comptime `parallelMSM` fn which accepts a `Curve`. The underlying algorithm is the same; we just need to choose (at comptime) between `G1`, the pubkey, and `G2`, the signature. We don't strictly have to do this, but it looks cleaner than repeating the algorithm twice. It does mean some ugly things, like defining a `CurveDescriptor` struct and using a bunch of pointer casts to anyopaque functions.

## Implementation details

Note that there are a few key differences from the Rust version:

- Instead of using `blst`'s internal threadpool, we use our own `ThreadPool`.
- The Rust version uses pipelining to accumulate work: rows can come out of order, because the top rows can start being summed while bottom rows are still computing. We use `submitAndWait`, which assembles work sequentially from top to bottom. This lets us drop one of the outer loops the Rust code needs to handle out-of-order rows from workers, giving us simpler code.
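The window/row decomposition and top-to-bottom assembly described above can be modeled with plain integers (a hypothetical sketch: ints stand in for curve points, `+` for point addition, repeated `*2` for doubling; none of these names are from the PR's Zig code):

```python
# Integer model of windowed MSM: each "row" r sums the points weighted by
# the wnd-bit slice of each scalar, then rows are combined top to bottom.
def msm_rows(points, scalars, wnd, ny):
    """Row r holds sum(points[i] * slice_r(scalars[i])) for bit window r."""
    mask = (1 << wnd) - 1
    return [
        sum(p * ((s >> (r * wnd)) & mask) for p, s in zip(points, scalars))
        for r in range(ny)
    ]

def assemble_top_to_bottom(rows, wnd):
    """Horner assembly: start from the top row; double `wnd` times, add next row."""
    out = rows[-1]  # top row = highest bit window
    for row in reversed(rows[:-1]):
        for _ in range(wnd):
            out *= 2  # `wnd` doublings == multiply by 2**wnd
        out += row
    return out

points = [3, 5, 7]
scalars = [0b1011, 0b0110, 0b1101]
rows = msm_rows(points, scalars, wnd=2, ny=2)
# Sequential assembly recovers the full multi-scalar multiplication.
assert assemble_top_to_bottom(rows, wnd=2) == sum(p * s for p, s in zip(points, scalars))
```

Because assembly only ever consumes the next row down, a `submitAndWait`-style scheduler that delivers rows in order needs no bookkeeping for out-of-order completion.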
## Benchmarks

Looks pretty comparable on M2 Mac:

```
bindings/perf/blst.test.ts
  asyncAggregateWithRandomness
    ✔ asyncAggregateWithRandomness lodestar-z 1 sets         5456.162 ops/s    183.2790 us/op   -    713 runs  0.996 s
    ✔ asyncAggregateWithRandomness @chainsafe/blst 1 sets    4940.736 ops/s    202.3990 us/op   -    524 runs  0.891 s
    ✔ asyncAggregateWithRandomness lodestar-z 8 sets         1274.951 ops/s    784.3440 us/op   -     82 runs   1.82 s
    ✔ asyncAggregateWithRandomness @chainsafe/blst 8 sets    1295.293 ops/s    772.0260 us/op   -    138 runs   2.02 s
    ✔ asyncAggregateWithRandomness lodestar-z 32 sets        408.2397 ops/s    2.449541 ms/op   -     44 runs   2.16 s
    ✔ asyncAggregateWithRandomness @chainsafe/blst 32 sets   413.3244 ops/s    2.419407 ms/op   -     31 runs   1.96 s
    ✔ asyncAggregateWithRandomness lodestar-z 64 sets        221.6369 ops/s    4.511884 ms/op   -      9 runs   1.80 s
    ✔ asyncAggregateWithRandomness @chainsafe/blst 64 sets   218.6257 ops/s    4.574028 ms/op   -     13 runs   1.90 s
    ✔ asyncAggregateWithRandomness lodestar-z 128 sets       111.7445 ops/s    8.948983 ms/op   -      5 runs   1.88 s
    ✔ asyncAggregateWithRandomness @chainsafe/blst 128 sets  113.1346 ops/s    8.839029 ms/op   -      7 runs   1.99 s
```
/gemini review
Code Review
This pull request introduces asyncAggregateWithRandomness to the N-API bindings, enabling non-blocking multi-scalar multiplication (MSM) for public keys and signatures. The implementation includes a new parallel Pippenger algorithm in src/bls/pippenger.zig, ported from the blst Rust bindings, and updates the ThreadPool to support these parallel operations. Feedback focuses on security improvements for randomness generation and several style guide violations, including function length limits, the use of architecture-specific usize types, and line length restrictions.
```zig
var seed_bytes: [8]u8 = undefined;
napi_io.get().random(&seed_bytes);
var prng = std.Random.DefaultPrng.init(std.mem.readInt(u64, &seed_bytes, .little));
prng.random().bytes(data.randomness);
```
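The security feedback mentioned in the review summary concerns this snippet: expanding an 8-byte seed through a non-cryptographic PRNG caps all downstream randomness at the seed's 64 bits, and equal seeds produce identical output. A quick illustration, using Python's `random.Random` as a hypothetical stand-in for `DefaultPrng` (this is a model of the general point, not the PR's code):

```python
import random

def expand(seed: int, n: int) -> bytes:
    """Expand a seed into n 'random' bytes, like seeding a non-crypto PRNG."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

# The stream is fully determined by the seed: colliding seeds give identical
# "randomness", so the output entropy never exceeds the seed's bit width.
assert expand(42, 32) == expand(42, 32)
assert expand(42, 32) != expand(43, 32)
```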
```zig
/// - Top-to-bottom Horner assembly: `out = top_row`, then for each row
///   below, double `wnd` times and add the row's tiles. Equivalent to
///   `sum_r row_r * 2^((ny-1-r)*wnd)`.
fn parallelMSM(
```
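The identity in this doc comment can be checked numerically with an integer model (hypothetical sketch; `<<` stands in for repeated doubling, `+` for point addition):

```python
# Rows numbered top to bottom: rows[0] is the top row (highest bit window).
def horner(rows, wnd):
    """out = top_row; for each row below: double `wnd` times, then add it."""
    out = rows[0]
    for row in rows[1:]:
        out = (out << wnd) + row
    return out

rows = [9, 4, 7]  # arbitrary row sums
wnd = 3
ny = len(rows)
# Closed form from the doc comment: sum_r row_r * 2^((ny-1-r)*wnd)
closed_form = sum(row << ((ny - 1 - r) * wnd) for r, row in enumerate(rows))
assert horner(rows, wnd) == closed_form
```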
The parallelMSM function exceeds the hard limit of 70 lines per function (it is currently ~93 lines). Consider refactoring the Horner assembly logic or the work item initialization into separate helper functions to improve modularity and adhere to the style guide.
References
- Restrict the length of function bodies to reduce the probability of poorly structured code. We enforce a hard limit of 70 lines per function. (link)
```zig
const CurveDescriptor = struct {
    Projective: type,
    Wrapper: type,
    scratch_sizeof: *const fn (npoints: usize) callconv(.c) usize,
    mult_pippenger: *const anyopaque,
    tile_pippenger: *const anyopaque,
    add_or_double: *const anyopaque,
    double: *const anyopaque,
};
```
The use of usize for struct fields and function signatures (e.g., in CurveDescriptor, TileDesc, and parallelMSM) violates the style guide's requirement to use explicitly-sized types like u32 and avoid architecture-specific usize.
References
- Use explicitly-sized types like u32 for everything, avoid architecture-specific usize. (link)
```zig
/// Typed function-pointer signature for `blst_p?_add_or_double`.
fn AddOrDoubleFn(comptime Curve: CurveDescriptor) type {
    return *const fn (*Curve.Projective, *const Curve.Projective, *const Curve.Projective) callconv(.c) void;
```
This line exceeds the 100-column limit. It should be wrapped to maintain readability and adhere to the typographic 'measure' limit.
```zig
return *const fn (
    *Curve.Projective,
    *const Curve.Projective,
    *const Curve.Projective,
) callconv(.c) void;
```
References
- Hard limit all line lengths, without exception, to at most 100 columns for a good typographic 'measure'. (link)
EDIT: Something to note is that this was mostly Claude, but I did audit the code against the existing Rust bindings and it's more or less doing the same thing. I'd appreciate a separate pair of eyes to check whether I missed something, though.