
feat: asyncAggregateWithRandomness #353

Open

spiral-ladder wants to merge 1 commit into main from bing/async-agg-w-randomness

Conversation

@spiral-ladder
Collaborator

@spiral-ladder spiral-ladder commented May 7, 2026

This PR reintroduces a Pippenger-based `asyncAggregateWithRandomness`, heavily modelled after the [Rust bindings](https://github.com/supranational/blst/blob/dece82ea537b422890888bacde4034ca5b5a44d8/bindings/rust/src/pippenger.rs).

We use a comptime `parallelMSM` function which accepts a `Curve`. The underlying algorithm is the same; we just choose (at comptime) between `G1`, for the pubkeys, and `G2`, for the signatures.

We don't strictly have to do this, but it looks cleaner than repeating the algorithm twice. Of course, this means we have to do some ugly things, like defining a `CurveDescriptor` struct and using a bunch of pointer casts to `anyopaque` functions.
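
For readers more familiar with the Rust side, the comptime-dispatch idea can be sketched with generics: one aggregation routine, monomorphized per curve. Everything below (the `Curve` trait, the toy `u64` "points") is illustrative only, not blst's actual API:

```rust
// Hypothetical sketch: "write the MSM once, pick the curve at compile
// time", expressed with a Rust trait instead of Zig's comptime
// CurveDescriptor. Names and types here are assumptions, not blst's.
trait Curve {
    type Point: Copy;
    fn identity() -> Self::Point;
    fn add(a: Self::Point, b: Self::Point) -> Self::Point;
}

struct G1; // stands in for the pubkey group
struct G2; // stands in for the signature group

impl Curve for G1 {
    type Point = u64; // toy "point": integers under addition
    fn identity() -> u64 { 0 }
    fn add(a: u64, b: u64) -> u64 { a + b }
}

impl Curve for G2 {
    type Point = u64;
    fn identity() -> u64 { 0 }
    fn add(a: u64, b: u64) -> u64 { a + b }
}

// One generic aggregation routine, monomorphized per curve -- the same
// shape as a comptime `parallelMSM(Curve)` function in Zig.
fn aggregate<C: Curve>(points: &[C::Point]) -> C::Point {
    points.iter().copied().fold(C::identity(), C::add)
}

fn main() {
    let pubkeys: Vec<u64> = vec![1, 2, 3];
    let sigs: Vec<u64> = vec![10, 20];
    println!("{}", aggregate::<G1>(&pubkeys)); // 6
    println!("{}", aggregate::<G2>(&sigs)); // 30
}
```

Monomorphization gives the same result as Zig's comptime parameter: two specialized copies of the routine, no runtime dispatch.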

## Implementation details

Note that there are a few key differences:

- Instead of using `blst`'s internal threadpool, we use our own `ThreadPool`.
- The Rust version uses pipelining to accumulate work, so rows can come out of order: top rows can start being summed while bottom rows are still computing. We use `submitAndWait`, which assembles work sequentially from top to bottom. This lets us drop one of the outer loops the Rust version needs to handle out-of-order rows from workers, and gives us simpler code.
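
The sequential top-to-bottom assembly amounts to Horner's rule over the rows. A minimal sketch, with plain integers standing in for curve points and doubling modelled as multiplication by 2 (the names and shapes here are assumptions, not the actual implementation):

```rust
// Toy model of top-to-bottom row assembly. Each rows[r] is the partial
// sum for scalar-window r (row 0 = most significant). The Horner loop
// computes sum over r of rows[r] * 2^((ny - 1 - r) * wnd) using only
// "double" and "add", mirroring how row results would be combined.
fn assemble_rows(rows: &[u64], wnd: u32) -> u64 {
    let mut out = rows[0]; // start from the top row
    for &row in &rows[1..] {
        for _ in 0..wnd {
            out *= 2; // "double" wnd times per row boundary
        }
        out += row; // then add the next row's tiles
    }
    out
}

fn main() {
    // Three rows, wnd = 4: expect rows[0]*2^8 + rows[1]*2^4 + rows[2].
    let rows = [3u64, 5, 7];
    let horner = assemble_rows(&rows, 4);
    let direct = 3 * (1 << 8) + 5 * (1 << 4) + 7;
    assert_eq!(horner, direct);
    println!("{}", horner); // 855
}
```

Because `submitAndWait` delivers rows in order, this single loop suffices; an out-of-order pipeline needs extra bookkeeping around it.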

## Benchmarks

Looks pretty comparable on an M2 Mac:

```
bindings/perf/blst.test.ts
  asyncAggregateWithRandomness
    ✔ asyncAggregateWithRandomness lodestar-z  1 sets                     5456.162 ops/s    183.2790 us/op        -        713 runs  0.996 s
    ✔ asyncAggregateWithRandomness @chainsafe/blst  1 sets                4940.736 ops/s    202.3990 us/op        -        524 runs  0.891 s
    ✔ asyncAggregateWithRandomness lodestar-z  8 sets                     1274.951 ops/s    784.3440 us/op        -         82 runs   1.82 s
    ✔ asyncAggregateWithRandomness @chainsafe/blst  8 sets                1295.293 ops/s    772.0260 us/op        -        138 runs   2.02 s
    ✔ asyncAggregateWithRandomness lodestar-z  32 sets                    408.2397 ops/s    2.449541 ms/op        -         44 runs   2.16 s
    ✔ asyncAggregateWithRandomness @chainsafe/blst  32 sets               413.3244 ops/s    2.419407 ms/op        -         31 runs   1.96 s
    ✔ asyncAggregateWithRandomness lodestar-z  64 sets                    221.6369 ops/s    4.511884 ms/op        -          9 runs   1.80 s
    ✔ asyncAggregateWithRandomness @chainsafe/blst  64 sets               218.6257 ops/s    4.574028 ms/op        -         13 runs   1.90 s
    ✔ asyncAggregateWithRandomness lodestar-z  128 sets                   111.7445 ops/s    8.948983 ms/op        -          5 runs   1.88 s
    ✔ asyncAggregateWithRandomness @chainsafe/blst  128 sets              113.1346 ops/s    8.839029 ms/op        -          7 runs   1.99 s
```

EDIT: Something to note is that this was mostly Claude, but I did audit the code against the existing Rust bindings and it's more or less doing the same thing. I would appreciate a separate pair of eyes to check whether I missed something, though.

@spiral-ladder spiral-ladder self-assigned this May 7, 2026
@spiral-ladder spiral-ladder requested a review from a team as a code owner May 7, 2026 16:52
@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

@spiral-ladder
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces asyncAggregateWithRandomness to the N-API bindings, enabling non-blocking multi-scalar multiplication (MSM) for public keys and signatures. The implementation includes a new parallel Pippenger algorithm in src/bls/pippenger.zig, ported from the blst Rust bindings, and updates the ThreadPool to support these parallel operations. Feedback focuses on security improvements for randomness generation and several style guide violations, including function length limits, the use of architecture-specific usize types, and line length restrictions.
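
For context, the bucket method at the heart of a Pippenger MSM can be illustrated in a few lines. This toy sketch covers a single window with `u64` "points" under integer addition; the real code operates on G1/G2 projective points, and names here are assumptions:

```rust
// Toy single-window Pippenger bucket method. For window width w, each
// scalar in 1..2^w lands in a bucket; a high-to-low suffix sum then
// makes bucket b contribute b * bucket_sum[b] using only additions,
// avoiding a per-point scalar multiplication.
fn window_msm(scalars: &[u64], points: &[u64], w: u32) -> u64 {
    let nbuckets = (1usize << w) - 1; // scalar 0 contributes nothing
    let mut buckets = vec![0u64; nbuckets];
    for (&s, &p) in scalars.iter().zip(points) {
        if s != 0 {
            buckets[(s - 1) as usize] += p; // bucket for scalar value s
        }
    }
    // Suffix-sum trick: walking buckets from high to low, `running`
    // accumulates bucket sums and `acc` accumulates `running`, so
    // bucket b ends up added exactly b times in total.
    let (mut acc, mut running) = (0u64, 0u64);
    for b in (0..nbuckets).rev() {
        running += buckets[b];
        acc += running;
    }
    acc
}

fn main() {
    let scalars = [3u64, 1, 2];
    let points = [10u64, 100, 1000];
    // naive result: 3*10 + 1*100 + 2*1000 = 2130
    assert_eq!(window_msm(&scalars, &points, 2), 2130);
    println!("ok");
}
```

A full MSM splits each scalar into `wnd`-bit windows, runs this per window (the "rows"), and combines rows with doublings.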

Comment thread bindings/napi/blst.zig
Comment on lines +1109 to +1112
```zig
var seed_bytes: [8]u8 = undefined;
napi_io.get().random(&seed_bytes);
var prng = std.Random.DefaultPrng.init(std.mem.readInt(u64, &seed_bytes, .little));
prng.random().bytes(data.randomness);
```
Contributor


security-medium

Instead of seeding a PRNG with 64 bits of entropy to generate the randomness buffer, it is more secure and simpler to fill the entire buffer directly with system entropy using napi_io.get().random().

```zig
napi_io.get().random(data.randomness);
```
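
The underlying concern is easy to demonstrate: seeding any deterministic PRNG from 8 bytes caps the output space at 2^64 possible buffers, and equal seeds always reproduce equal "randomness". A toy xorshift64 (standing in for `std.Random.DefaultPrng`, which it is not) makes the point:

```rust
// Demonstration of the reviewer's concern: a PRNG expanded from an
// 8-byte seed is fully determined by that seed, so two runs that happen
// to draw the same seed produce byte-identical "randomness" buffers.
// XorShift64 is a toy generator used purely for illustration.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    fn fill(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let bytes = self.next().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

fn main() {
    let seed = 0xdead_beef_u64; // stands in for the 8 seed bytes
    let mut a = [0u8; 32];
    let mut b = [0u8; 32];
    XorShift64(seed).fill(&mut a);
    XorShift64(seed).fill(&mut b);
    assert_eq!(a, b); // identical seed => identical buffer
    println!("buffers identical: {}", a == b);
}
```

Filling the buffer directly from system entropy sidesteps this entirely, since there is no intermediate 64-bit state to collide on.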

Comment thread src/bls/pippenger.zig
```zig
/// - Top-to-bottom Horner assembly: `out = top_row`, then for each row
/// below, double `wnd` times and add the row's tiles. Equivalent to
/// `sum_r row_r * 2^((ny-1-r)*wnd)`.
fn parallelMSM(
```
Contributor


medium

The parallelMSM function exceeds the hard limit of 70 lines per function (it is currently ~93 lines). Consider refactoring the Horner assembly logic or the work item initialization into separate helper functions to improve modularity and adhere to the style guide.

References
  1. Restrict the length of function bodies to reduce the probability of poorly structured code. We enforce a hard limit of 70 lines per function. (link)

Comment thread src/bls/pippenger.zig
Comment on lines +45 to +53
```zig
const CurveDescriptor = struct {
    Projective: type,
    Wrapper: type,
    scratch_sizeof: *const fn (npoints: usize) callconv(.c) usize,
    mult_pippenger: *const anyopaque,
    tile_pippenger: *const anyopaque,
    add_or_double: *const anyopaque,
    double: *const anyopaque,
};
```
Contributor


medium

The use of usize for struct fields and function signatures (e.g., in CurveDescriptor, TileDesc, and parallelMSM) violates the style guide's requirement to use explicitly-sized types like u32 and avoid architecture-specific usize.

References
  1. Use explicitly-sized types like u32 for everything, avoid architecture-specific usize. (link)

Comment thread src/bls/pippenger.zig

```zig
/// Typed function-pointer signature for `blst_p?_add_or_double`.
fn AddOrDoubleFn(comptime Curve: CurveDescriptor) type {
    return *const fn (*Curve.Projective, *const Curve.Projective, *const Curve.Projective) callconv(.c) void;
```
Contributor


medium

This line exceeds the 100-column limit. It should be wrapped to maintain readability and adhere to the typographic 'measure' limit.

```zig
return *const fn (
    *Curve.Projective,
    *const Curve.Projective,
    *const Curve.Projective,
) callconv(.c) void;
```
References
  1. Hard limit all line lengths, without exception, to at most 100 columns for a good typographic 'measure'. (link)
