
feat: multi-threaded bls operations #64

Merged — wemeetagain merged 9 commits into main from bing/mt-verify-mult, Mar 12, 2026

Conversation

@spiral-ladder (Contributor)

Supports multi-threaded versions of:

- aggregateVerify
- verifyMultipleAggregateSignatures

This brings our implementation up to parity with @chainsafe/blst.

Comment thread: src/ThreadPool.zig
/// Verifies multiple aggregate signatures in parallel using the thread pool.
///
/// This is the multi-threaded version of the same function in `fast_verify.zig`.
pub fn verifyMultipleAggregateSignatures(
Collaborator:

Race condition: two consumers can call this function at the same time, or even aggregateVerify() at the same time.

Should the pool have its own mutex, or should it live somewhere else?

Contributor Author:

Added a dispatch_mutex specifically for these operations at 8a2325f
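A minimal sketch of the pattern being described. The `dispatch_mutex` field name comes from this thread; the surrounding pool layout and function body are assumptions, not the committed code:

```zig
const std = @import("std");

pub const ThreadPool = struct {
    /// Serializes dispatch so two concurrent callers of the verify
    /// functions cannot interleave on shared pool state (hypothetical layout).
    dispatch_mutex: std.Thread.Mutex = .{},

    pub fn verifyMultipleAggregateSignatures(self: *ThreadPool) bool {
        self.dispatch_mutex.lock();
        defer self.dispatch_mutex.unlock();
        // ... fan work out to the workers and accumulate pairing results ...
        return true;
    }
};
```

The same lock would guard `aggregateVerify` as well, since both functions share the pool's pairing buffers.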

Comment thread: src/ThreadPool.zig
return acc.finalVerify(gtsig);
}

test "verifyMultipleAggregateSignatures multi-threaded" {
Collaborator:

I'd love to have this test for the single-threaded version as well, so that we can see in CI that this multi-threaded version is way faster. We also need negative tests. It looks like we need more tests anyway, so feel free to create an issue for it if you don't have time.

@twoeths (Collaborator) left a comment:

Looks really great structurally. I dropped a few minor comments.

@spiral-ladder (Contributor Author):

Thanks for the review and the notes on safe multi-threaded access @twoeths; addressed those concerns. Yeah, we do have a lack of testing in this library. Let's address that in a follow-up PR.

Comment thread: src/ThreadPool.zig (outdated)
/// Mutex for dispatching multi-threaded verification work.
dispatch_mutex: std.Thread.Mutex = .{},

var instance: ?*ThreadPool = null;
Member:

Don't think the singleton makes sense here. Given the explicit pool param in the aggregate/verify fns below, imo it's more standard to just have the consumer configure, allocate, and manage the threadpool as necessary. (The comments in the "init"-ish section of get show a blending of application and library concerns.) (The 3/4 cpu count also doesn't match the rust behavior, but we don't need to match it; would rather have a consumer-configured approach: https://docs.rs/crate/threadpool/1.8.1/source/src/lib.rs#687)

The reason we're using singletons in lodestar-z is mostly for bindings; I don't think it's generally a good idea for libraries -- or rather, it's just extremely opinionated, and often for no really good reason, considering zig idioms.

The alternative would be to hide the pool from consumers entirely (like the rust binding does). But that doesn't feel quite idiomatic.

As an aside, I'm not sure we're gaining much from maintaining this code separately from lodestar-z. We can half-heartedly maintain separation of concerns, larping a reusable library (but you can already see the cracks in the actual reusability) or just merge the code directly into lodestar-z and stop pretending we're building for anything other than ourselves. This will become more fraught as our application threading becomes more sophisticated and we want to thread more operations. I realize that this is just attempting a copy of the rust bindings, but I've always felt that the mt behavior there is overly opinionated and inflexible to reuse.

Contributor Author:

Appreciate the thoughtful writeup!

imo is more standard to just have the consumer configure, allocate and manage the threadpool as necessary.

Makes sense, I'll make changes towards this.

The alternative would be to hide the pool from consumers entirely (like the rust binding does)

I'm also in a dilemma regarding this, because ideally you'd like the consumer to just do something like

// single-threaded
const res = verifyMultipleAggregateSignatures(...)
// or multi-threaded
const res = verifyMultipleAggregateSignaturesMt(...)

and not something like

const pool = blst.pool.init({ .num_workers = 8 });
const res = pool.verifyMultipleAggregateSignatures(...)

but as you mentioned its a tradeoff between implicit ergonomics and explicit verbosity (why does the consumer need to care about init, etc.).

I think the actionable thing for now to do is to move some of the pool configuration logic downstream to the consumer.

I'm not sure we're gaining much from maintaining this code separately from lodestar-z

I think there's value in having a separate blst library, since bls can potentially be used for use cases other than lodestar-z. With that said, I see where you're coming from, but I don't really think it's urgent to merge this into lodestar-z, and there's not much maintenance needed anyway for this lib. I think we can sit on this thought for a bit and circle back in the future, especially once lodestar-z gets more mature and if maintaining this separately starts giving us issues (fingers crossed it doesn't happen).

@spiral-ladder (Contributor Author), Mar 12, 2026:

So I did work towards this in fe78c68, but it still feels iffy. The line between application and library is still pretty fuzzy, and now I'm thinking we should just hide it entirely for now like the rust version - I don't have a good idea at the moment on how exactly to expose this elegantly to the consumer.

Contributor Author:

Downstream lodestar-z usage in commit 402d6d5

Member:

Curious why fe78c68 feels iffy? (init should return Allocator.Error!*ThreadPool but otherwise looks fine)

It is more annoying/inconvenient than the implicit threadpool, but it's also more truthful about what kind of resources are being used.

why does the consumer need to care about init

Good question. In this specific case, there's the question of the size of the threadpool. (The implicit model uses a default value.) But generally, I think the zig idiom is explicit control over implicit control, unless there's some overriding context.

I'm also in dilemma regarding this because 'ideally' you'd like the consumer to just do something like...

PS, as an aside: I think it was quite close to your ideal shape before with the singleton, just a matter of removing the pool param from the verify* functions and using the pool singleton in the function body. I don't think it's the right move given the language context, but it's worth mentioning.

Contributor Author:

It feels iffy exactly because of this:

I think the zig idiom is explicit control over implicit control unless there's some overriding context.

I think in this case being overly explicit can harm ergonomics, and we have the question of 'how far do we want to go with being explicit?' I suppose we don't want to expose a scheduler API and let the consumer decide how to batch the workloads, for example.

But if we're just exposing the allocator and letting the consumer choose the number of cores, that seems fair enough.
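Under the consumer-managed model being discussed, usage might look roughly like this. The module name `blst`, the `Opts` field, and the verify signature are assumptions for illustration, not the merged API:

```zig
// Hypothetical consumer-side shape: the application owns the pool,
// chooses the worker count, and supplies the allocator.
var pool = try blst.ThreadPool.init(allocator, .{ .n_workers = 8 });
defer pool.deinit();

const ok = pool.verifyMultipleAggregateSignatures(msgs, pks, sigs);
```

The cost is one extra init/deinit pair at the call site; the gain is that resource usage is visible and configurable rather than hidden behind a singleton.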

Comment thread: src/ThreadPool.zig

// Single-threaded fallback
if (n_elems <= 2 or pool.n_workers <= 1) {
var pairing = Pairing.init(&pool.pairing_bufs[0].data, true, dst);
Member:

this block can reuse Signature.aggregateVerify instead of re-implementing it.

Contributor Author:

This would require changing the signature of aggregateVerify, since it was using slices of values instead of slices of pointers; opened a follow-up PR for reviewability: #65

@matthewkeil (Member) left a comment:

I tend to agree on the mixing of application and library concerns. I'm not convinced that having a specific threadpool that ONLY does the two blst operations is the right idea. If there was a dedicated threadpool such that

const ThreadPool = @This();

pub fn submitWork(work_request: WorkRequest) void

where the threadpool was universal and independently configurable, possibly with config args that got passed to it from the lib, it would be more "correct." The WorkRequest would have input and output parameters and any run-specific options.
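A rough Zig shape for that universal pool. The `WorkRequest`/`submitWork` names come from the comment above; the type-erased fields and the queue/condition-variable internals are illustrative assumptions:

```zig
pub const WorkRequest = struct {
    /// Type-erased entry point plus its context; a worker thread
    /// invokes `run(ctx)`, and results are written back through `ctx`.
    run: *const fn (ctx: *anyopaque) void,
    ctx: *anyopaque,
};

pub fn submitWork(self: *ThreadPool, request: WorkRequest) !void {
    self.queue_mutex.lock();
    defer self.queue_mutex.unlock();
    try self.queue.append(request); // hypothetical std.ArrayList(WorkRequest)
    self.queue_cond.signal(); // wake one idle worker
}
```

With this shape, the blst verify functions become just one kind of work item, so the pool can be reused for other operations later.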

While the rust impl is quite opinionated, as @wemeetagain said, it still feels more flexible and idiomatic (separate lib/crate, runs universal work requests). I don't, however, like that the threadpool is almost entirely shielded from the consumer, so they are locked into whatever setup the author preferred. The other side is that we are the sole consumer we are building for, and that should be the frame of reference that we build towards. If others want different features they can always submit a request or PR for the feature they want to see.

Which leads to the other big point that I think needs to be addressed. I feel like the focus for all of the zig work we do should be squarely on implementing the building blocks we need for lodestar-z. Along the way we need to ensure that they are as modular as possible so the Ethereum (and wider) community can use them as they see fit. I tend to think that all of our libs should live in a single mono-repo for ease of visibility, maintainability and AI contributor context.

I'm not sure of the specifics for zig module consumption, but it's my understanding that one of the monorepo modules can be isolated and consumed by a downstream consumer. If that is the case, then I see no reason to have any of our zig work live outside of lodestar-z. If I am incorrect, then let's revisit the overall architecture and organization of our repos for the zig roadmap to make sure we are consistent (i.e. move ssz-z out). This repo being separate is an artifact of the development timeline and how things began. We should try to consolidate now, though, as we are going into production.

With all of the above in mind, there is a balance that I think is necessary with regards to getting the state-transition on mainnet. I think moving the code, and all its git history, into lodestar-z feels correct. Ideally this can happen after all of the PRs in this repo get merged and the state is stabilized to get us ready for production. That process should happen with a merge commit to preserve the history in lodestar-z.

In an effort to be expeditious, we should aim to separate the threadpool concerns in a future PR so that this can get merged as-is, with the performance intact and ready for Lodestar integration. I say this strictly in an effort to delay the multithreaded rewrite until we have a coherent multi-threading plan for lodestar-z. I am confident that whatever we decide there will ultimately inform how we rewrite and consume the ThreadPool. I don't feel that it's prudent immediately, because we will really want to look at how we do the layering of multithreaded operations to avoid thread swarms (or whatever they are called) in the final application. For now this approach mimics (close enough) how the current operations work, and that is fine.

cc @spiral-ladder @wemeetagain @twoeths @GrapeBaBa
I am open to your, and all other, opinions with regards to the above though. What do you think?

@matthewkeil (Member):

As a second quick note: I do think that the threadpool in this PR needs to be configurable from JS via the bindings (i.e. sized at number of cores minus 2, for the main and network threads) so that we can maintain roughly the same threading configuration that Lodestar runs with today.

twoeths previously approved these changes Mar 12, 2026
@twoeths (Collaborator) left a comment:

Looks good to me on this work: we have the ThreadPool implemented in this repo, and the napi binding can configure n_workers as an option. This repo should not decide singleton or not; we let the consumer do it.

Regarding moving this repo to lodestar-z, I agree with it, considering we already did it for ssz. For context, my original idea in creating this separate repo was primarily to work with the Bun binding for lodestar, plus making it generic for consumers by supporting both variants (min_pk vs min_sig), which we don't have for now. The merge with lodestar-z should help development speed; later on, if some consumers have a concern about lib separation, that's a nice issue to have (and at that time we'll be more mature too).

But that's a separate concern. I think we should merge this PR and do that migration work asynchronously.

@GrapeBaBa:

> (quoting @matthewkeil's comment above in full)

The latest code changes seem good; exposing the ThreadPool init is better. I also agree with using a mono-repo for all zig work.

Comment thread: src/ThreadPool.zig (outdated)

/// Creates a thread pool with the specified number of workers.
/// The caller owns the returned pool and must call `deinit` when done.
pub fn init(allocator: std.mem.Allocator, opts: Opts) *ThreadPool {
Member:

should return Allocator.Error!*ThreadPool

Contributor Author:

Fixed in 3416448; included the spawn error too.
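The suggested signature, roughly sketched. The error-set union and the body are an illustration of the pattern (allocation plus thread spawning can each fail), not the exact committed code:

```zig
pub const InitError = std.mem.Allocator.Error || std.Thread.SpawnError;

/// Creates a thread pool with the specified number of workers.
/// The caller owns the returned pool and must call `deinit` when done.
pub fn init(allocator: std.mem.Allocator, opts: Opts) InitError!*ThreadPool {
    const self = try allocator.create(ThreadPool);
    errdefer allocator.destroy(self);
    // ... allocate pairing buffers and spawn `opts.n_workers` threads via
    //     std.Thread.spawn, which can fail with a SpawnError ...
    return self;
}
```

Returning an error union instead of a bare pointer lets the consumer handle out-of-memory and spawn failures instead of the library panicking.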

@wemeetagain wemeetagain merged commit 881a8d1 into main Mar 12, 2026
4 checks passed
@wemeetagain wemeetagain deleted the bing/mt-verify-mult branch March 12, 2026 14:10


5 participants