
feat: mt verify with benchmarks#231

Closed
spiral-ladder wants to merge 18 commits into main from bing/mt-verify

Conversation

@spiral-ladder
Collaborator

@spiral-ladder spiral-ladder commented Mar 6, 2026

pre-requisite for ChainSafe/lodestar#8900

Integrated the changes from cayman/blst-mt here, and numbers are a lot more reasonable for verifyMultipleAggregateSignatures.

It turns out that doing the pooling in the binding layer instead of natively in Zig made the difference. Previous attempts targeted multi-threaded verification in the native layer (see ChainSafe/blst-z#60), but that performed poorly (see the upstream PR). This PR instead uses blst-z as-is, without any changes, and does the pooling/multi-threading in the napi layer.

I ran this a few times and mostly got results consistently on par with @chainsafe/blst.
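As a rough illustration of the napi-layer approach (a sketch only: the function and field names here are hypothetical, not blst-z's API, and the std.Thread.Pool calls assume a recent Zig std): each verification set is fanned out over a pool, with every worker writing into its own result slot and pairing scratch space, so the hot path needs no locks.

```zig
const std = @import("std");

// Hypothetical per-set task: a real implementation would run one
// blst-z pairing accumulation per set into a thread-local buffer.
fn verifySet(results: []bool, idx: usize) void {
    results[idx] = true; // placeholder for the actual verification result
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    // Persistent pool, initialized once rather than per call.
    var pool: std.Thread.Pool = undefined;
    try pool.init(.{ .allocator = alloc, .n_jobs = 4 });
    defer pool.deinit();

    var results = [_]bool{false} ** 8;
    var wg: std.Thread.WaitGroup = .{};
    for (0..results.len) |i| {
        // Each set becomes one job; no shared mutable state between jobs.
        pool.spawnWg(&wg, verifySet, .{ &results, i });
    }
    pool.waitAndWork(&wg);
}
```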

To run:

pnpm i && pnpm benchmark

On an M2 Pro:

bindings/perf/blst.test.ts
  aggregatePublicKeys
    ✔ aggregatePublicKeys lodestar-z  1 keys                              748503.0 ops/s    1.336000 us/op        -       6203 runs   3.00 s
    ✔ aggregatePublicKeys @chainsafe/blst  1 keys                         703729.8 ops/s    1.421000 us/op        -       6193 runs   3.00 s
    ✔ aggregatePublicKeys lodestar-z  8 keys                              121951.2 ops/s    8.200000 us/op        -        462 runs   3.00 s
    ✔ aggregatePublicKeys @chainsafe/blst  8 keys                         127713.9 ops/s    7.830000 us/op        -        469 runs   3.00 s
    ✔ aggregatePublicKeys lodestar-z  32 keys                             39339.10 ops/s    25.42000 us/op        -        111 runs   3.01 s
    ✔ aggregatePublicKeys @chainsafe/blst  32 keys                        38330.33 ops/s    26.08900 us/op        -        109 runs   3.01 s
    ✔ aggregatePublicKeys lodestar-z  128 keys                            13414.72 ops/s    74.54500 us/op        -         30 runs   3.04 s
    ✔ aggregatePublicKeys @chainsafe/blst  128 keys                       11803.59 ops/s    84.72000 us/op        -         23 runs   2.78 s
    ✔ aggregatePublicKeys lodestar-z  256 keys                            6699.898 ops/s    149.2560 us/op        -         13 runs   3.00 s
    ✔ aggregatePublicKeys @chainsafe/blst  256 keys                       6145.262 ops/s    162.7270 us/op        -         13 runs   3.08 s

  aggregateSignatures
    ✔ aggregateSignatures lodestar-z  1 sigs                              780031.2 ops/s    1.282000 us/op        -       5250 runs   2.55 s
    ✔ aggregateSignatures @chainsafe/blst  1 sigs                         836820.1 ops/s    1.195000 us/op        -       4337 runs   2.13 s
    ✔ aggregateSignatures lodestar-z  8 sigs                              75244.54 ops/s    13.29000 us/op        -        165 runs   2.05 s
    ✔ aggregateSignatures @chainsafe/blst  8 sigs                         53769.22 ops/s    18.59800 us/op        -        467 runs   3.01 s
    ✔ aggregateSignatures lodestar-z  32 sigs                             23397.83 ops/s    42.73900 us/op        -        117 runs   3.00 s
    ✔ aggregateSignatures @chainsafe/blst  32 sigs                        23404.95 ops/s    42.72600 us/op        -         53 runs   2.21 s
    ✔ aggregateSignatures lodestar-z  128 sigs                            6479.160 ops/s    154.3410 us/op        -         20 runs   2.57 s
    ✔ aggregateSignatures @chainsafe/blst  128 sigs                       6047.741 ops/s    165.3510 us/op        -         28 runs   3.02 s
    ✔ aggregateSignatures lodestar-z  256 sigs                            3220.695 ops/s    310.4920 us/op        -         11 runs   2.83 s
    ✔ aggregateSignatures @chainsafe/blst  256 sigs                       3083.993 ops/s    324.2550 us/op        -          8 runs   2.44 s

  aggregateVerify
    ✔ aggregateVerify lodestar-z  3 sets                                  1218.814 ops/s    820.4700 us/op        -        103 runs   1.40 s
    ✔ aggregateVerify @chainsafe/blst  3 sets                             1435.247 ops/s    696.7440 us/op        -        862 runs   3.00 s
    ✔ aggregateVerify lodestar-z  8 sets                                  1147.692 ops/s    871.3140 us/op        -        232 runs   2.44 s
    ✔ aggregateVerify @chainsafe/blst  8 sets                             1216.712 ops/s    821.8870 us/op        -         80 runs   1.82 s
    ✔ aggregateVerify lodestar-z  32 sets                                 571.2556 ops/s    1.750530 ms/op        -        100 runs   2.97 s
    ✔ aggregateVerify @chainsafe/blst  32 sets                            560.1074 ops/s    1.785372 ms/op        -        103 runs   3.01 s
    ✔ aggregateVerify lodestar-z  64 sets                                 360.0484 ops/s    2.777404 ms/op        -         28 runs   2.35 s
    ✔ aggregateVerify @chainsafe/blst  64 sets                            296.5946 ops/s    3.371606 ms/op        -         49 runs   3.00 s
    ✔ aggregateVerify lodestar-z  128 sets                                119.1942 ops/s    8.389672 ms/op        -         22 runs   3.05 s
    ✔ aggregateVerify @chainsafe/blst  128 sets                           171.3569 ops/s    5.835774 ms/op        -         24 runs   3.00 s

  verifyMultipleAggregateSignatures
    ✔ lodestar-z  3 sets                                                  1049.442 ops/s    952.8870 us/op        -        334 runs   1.83 s
    ✔ @chainsafe/blst  3 sets                                             1046.280 ops/s    955.7670 us/op        -        241 runs   1.62 s
    ✔ lodestar-z  8 sets                                                  970.8540 ops/s    1.030021 ms/op        -        125 runs   2.03 s
    ✔ @chainsafe/blst  8 sets                                             884.7420 ops/s    1.130273 ms/op        -        353 runs   3.00 s
    ✔ lodestar-z  32 sets                                                 443.7648 ops/s    2.253446 ms/op        -         43 runs   2.15 s
    ✔ @chainsafe/blst  32 sets                                            449.9561 ops/s    2.222439 ms/op        -         29 runs   1.94 s
    ✔ lodestar-z  64 sets                                                 262.7697 ops/s    3.805614 ms/op        -         40 runs   2.72 s
    ✔ @chainsafe/blst  64 sets                                            271.0820 ops/s    3.688921 ms/op        -         33 runs   2.54 s
    ✔ lodestar-z  128 sets                                                124.6683 ops/s    8.021286 ms/op        -         24 runs   3.00 s
    ✔ @chainsafe/blst  128 sets                                           133.3907 ops/s    7.496774 ms/op        -         26 runs   3.07 s

With @wemeetagain's branch, results were about the same:

bindings/perf/functions.test.ts
  sanity
    ✔ sanitytest - blstZ                                                  792393.0 ops/s    1.262000 us/op        -    1454903 runs   2.38 s
    ✔ sanitytest - blstTS                                                 630119.7 ops/s    1.587000 us/op        -    1040106 runs   2.02 s

  aggregatePublicKeys
    ✔ aggregatePublicKeys blstZ - 1 keys                                   1221001 ops/s    819.0000 ns/op        -    1194944 runs   1.76 s
    ✔ aggregatePublicKeys blstTS - 1 keys                                 815660.7 ops/s    1.226000 us/op        -     249565 runs   3.65 s
    ✔ aggregatePublicKeys blstZ - 8 keys                                  141043.7 ops/s    7.090000 us/op        -     114893 runs  0.909 s
    ✔ aggregatePublicKeys blstTS - 8 keys                                 139353.4 ops/s    7.176000 us/op        -       6169 runs  0.826 s
    ✔ aggregatePublicKeys blstZ - 32 keys                                 53850.30 ops/s    18.57000 us/op        -      24708 runs  0.505 s
    ✔ aggregatePublicKeys blstTS - 32 keys                                53413.10 ops/s    18.72200 us/op        -       1167 runs  0.992 s
    ✔ aggregatePublicKeys blstZ - 128 keys                                14944.78 ops/s    66.91300 us/op        -      13336 runs   1.17 s
    ✔ aggregatePublicKeys blstTS - 128 keys                               14663.40 ops/s    68.19700 us/op        -        930 runs   3.36 s
    ✔ aggregatePublicKeys blstZ - 256 keys                                7594.283 ops/s    131.6780 us/op        -       6564 runs   1.15 s
    ✔ aggregatePublicKeys blstTS - 256 keys                               7812.927 ops/s    127.9930 us/op        -        545 runs   5.28 s

  aggregateSignatures
    ✔ aggregateSignatures blstZ - 1 sigs                                   1377410 ops/s    726.0000 ns/op        -    1807262 runs   2.73 s
    ✔ aggregateSignatures blstTS - 1 sigs                                 838926.2 ops/s    1.192000 us/op        -      89767 runs   2.42 s
    ✔ aggregateSignatures blstZ - 8 sigs                                  89055.13 ops/s    11.22900 us/op        -      32404 runs  0.404 s
    ✔ aggregateSignatures blstTS - 8 sigs                                 86933.84 ops/s    11.50300 us/op        -       1456 runs  0.512 s
    ✔ aggregateSignatures blstZ - 32 sigs                                 26053.20 ops/s    38.38300 us/op        -       6585 runs  0.303 s
    ✔ aggregateSignatures blstTS - 32 sigs                                25441.41 ops/s    39.30600 us/op        -       1307 runs   1.95 s
    ✔ aggregateSignatures blstZ - 128 sigs                                6847.205 ops/s    146.0450 us/op        -       2050 runs  0.483 s
    ✔ aggregateSignatures blstTS - 128 sigs                               7023.606 ops/s    142.3770 us/op        -        188 runs   3.98 s
    ✔ aggregateSignatures blstZ - 256 sigs                                3373.159 ops/s    296.4580 us/op        -       2021 runs  0.903 s
    ✔ aggregateSignatures blstTS - 256 sigs                               3443.799 ops/s    290.3770 us/op        -         92 runs   7.47 s

  aggregateVerify
    ✔ aggregateVerify blstZ - 1 sets                                      1268.525 ops/s    788.3170 us/op        -        257 runs  0.705 s
    ✔ aggregateVerify blstTS - 1 sets                                     1598.210 ops/s    625.7000 us/op        -        909 runs   1.14 s
    ✔ aggregateVerify blstZ - 8 sets                                      1030.282 ops/s    970.6080 us/op        -       1228 runs   1.72 s
    ✔ aggregateVerify blstTS - 8 sets                                     1210.268 ops/s    826.2630 us/op        -        438 runs   1.21 s
    ✔ aggregateVerify blstZ - 32 sets                                     550.8027 ops/s    1.815532 ms/op        -        602 runs   1.63 s
    ✔ aggregateVerify blstTS - 32 sets                                    634.7373 ops/s    1.575455 ms/op        -        141 runs   1.31 s
    ✔ aggregateVerify blstZ - 128 sets                                    176.6969 ops/s    5.659408 ms/op        -        319 runs   2.38 s
    ✔ aggregateVerify blstTS - 128 sets                                   195.7491 ops/s    5.108580 ms/op        -         82 runs   1.92 s
    ✔ aggregateVerify blstZ - 256 sets                                    109.5515 ops/s    9.128127 ms/op        -         99 runs   1.46 s
    ✔ aggregateVerify blstTS - 256 sets                                   106.8037 ops/s    9.362972 ms/op        -        124 runs   3.53 s

  verifyMultipleAggregateSignatures
    ✔ verifyMultiAggSig blstZ - 1 sets                                    1115.641 ops/s    896.3460 us/op        -        451 runs  0.906 s
    ✔ verifyMultiAggSig blstTS - 1 sets                                   1096.657 ops/s    911.8620 us/op        -        319 runs  0.826 s
    ✔ verifyMultiAggSig blstZ - 8 sets                                    996.2313 ops/s    1.003783 ms/op        -        403 runs  0.906 s
    ✔ verifyMultiAggSig blstTS - 8 sets                                   993.5420 ops/s    1.006500 ms/op        -        304 runs   1.06 s
    ✔ verifyMultiAggSig blstZ - 32 sets                                   467.8973 ops/s    2.137221 ms/op        -        143 runs  0.808 s
    ✔ verifyMultiAggSig blstTS - 32 sets                                  464.9837 ops/s    2.150613 ms/op        -        120 runs   1.21 s
    ✔ verifyMultiAggSig blstZ - 128 sets                                  153.7131 ops/s    6.505627 ms/op        -         49 runs  0.822 s
    ✔ verifyMultiAggSig blstTS - 128 sets                                 156.4704 ops/s    6.390986 ms/op        -         28 runs   1.22 s
    ✔ verifyMultiAggSig blstZ - 256 sets                                  86.04139 ops/s    11.62231 ms/op        -         37 runs  0.943 s
    ✔ verifyMultiAggSig blstTS - 256 sets                                 69.99464 ops/s    14.28681 ms/op        -         49 runs   2.13 s

@spiral-ladder spiral-ladder self-assigned this Mar 6, 2026
@spiral-ladder spiral-ladder requested a review from a team as a code owner March 6, 2026 02:00
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance of BLS signature verification by introducing multithreaded execution capabilities. It achieves this by updating the core blst-z library dependency and adapting the verification function calls to leverage the new thread pool management. Furthermore, the changes refine memory handling for verification buffers and optimize the build configuration for faster compilation, contributing to a more efficient and robust cryptographic binding.

Highlights

  • Multithreaded BLS Verification: Enabled multithreaded BLS verification by integrating blst-z thread pool initialization and deinitialization, allowing for parallel processing of verification tasks.
  • Dependency Update: Updated the blst-z dependency to a version that supports multithreaded verification and includes refined API changes.
  • Build Optimization: Optimized the build process by switching the prepare script from ReleaseSafe to ReleaseFast for improved compilation speed.
  • Memory Safety and Correctness: Improved memory handling by using @alignCast for pairing_buf and *const pointers for public keys and signatures in verification functions, enhancing safety and correctness.


Changelog
  • bindings/napi/blst.zig
    • Applied @alignCast to pairing_buf in aggregateVerify and fastAggregateVerify calls for proper memory alignment.
    • Updated pks and sigs allocations to use *const PublicKey and *const Signature respectively, indicating immutability.
    • Refactored the blst.verifyMultipleAggregateSignatures call by removing pairing_buf and n_elems parameters and adding an allocator parameter.
  • bindings/napi/root.zig
    • Imported the blst library (aliased as blst_z) to access its functionalities.
    • Added calls to blst_z.thread_pool.initializeThreadPool during environment registration to set up multithreading.
    • Added calls to blst_z.thread_pool.deinitializeThreadPool during environment cleanup to properly tear down the thread pool.
  • build.zig.zon
    • Updated the blst dependency URL and hash to point to a newer version of the blst-z library.
  • package.json
    • Changed the prepare script's optimization flag from ReleaseSafe to ReleaseFast for improved build performance.
  • src/state_transition/utils/bls.zig
    • Applied @alignCast to pairing_buf in the fastAggregateVerify call to ensure correct memory alignment.
  • zbuild.zon
    • Updated the blst dependency URL to reference the bing/mt-verify branch for specific multithreaded verification features.
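A minimal sketch of the lifecycle wiring described in the root.zig changelog entries above. The initializeThreadPool/deinitializeThreadPool names come from the changelog; their signatures and the hook function names here are assumptions, not blst-z's exact API.

```zig
const std = @import("std");
const blst_z = @import("blst"); // blst-z, aliased as in root.zig

// Called once when the addon is registered into a Node environment:
// bring up the shared pool before any verification work arrives.
fn onEnvironmentRegister(allocator: std.mem.Allocator) !void {
    try blst_z.thread_pool.initializeThreadPool(allocator);
}

// Called from the environment cleanup hook: tear the pool down so
// worker threads do not outlive the environment that spawned them.
fn onEnvironmentCleanup() void {
    blst_z.thread_pool.deinitializeThreadPool();
}
```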
Activity
  • No human activity (comments, reviews, etc.) has been recorded on this pull request yet.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces multi-threaded verification for verifyMultipleAggregateSignatures by updating the blst-z dependency and leveraging its new thread pool capabilities, including initializing and de-initializing the thread pool, adapting to the updated blst-z API, and using *const pointers. However, it also introduces a critical security vulnerability due to several instances of unsafe alignment casts using @alignCast on buffers that are not explicitly aligned, which can lead to Undefined Behavior, crashes, or instability. These buffers must be properly aligned at their declaration point to mitigate this risk. Additionally, there are concerns regarding build configuration and dependency management that could impact safety and reproducibility.

Comment thread package.json
  },
  "scripts": {
-   "prepare": "zig build build-lib:bindings -Doptimize=ReleaseSafe",
+   "prepare": "zig build build-lib:bindings -Doptimize=ReleaseFast",
Contributor


critical

Changing the optimization level to ReleaseFast disables all runtime safety checks in Zig, which can lead to undefined behavior (e.g., from out-of-bounds memory access) going undetected. The style guide (line 5) explicitly prioritizes safety over performance. For a cryptographic library, sacrificing these safety checks for performance is a critical risk. Please revert this to ReleaseSafe to maintain the project's safety guarantees. If ReleaseFast is deemed essential for performance, this trade-off should be explicitly justified and ideally confined to benchmark-specific build configurations, not the default prepare script.
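If ReleaseFast were to be confined to benchmark builds as the reviewer suggests, the split could look like this (a sketch; the prepare:bench script name is hypothetical and not part of this PR):

```json
{
  "scripts": {
    "prepare": "zig build build-lib:bindings -Doptimize=ReleaseSafe",
    "prepare:bench": "zig build build-lib:bindings -Doptimize=ReleaseFast"
  }
}
```

The default prepare path keeps Zig's runtime safety checks, while the benchmark path opts into the faster build explicitly.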

Suggested change
- "prepare": "zig build build-lib:bindings -Doptimize=ReleaseFast",
+ "prepare": "zig build build-lib:bindings -Doptimize=ReleaseSafe",
References
  1. The style guide prioritizes safety over performance: "Our design goals are safety, performance, and developer experience. In that order." The change to ReleaseFast contradicts this by disabling runtime safety checks. (link)

Collaborator Author


The underlying blst library uses -O2 for the cryptographic operations, but the blst-ts library uses -O3, so we should do the same here.

Comment thread bindings/napi/blst.zig Outdated
  const result = sig.aggregateVerify(
      sig_groupcheck,
-     &pairing_buf,
+     @alignCast(&pairing_buf),
Contributor


security: medium

The use of @alignCast on &pairing_buf is unsafe because pairing_buf is declared as a byte array [Pairing.sizeOf()]u8 (line 549), which has a default alignment of 1. If the aggregateVerify function expects a pointer with a higher alignment (e.g., aligned to the Pairing struct), this results in Undefined Behavior if the buffer is not correctly aligned at runtime. This can lead to crashes on architectures with strict alignment requirements or when using SIMD instructions. To fix this, the declaration of pairing_buf should be updated to include an explicit alignment: var pairing_buf: [Pairing.sizeOf()]u8 align(@alignOf(Pairing)) = undefined;
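The reviewer's fix can be illustrated with a small, self-contained sketch (the Pairing struct here is a stand-in for blst-z's real type, included only so the example compiles): the align(...) qualifier on the byte array makes the subsequent @alignCast sound instead of checked undefined behavior.

```zig
const std = @import("std");

// Stand-in for blst-z's Pairing; only sizeOf()/alignment matter here.
const Pairing = struct {
    state: [4]u64,
    pub fn sizeOf() usize {
        return @sizeOf(@This());
    }
};

pub fn main() void {
    // A plain [N]u8 array only guarantees alignment 1. Declaring the
    // buffer with the target type's alignment makes @alignCast safe.
    var pairing_buf: [Pairing.sizeOf()]u8 align(@alignOf(Pairing)) = undefined;
    const pairing: *Pairing = @ptrCast(@alignCast(&pairing_buf));
    _ = pairing;
}
```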

Comment thread bindings/napi/blst.zig
  var pairing_buf: [Pairing.sizeOf()]u8 = undefined;
  // `pks_validate` is always false here since we assume proof of possession for public keys.
- const result = sig.fastAggregateVerify(sigs_groupcheck, &pairing_buf, msg_info.data[0..32], DST, pks, false) catch {
+ const result = sig.fastAggregateVerify(sigs_groupcheck, @alignCast(&pairing_buf), msg_info.data[0..32], DST, pks, false) catch {
Contributor


security: medium

The use of @alignCast on &pairing_buf is unsafe because pairing_buf is declared as a byte array [Pairing.sizeOf()]u8 (line 599), which has a default alignment of 1. If the fastAggregateVerify function expects a pointer with a higher alignment, this results in Undefined Behavior. To fix this, the declaration of pairing_buf should be updated to include an explicit alignment: var pairing_buf: [Pairing.sizeOf()]u8 align(@alignOf(Pairing)) = undefined;

Comment thread src/state_transition/utils/bls.zig
Comment thread zbuild.zon Outdated
@spiral-ladder
Collaborator Author

On cayman/blst-mt, it seems to outperform this in larger sets. Probably because the pooling is done on the binding level instead of the native layer:

  aggregatePublicKeys
    ✔ aggregatePublicKeys blstZ - 1 keys                                   1377410 ops/s    726.0000 ns/op        -    3605990 runs   4.71 s
    ✔ aggregatePublicKeys blstTS - 1 keys                                  1317523 ops/s    759.0000 ns/op        -      72081 runs   1.01 s
    ✔ aggregatePublicKeys blstZ - 8 keys                                  151446.3 ops/s    6.603000 us/op        -      39742 runs  0.303 s
    ✔ aggregatePublicKeys blstTS - 8 keys                                 149790.3 ops/s    6.676000 us/op        -       4758 runs  0.612 s
    ✔ aggregatePublicKeys blstZ - 32 keys                                 54827.57 ops/s    18.23900 us/op        -      18927 runs  0.404 s
    ✔ aggregatePublicKeys blstTS - 32 keys                                54504.82 ops/s    18.34700 us/op        -       1201 runs  0.922 s
    ✔ aggregatePublicKeys blstZ - 128 keys                                15099.81 ops/s    66.22600 us/op        -      11607 runs  0.919 s
    ✔ aggregatePublicKeys blstTS - 128 keys                               15737.87 ops/s    63.54100 us/op        -        245 runs   2.12 s
    ✔ aggregatePublicKeys blstZ - 256 keys                                7729.050 ops/s    129.3820 us/op        -       2282 runs  0.529 s
    ✔ aggregatePublicKeys blstTS - 256 keys                               8075.457 ops/s    123.8320 us/op        -        151 runs   3.99 s

  aggregateSignatures
    ✔ aggregateSignatures blstZ - 1 sigs                                   1364256 ops/s    733.0000 ns/op        -    1633955 runs   2.36 s
    ✔ aggregateSignatures blstTS - 1 sigs                                  1231527 ops/s    812.0000 ns/op        -      22721 runs  0.606 s
    ✔ aggregateSignatures blstZ - 8 sigs                                  85005.10 ops/s    11.76400 us/op        -      82142 runs   1.01 s
    ✔ aggregateSignatures blstTS - 8 sigs                                 86281.28 ops/s    11.59000 us/op        -       2401 runs  0.719 s
    ✔ aggregateSignatures blstZ - 32 sigs                                 25887.29 ops/s    38.62900 us/op        -       6651 runs  0.317 s
    ✔ aggregateSignatures blstTS - 32 sigs                                26031.50 ops/s    38.41500 us/op        -        605 runs   1.38 s
    ✔ aggregateSignatures blstZ - 128 sigs                                6737.183 ops/s    148.4300 us/op        -       2017 runs  0.454 s
    ✔ aggregateSignatures blstTS - 128 sigs                               6868.132 ops/s    145.6000 us/op        -        186 runs   3.95 s
    ✔ aggregateSignatures blstZ - 256 sigs                                3370.965 ops/s    296.6510 us/op        -       1008 runs  0.649 s
    ✔ aggregateSignatures blstTS - 256 sigs                               3485.681 ops/s    286.8880 us/op        -         80 runs   7.17 s

  aggregateVerify
    ✔ aggregateVerify blstZ - 1 sets                                      1231.835 ops/s    811.7970 us/op        -        373 runs  0.805 s
    ✔ aggregateVerify blstTS - 1 sets                                     1490.682 ops/s    670.8340 us/op        -        285 runs  0.734 s
    ✔ aggregateVerify blstZ - 8 sets                                      1121.062 ops/s    892.0110 us/op        -        891 runs   1.32 s
    ✔ aggregateVerify blstTS - 8 sets                                     1133.146 ops/s    882.4990 us/op        -        419 runs   1.20 s
    ✔ aggregateVerify blstZ - 32 sets                                     562.1733 ops/s    1.778811 ms/op        -        279 runs   1.02 s
    ✔ aggregateVerify blstTS - 32 sets                                    615.0663 ops/s    1.625841 ms/op        -        141 runs   1.37 s
    ✔ aggregateVerify blstZ - 128 sets                                    195.8119 ops/s    5.106943 ms/op        -         60 runs  0.835 s
    ✔ aggregateVerify blstTS - 128 sets                                   198.9466 ops/s    5.026474 ms/op        -         64 runs   1.67 s
    ✔ aggregateVerify blstZ - 256 sets                                    109.7394 ops/s    9.112496 ms/op        -         67 runs   1.15 s
    ✔ aggregateVerify blstTS - 256 sets                                   111.8205 ops/s    8.942902 ms/op        -         37 runs   1.81 s

  verifyMultipleAggregateSignatures
    ✔ verifyMultiAggSig blstZ - 1 sets                                    1104.212 ops/s    905.6230 us/op        -        335 runs  0.805 s
    ✔ verifyMultiAggSig blstTS - 1 sets                                   1045.095 ops/s    956.8510 us/op        -        305 runs  0.825 s
    ✔ verifyMultiAggSig blstZ - 8 sets                                    969.0768 ops/s    1.031910 ms/op        -        490 runs   1.01 s
    ✔ verifyMultiAggSig blstTS - 8 sets                                   946.6237 ops/s    1.056386 ms/op        -        444 runs   1.26 s
    ✔ verifyMultiAggSig blstZ - 32 sets                                   459.6397 ops/s    2.175617 ms/op        -        235 runs   1.01 s
    ✔ verifyMultiAggSig blstTS - 32 sets                                  463.3953 ops/s    2.157985 ms/op        -         61 runs  0.992 s
    ✔ verifyMultiAggSig blstZ - 128 sets                                  151.8325 ops/s    6.586206 ms/op        -        127 runs   1.34 s
    ✔ verifyMultiAggSig blstTS - 128 sets                                 155.2848 ops/s    6.439781 ms/op        -         37 runs   1.32 s
    ✔ verifyMultiAggSig blstZ - 256 sets                                  86.43708 ops/s    11.56911 ms/op        -         19 runs  0.722 s
    ✔ verifyMultiAggSig blstTS - 256 sets                                 86.21127 ops/s    11.59941 ms/op        -         16 runs   1.26 s

@spiral-ladder
Collaborator Author

Integrated the changes from cayman/blst-mt here, and the numbers are a lot more reasonable for verifyMultipleAggregateSignatures. It turns out the pooling done on the binding layer was the difference.

bindings/perf/blst.test.ts
  verifyMultipleAggregateSignatures
    ✔ lodestar-z  3 sets                                                  1076.704 ops/s    928.7600 us/op        -         99 runs   1.31 s
    ✔ @chainsafe/blst  3 sets                                             1066.431 ops/s    937.7070 us/op        -        341 runs   1.81 s
    ✔ lodestar-z  8 sets                                                  983.2784 ops/s    1.017006 ms/op        -        205 runs   2.85 s
    ✔ @chainsafe/blst  8 sets                                             995.3666 ops/s    1.004655 ms/op        -         76 runs   2.32 s
    ✔ lodestar-z  32 sets                                                 469.6891 ops/s    2.129068 ms/op        -         17 runs   3.56 s
    ✔ @chainsafe/blst  32 sets                                            465.6705 ops/s    2.147441 ms/op        -         25 runs   3.61 s
    ✔ lodestar-z  64 sets                                                 304.1478 ops/s    3.287875 ms/op        -         13 runs   4.38 s
    ✔ @chainsafe/blst  64 sets                                            285.2177 ops/s    3.506094 ms/op        -         33 runs   4.78 s
    ✔ lodestar-z  128 sets                                                156.0534 ops/s    6.408061 ms/op        -         15 runs   5.03 s
    ✔ @chainsafe/blst  128 sets                                           149.5211 ops/s    6.688019 ms/op        -         15 runs   5.00 s

@spiral-ladder
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to significantly improve performance by introducing a persistent thread pool (PairingPool) for parallelizing BLS verification operations, reducing thread management overhead and memory allocations. However, the current implementation of the PairingPool in bindings/napi/blst.zig contains critical race conditions due to the use of a global singleton and shared scratch buffers without proper synchronization. This poses a significant security risk, including potential signature bypass and denial of service in multi-threaded environments. Additionally, the review identified areas for improvement in adherence to the project's style guide, specifically regarding constant naming, explicit type usage, assertion density, an unused import, and safer handling of pointer operations.

Comment thread bindings/napi/blst.zig Outdated
Comment thread bindings/napi/blst.zig
Comment thread bindings/napi/blst.zig Outdated
Comment thread bindings/napi/blst.zig
Comment thread bindings/napi/blst.zig
Comment thread bindings/napi/root.zig Outdated
@spiral-ladder
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a persistent thread pool in Zig to parallelize BLS verification, which significantly improves performance as demonstrated by the new benchmarks. The implementation is well-structured, and the zero-copy optimizations are a great addition. My review includes several suggestions to align the new code with the repository's style guide, a critical fix for a potential memory alignment issue, and a point for discussion regarding build optimization settings.

Note: Security Review did not run due to the size of the PR.

Comment thread bindings/napi/blst.zig Outdated
Comment thread bindings/napi/blst.zig
Comment on lines +519 to +531
const AggVerifyJob = struct {
pairing_bufs: [][Pairing.sizeOf()]u8,
has_work: []bool,
counter: std.atomic.Value(usize),
err_flag: std.atomic.Value(bool),
sig: *const Signature,
sig_groupcheck: bool,
msgs: []const *const [32]u8,
dst: []const u8,
pks: []const *const PublicKey,
pks_validate: bool,
n_elems: usize,
};
Contributor (gemini-code-assist bot)

medium

This struct definition violates a few style guide rules:

  • Rule 216: snake_case is used for field names (e.g., pairing_bufs, has_work) instead of the required camelCase.
  • Rule 221: Several field names are abbreviated (e.g., sig, msgs, pks, dst, n_elems).

Please use descriptive, camelCase names to improve readability and adhere to the style guide. This applies to VerifyMultiJob and other new variables as well.

const AggVerifyJob = struct {
    pairingBufs: [][Pairing.sizeOf()]u8,
    hasWork: []bool,
    counter: std.atomic.Value(usize),
    errorFlag: std.atomic.Value(bool),
    signature: *const Signature,
    signatureGroupcheck: bool,
    messages: []const *const [32]u8,
    domainSeparationTag: []const u8,
    publicKeys: []const *const PublicKey,
    publicKeysValidate: bool,
    elementsCount: usize,
};
References
  1. Use camelCase for function, and variable names. Use snake_case for file names. (link)
  2. Do not abbreviate variable names, unless the variable is a primitive integer type used as an argument to a sort function or matrix calculation. (link)

@spiral-ladder (Collaborator, Author)

I think this is very pedantic; we should probably change the style guide

Comment thread bindings/napi/blst.zig Outdated
Comment thread bindings/napi/blst.zig
Comment on lines +602 to +604
.scratch_msg_ptrs = allocator.alloc(*const [32]u8, SCRATCH_MAX) catch @panic("PairingPool: OOM"),
.scratch_pk_ptrs = allocator.alloc(*const PublicKey, SCRATCH_MAX) catch @panic("PairingPool: OOM"),
.scratch_sig_ptrs = allocator.alloc(*const Signature, SCRATCH_MAX) catch @panic("PairingPool: OOM"),
Contributor (gemini-code-assist bot)

medium

The style guide (line 400) enforces a hard line length limit of 100 columns. These lines exceed that limit. Please reformat them to comply. zig fmt can often help with this if you add a trailing comma inside the function call.

            .scratch_msg_ptrs = allocator.alloc(*const [32]u8, SCRATCH_MAX) catch @panic(
                "PairingPool: OOM",
            ),
            .scratch_pk_ptrs = allocator.alloc(*const PublicKey, SCRATCH_MAX) catch @panic(
                "PairingPool: OOM",
            ),
            .scratch_sig_ptrs = allocator.alloc(*const Signature, SCRATCH_MAX) catch @panic(
                "PairingPool: OOM",
            ),
References
  1. Hard limit all line lengths, without exception, to at most 100 columns for a good typographic "measure". (link)

@spiral-ladder (Collaborator, Author)

I think having it all in one line looks cleaner

Comment thread bindings/napi/blst.zig
Comment thread bindings/napi/blst.zig
Comment thread package.json
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test file we have benchmarks for both blst-z and blst-ts, but this is really just for the PR's sake; we should remove the TypeScript benchmarks once we're happy with the results.

@twoeths (Collaborator) left a comment

did not finish review, overall looks good

  • I want to improve the documentation part to make it easier to maintain and follow
  • also I feel like PairingPool should stay at the native layer, and only pull
    napi-specific logic into this binding layer (maybe scratch buffers are the only ones)

the goal is to also use PairingPool for a native zig client. We will need it anyway.

Comment thread bindings/napi/blst.zig Outdated
Comment thread bindings/napi/blst.zig
Comment thread bindings/napi/blst.zig
Comment thread bindings/napi/blst.zig Outdated
for (work_ready) |*e| e.* = .{};
for (work_done) |*e| e.* = .{};

const threads = allocator.alloc(std.Thread, n_bg) catch @panic("PairingPool: OOM");
Collaborator

when is it free()?

How about add a deinit and wire it into the existing InstanceData.finalize path:

fn deinit(pool: *PairingPool) void {
    // Signal all workers to exit, then wait (or just accept the leak is intentional)
    allocator.free(pool.workers);
    allocator.free(pool.work_items);
    // ... all other fields
    allocator.destroy(pool);
    instance = null;
}

Comment thread bindings/napi/blst.zig
Comment thread bindings/napi/blst.zig Outdated
Comment thread bindings/napi/blst.zig Outdated
Comment thread bindings/napi/blst.zig Outdated
Comment thread bindings/napi/blst.zig Outdated
Comment thread bindings/napi/blst.zig
@spiral-ladder (Collaborator, Author) commented Mar 9, 2026

  • also I feel like PairingPool should stay at the native layer
    only pull napi specific logic to this binding layer (maybe scratch buffers are the only ones)

the goal is to also use PairingPool for a native zig client. We will need it anyway.

hmm, that's kinda the problem I ran into. My previous iteration put the memory pools at the native layer, but that was slow versus @wemeetagain's suggested branch. Perhaps my previous approach was wrong; I'll try again in another branch and see if it beats this. I suspect the reason is that managing memory in the layer above the native layer is better than at the native layer itself, since we can batch things before crossing the napi boundary.

@GrapeBaBa (Contributor)

  • also I feel like PairingPool should stay at the native layer
    only pull napi specific logic to this binding layer (maybe scratch buffers are the only ones)

the goal is to also use PairingPool for a native zig client. We will need it anyway.

hmm, that's kinda the problem I ran into. My previous iteration put the memory pools at the native layer, but that was slow versus @wemeetagain 's suggestion branch. Perhaps my previous approach was wrong, i'll try again in another branch and see if it beats this. I suspect the reason is that managing memory on the layer above the native layer is better than at the native layer itself, since we can batch things before crossing the napi boundary

Is it caused by turning on the portable mode in ChainSafe/blst-z#60? The Rust binding uses false by default.

@matthewkeil (Member)

Is it caused by turn on the protable mode in the ChainSafe/blst-z#60? Rust binding uses false for default.

It's possible but not guaranteed. We did performance testing when we made that change and there was virtually no difference. We have a number of users that had hard crashes on older CPUs, and the portable flag was required to resolve the issue.

@spiral-ladder (Collaborator, Author)

consolidating in #306

4 participants