refactor(bls): use pointer arrays#253
refactor(bls): use pointer arrays#253spiral-ladder wants to merge 4 commits intobing/native-mt-blst-v2from
Conversation
Seems like copying value arrays might be one of the causes of RSS strain on feat3 which is running our native bls implementation for about ~48 hours now. See [thread](https://discord.com/channels/593655374469660673/1479085402395508976/1484525564181090426) on Discord for more details. The suggestion on the thread was to use [`napi_adjust_external_memory`](https://nodejs.org/api/n-api.html#napi-adjust-external-memory) - reason is that the native memory management is opaque to V8's GC. But I highly doubt this is the reason, since `blst-ts` also does not use this API. It is likely, it seems, that it's due to large amounts of allocations causing memory fragmentation when memory is freed/cleared.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant refactoring to the BLS (Boneh-Lynn-Shacham) implementation by transitioning from using value arrays to pointer arrays for handling cryptographic signatures and public keys. This change is a strategic effort to mitigate potential memory strain and fragmentation issues that have been observed in the native BLS operations, thereby improving the overall memory efficiency and stability of the system. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request refactors BLS signature and public key aggregation functions across the codebase to use slices of pointers ([]const *const Type) instead of slices of values ([]const Type). This change affects multiple files in bindings/napi/blst.zig, src/bls/, src/state_transition/, and test files, requiring adjustments in memory allocation and how objects are referenced. A critical bug was identified in src/bls/Signature.zig where pks[0] is accessed without checking if the pks slice is empty, which could lead to a crash. Additionally, several medium-severity improvement opportunities were noted in bindings/napi/blst.zig to explicitly pass options to env.unwrap functions, adhering to the repository's style guide for robustness and readability.
| pks[0], | ||
| pks_validate, |
There was a problem hiding this comment.
CRITICAL: The code is dereferencing a pointer without ensuring that the pks slice is not empty. Accessing pks[0] when pks is empty will cause a crash. Add a check to ensure pks.len > 0 before accessing pks[0].
if (n_elems == 0 or msgs.len != n_elems) {
return BlstError.VerifyFail;
}
if (pks.len == 0) {
return BlstError.VerifyFail; // Add this check
}
var pairing = Pairing.init(buffer, true, dst);
try pairing.aggregate(
pks[0],
pks_validate,
self,
sig_groupcheck,
&msgs[0],
null,
);
References
- Ensure that slices are not empty before accessing their elements to prevent out-of-bounds access.
| const sig = try env.unwrap(Signature, sig_value); | ||
| sigs[i] = sig.*; | ||
| sigs[i] = try env.unwrap(Signature, sig_value); | ||
| } |
There was a problem hiding this comment.
MEDIUM: According to the Lodestar State Transition Zig Style Guide, it's better to explicitly pass options to library functions instead of relying on defaults. While napi.newInstance might have sensible defaults, explicitly passing options improves readability and prevents potential bugs if the library changes its defaults in the future. See Repository Style Guide, lines 170-173.
sigs[i] = try env.unwrap(Signature, sig_value, .{}); // Explicitly pass options, even if empty
References
- Explicitly pass options to library functions at the call site, instead of relying on the defaults. This improves readability and avoids latent bugs if the library changes its defaults. (link)
|
|
||
| const pk_value = try pks_array.getElement(@intCast(i)); | ||
| pk_ptrs[i] = try env.unwrap(PublicKey, pk_value); | ||
| pks[i] = try env.unwrap(PublicKey, pk_value); |
There was a problem hiding this comment.
MEDIUM: Following the style guide, explicitly pass options to env.unwrap to ensure that the code is robust against future changes in the library's default behavior. See Repository Style Guide, lines 170-173.
pks[i] = try env.unwrap(PublicKey, pk_value, .{}); // Explicitly pass options
References
- Explicitly pass options to library functions at the call site, instead of relying on the defaults. This improves readability and avoids latent bugs if the library changes its defaults. (link)
| const pk_value = try pks_array.getElement(@intCast(i)); | ||
| const pk = try env.unwrap(PublicKey, pk_value); | ||
| pks[i] = pk.*; | ||
| pks[i] = try env.unwrap(PublicKey, pk_value); |
There was a problem hiding this comment.
MEDIUM: Per the style guide, explicitly pass options to env.unwrap to make the code more readable and prevent potential issues if the library's defaults change. See Repository Style Guide, lines 170-173.
pks[i] = try env.unwrap(PublicKey, pk_value, .{}); // Explicitly pass options
References
- Explicitly pass options to library functions at the call site, instead of relying on the defaults. This improves readability and avoids latent bugs if the library changes its defaults. (link)
| const sig_value = try sigs_array.getElement(@intCast(i)); | ||
| const sig = try env.unwrap(Signature, sig_value); | ||
| sigs[i] = sig.*; | ||
| sigs[i] = try env.unwrap(Signature, sig_value); |
There was a problem hiding this comment.
MEDIUM: For consistency and to adhere to the style guide, explicitly pass options to env.unwrap. See Repository Style Guide, lines 170-173.
sigs[i] = try env.unwrap(Signature, sig_value, .{}); // Explicitly pass options
References
- Explicitly pass options to library functions at the call site, instead of relying on the defaults. This improves readability and avoids latent bugs if the library changes its defaults. (link)
| const pk_value = try pks_array.getElement(@intCast(i)); | ||
| const pk = try env.unwrap(PublicKey, pk_value); | ||
| pks[i] = pk.*; | ||
| pks[i] = try env.unwrap(PublicKey, pk_value); |
There was a problem hiding this comment.
MEDIUM: To improve code clarity and prevent potential issues, explicitly pass options to env.unwrap. See Repository Style Guide, lines 170-173.
pks[i] = try env.unwrap(PublicKey, pk_value, .{}); // Explicitly pass options
References
- Explicitly pass options to library functions at the call site, instead of relying on the defaults. This improves readability and avoids latent bugs if the library changes its defaults. (link)
lodekeeper-z
left a comment
There was a problem hiding this comment.
Review: refactor(bls): use pointer arrays
Good refactor. The motivation is sound — BLS point structs are large (P1Affine = 48 bytes, P2Affine = 96 bytes), and the old code was copying them out of NAPI-wrapped objects just to pass to aggregation/verification functions that ultimately need pointers anyway. Pointer slices eliminate those copies and align the core BLS API with how ThreadPool.aggregateVerify already works.
What's good
-
fastAggregateVerifyPreAggregatedcleanup — The old@ptrCastfrom*const PublicKeyto[*]const PublicKeythen slicing to[0..1]was a type-punning hack. The new&[_]*const PublicKey{pk}is clean and safe. Nice. -
NAPI bindings —
env.unwrap()already returns a pointer to the wrapped object, so storing*const Tdirectly instead of copyingsig.* / pk.*is the right call. Less work, less memory. -
Consistency —
ThreadPool.aggregateVerifyalready took[]const *PublicKey. This bringsSignature.aggregateVerify,fastAggregateVerify,AggregatePublicKey.aggregate, andAggregateSignature.aggregatein line.
One thing to verify
blst_aggregateSerializedPublicKeys — This function deserializes into a pks: []PublicKey array, then builds a separate pk_ptrs pointing into it. That's correct, but it's now two allocations where there was one. For the NAPI hot path (if this is called in tight loops), worth confirming the extra alloc is negligible vs the deserialization cost. I expect it is — just flagging.
On Gemini's comments
- The "critical" about empty
pksinaggregateVerifyis a false positive — the existing guardif (n_elems == 0 or msgs.len != n_elems)at the top of the function already handles this. No change needed. - The
env.unwrap"explicit options" suggestions are noise —unwrapdoesn't take an options parameter in zapi.
Devil's advocate: why not []*const T everywhere?
The PR uses []const *const PublicKey in some places and []const *PublicKey in others (ThreadPool). This is fine for now since the const-ness propagates correctly through Zig's type system, but it'd be worth settling on one convention as more APIs adopt pointer slices. Not blocking.
Overall: clean, mechanical, correct. 👍
|
I am not sure using pointer arrays is better than value arrays in native zig client use case, maybe lose prefetch, cache friendly. Is there a bench comparison for both in native zig test? |
|
consolidating in #306 |
|
@GrapeBaBa your observation is right; this didn't make any noticeable difference and was mainly an attempt at diagnosing rss issues. closing this |
Seems like copying value arrays might be one of the causes of RSS strain on feat3 which is running our native bls implementation for about ~48 hours now. See thread on Discord for more details.
The suggestion on the thread was to use
napi_adjust_external_memory- reason is that the native memory management is opaque to V8's GC. But I highly doubt this is the reason, sinceblst-tsalso does not use this API.It is likely, it seems, that it's due to large amounts of allocations causing memory fragmentation when memory is freed/cleared.