
feat: model phase0 Validator as struct #232

Open
twoeths wants to merge 15 commits into main from te/container_node_struct_2

Conversation

@twoeths
Collaborator

@twoeths twoeths commented Mar 10, 2026

Motivation

Epoch transition accesses and mutates Validator structs for every validator on every epoch. The existing FixedContainerType represents each validator as a merkle tree of individual leaf nodes, making field access and hashTreeRoot expensive due to per-field tree traversal. This PR introduces a specialized container type that stores validators as plain structs in the pool, significantly reducing epoch transition overhead.

Description

New branch_struct node kind in Node.Pool — the pool now supports nodes that hold a type-erased BranchStructRef (a vtable with get_root and deinit). The pointer is stored by splitting a usize across the existing left and right u32 fields (high 32 bits in left, low 32 bits in right). Node state encoding is extended from 2-bit to 3-bit to accommodate branch_struct_lazy and branch_struct_computed states.
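The pointer split described here can be sketched as follows — an illustrative fragment, not the actual pool code; `packPtr`/`unpackPtr` are hypothetical names, and the `left`/`right` parameters mirror the u32 fields named above:

```zig
// Illustrative only — not the actual pool code. Assumes a 64-bit usize.
// The usize address is split: high 32 bits into `left`, low 32 bits into `right`.
fn packPtr(ptr: *anyopaque, left: *u32, right: *u32) void {
    const addr = @intFromPtr(ptr);
    left.* = @truncate(addr >> 32);
    right.* = @truncate(addr);
}

// Reconstruct the full usize and cast back to a pointer.
fn unpackPtr(left: u32, right: u32) *anyopaque {
    const addr = (@as(usize, left) << 32) | @as(usize, right);
    return @ptrFromInt(addr);
}
```

Because the full usize is reconstructed from both halves, this does not depend on pointers fitting in the low 48 bits.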

StructContainerType — a new SSZ container variant wrapping FixedContainerType that uses branch_struct nodes instead of per-field leaf trees. Exposes a WrappedT interface satisfying the pool's vtable requirements.

StructContainerTreeView — tree view backed directly by a struct pointer in the pool. Mutations are buffered in an Optional(T) (all fields wrapped in ?T) and flushed lazily on commit().
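The buffered-mutation idea can be illustrated for a hypothetical two-field struct (the field names are made up; the real Optional(T) generalizes this shape via comptime reflection):

```zig
// Hypothetical two-field struct; the real Validator has more fields.
const Balance = struct {
    effective_balance: u64,
    slashed: bool,
};

// What Optional(Balance) conceptually produces: every field becomes
// optional and defaults to null, meaning "not mutated yet".
const OptionalBalance = struct {
    effective_balance: ?u64 = null,
    slashed: ?bool = null,
};

// On commit(), only non-null (dirty) fields are flushed back.
fn flush(dirty: *const OptionalBalance, target: *Balance) void {
    if (dirty.effective_balance) |v| target.effective_balance = v;
    if (dirty.slashed) |v| target.slashed = v;
}
```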

Validator migrated to StructContainerType, making validator field access and hash root computation O(1) instead of O(fields).

Benchmark:

saved 700ms–800ms on the epoch transition on my MacBook

this branch

benchmark              runs     total time     time/run (avg ± σ)     (min ... max)                p75        p99        p995      
-----------------------------------------------------------------------------------------------------------------------------
justification_finaliza 50       15.874s        317.48ms ± 2.797ms     (312.972ms ... 327.392ms)    318.513ms  327.392ms  327.392ms 
inactivity_updates     50       32.102s        642.057ms ± 8.203ms    (631.702ms ... 686.41ms)     643.951ms  686.41ms   686.41ms  
rewards_and_penalties  50       41.009s        820.199ms ± 8.929ms    (810.211ms ... 855.54ms)     824.288ms  855.54ms   855.54ms  
registry_updates       50       16.326s        326.521ms ± 2.667ms    (322.096ms ... 336.626ms)    327.404ms  336.626ms  336.626ms 
slashings              50       16.362s        327.244ms ± 2.977ms    (320.892ms ... 333.85ms)     328.191ms  333.85ms   333.85ms  
eth1_data_reset        50       16.414s        328.284ms ± 3.846ms    (322.395ms ... 340.011ms)    330.758ms  340.011ms  340.011ms 
pending_deposits       50       16.434s        328.685ms ± 2.371ms    (323.978ms ... 333.229ms)    330.442ms  333.229ms  333.229ms 
pending_consolidations 50       16.336s        326.739ms ± 3.099ms    (321.993ms ... 334.254ms)    328.098ms  334.254ms  334.254ms 
effective_balance_upda 50       33.315s        666.311ms ± 8.761ms    (654.195ms ... 697.644ms)    671.052ms  697.644ms  697.644ms 
slashings_reset        50       16.564s        331.298ms ± 3.682ms    (324.148ms ... 341.037ms)    333.608ms  341.037ms  341.037ms 
randao_mixes_reset     50       18.158s        363.175ms ± 64.672ms   (327.257ms ... 706.4ms)      359.816ms  706.4ms    706.4ms   
historical_summaries   50       17.188s        343.769ms ± 5.729ms    (336.143ms ... 370.109ms)    346.321ms  370.109ms  370.109ms 
participation_flags    50       4.612ms        92.243us ± 42.993us    (83.25us ... 389.459us)      87.291us   389.459us  389.459us 
sync_committee_updates 50       2.005ms        40.116us ± 4.021us     (36.959us ... 60.25us)       40.167us   60.25us    60.25us   
proposer_lookahead     50       20.103s        402.071ms ± 5.264ms    (395.414ms ... 425.04ms)     403.339ms  425.04ms   425.04ms  
epoch(non-segmented)   50       1m23.941s      1.678s ± 15.644ms      (1.656s ... 1.721s)          1.687s     1.721s     1.721s    
epoch(segmented)       50       1m23.614s      1.672s ± 19.973ms      (1.642s ... 1.725s)          1.685s     1.725s     1.725s    

main

benchmark              runs     total time     time/run (avg ± σ)     (min ... max)                p75        p99        p995      
-----------------------------------------------------------------------------------------------------------------------------
justification_finaliza 50       55.573s        1.111s ± 14.96ms       (1.086s ... 1.135s)          1.122s     1.135s     1.135s    
inactivity_updates     50       1m11.86s       1.437s ± 27.618ms      (1.405s ... 1.515s)          1.451s     1.515s     1.515s    
rewards_and_penalties  50       1m19.242s      1.584s ± 17.398ms      (1.564s ... 1.673s)          1.59s      1.673s     1.673s    
registry_updates       50       55.08s         1.101s ± 41.517ms      (1.077s ... 1.378s)          1.103s     1.378s     1.378s    
slashings              50       55.199s        1.103s ± 25.483ms      (1.081s ... 1.191s)          1.11s      1.191s     1.191s    
eth1_data_reset        50       54.635s        1.092s ± 10.314ms      (1.073s ... 1.121s)          1.1s       1.121s     1.121s    
pending_deposits       50       54.587s        1.091s ± 11.047ms      (1.069s ... 1.118s)          1.098s     1.118s     1.118s    
pending_consolidations 50       54.735s        1.094s ± 17.714ms      (1.075s ... 1.175s)          1.096s     1.175s     1.175s    
effective_balance_upda 50       1m11.169s      1.423s ± 29.093ms      (1.397s ... 1.556s)          1.425s     1.556s     1.556s    
slashings_reset        50       55.047s        1.1s ± 31.976ms        (1.079s ... 1.294s)          1.103s     1.294s     1.294s    
randao_mixes_reset     50       54.799s        1.095s ± 11.583ms      (1.078s ... 1.123s)          1.101s     1.123s     1.123s    
historical_summaries   50       54.284s        1.085s ± 10.711ms      (1.073s ... 1.123s)          1.091s     1.123s     1.123s    
participation_flags    50       6.003ms        120.07us ± 122.772us   (86.833us ... 959.458us)     112.583us  959.458us  959.458us 
sync_committee_updates 50       2.067ms        41.355us ± 37.029us    (31.541us ... 296.708us)     37.958us   296.708us  296.708us 
proposer_lookahead     50       58.971s        1.179s ± 19.972ms      (1.162s ... 1.258s)          1.18s      1.258s     1.258s    
epoch(non-segmented)   50       1m58.809s      2.376s ± 75.621ms      (2.295s ... 2.52s)           2.439s     2.52s      2.52s     
epoch(segmented)       50       1m57.583s      2.351s ± 70.882ms      (2.295s ... 2.695s)          2.356s     2.695s     2.695s  

@gemini-code-assist
Contributor

Summary of Changes


This pull request addresses a critical performance bottleneck during epoch transitions by revamping the internal representation of Validator structs. By moving away from a merkle tree of individual leaf nodes to a specialized StructContainerType that stores validators as plain structs in the node pool, the PR drastically reduces the computational cost of accessing and mutating validator fields. This fundamental change in data handling, supported by new branch_struct node kinds and StructContainerTreeView, results in a more efficient and faster epoch transition process.

Highlights

  • Performance Optimization: Migrated the Validator struct to a new StructContainerType to significantly reduce the overhead of field access and hash root computation during epoch transitions, leading to substantial performance improvements.
  • New Data Structures: Introduced StructContainerType and StructContainerTreeView to provide a more efficient way to handle SSZ container types that represent plain structs, backed directly by struct pointers in the node pool.
  • Node Pool Enhancements: Extended the Node.Pool to support a new branch_struct node kind, which stores type-erased BranchStructRef pointers, and updated node state encoding to accommodate these new types.
  • Benchmark Results: Achieved significant performance gains, with epoch transition benchmarks showing a reduction in execution time by approximately 700-800ms on a MacBook.


Changelog
  • src/consensus_types/phase0.zig
    • Updated the Validator definition to use ssz.StructContainerType instead of ssz.FixedContainerType.
  • src/persistent_merkle_tree/Node.zig
    • Modified Node.State encoding to use 3 bits for node type, increasing the number of possible node states.
    • Added new branch_struct_lazy and branch_struct_computed states to Node.State.
    • Introduced BranchStructRef to manage type-erased struct pointers within the node pool.
    • Implemented createBranchStruct to allocate and store struct clones in the pool.
    • Added getStructPtr and getBranchStructRefUnsafe for retrieving struct pointers from the pool.
    • Enhanced the unref function to properly deinitialize BranchStructRef for branch_struct nodes.
    • Updated noChild logic to correctly handle branch_struct nodes.
    • Modified getRoot to compute hashes for branch_struct_lazy nodes.
  • src/ssz/root.zig
    • Exported StructContainerType and StructContainerTreeView for broader use.
  • src/ssz/tree_view/container.zig
    • Imported Optional and Empty utilities.
    • Implemented StructContainerTreeView to provide a tree view backed by a struct pointer, supporting buffered mutations and lazy flushing.
    • Added a comprehensive test case for StructContainerTreeView.
  • src/ssz/tree_view/root.zig
    • Exported StructContainerTreeView.
  • src/ssz/tree_view/utils/optional.zig
    • Added a new utility file defining Optional and Empty functions for creating struct types with all optional fields.
  • src/ssz/type/container.zig
    • Imported StructContainerTreeView.
    • Defined StructContainerType as a fixed-size container type utilizing StructContainerTreeView for tree-view operations.
    • Included WrappedT struct to satisfy the branch_struct node interface requirements.
  • src/ssz/type/root.zig
    • Exported StructContainerType.
  • src/state_transition/test_utils/generate_state.zig
    • Adjusted the access method for pubkey within the Validator struct to align with the new StructContainerType implementation.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant performance optimization by changing how Validator objects are stored in the SSZ merkle tree. Instead of a tree of fields, they are now stored as a pointer to a struct, enabling O(1) access. This is achieved by adding a new branch_struct node type to the Node.Pool, along with a new StructContainerType and StructContainerTreeView. The implementation is clever, using type erasure and comptime features of Zig. My review focuses on improving the safety and clarity of the new code, in line with the repository's style guide. I've identified a few areas where assertions can be added and where pointer casting can be made safer and more explicit.

Note: Security Review did not run due to the size of the PR.

Comment on lines +842 to +845
pub fn fromValue(pool: *Node.Pool, value: *const Type) !Node.Id {
const wrapped_ptr: *const WrappedT = @ptrCast(value);
return try pool.createBranchStruct(WrappedT, wrapped_ptr);
}
Contributor

high

The current implementation relies on a potentially unsafe pointer cast from *const Type to *const WrappedT. While it may work due to the memory layout of WrappedT, it's not explicit and could be fragile. This is inconsistent with the safer approach taken in deserializeFromBytes. Creating a temporary WrappedT on the stack and passing its pointer is safer and clearer, aligning with the style guide's emphasis on clarity and simplicity (rule 32).

            pub fn fromValue(pool: *Node.Pool, value: *const Type) !Node.Id {
                const wrapped = WrappedT{ .value = value.* };
                return try pool.createBranchStruct(WrappedT, &wrapped);
            }
References
  1. The style guide recommends using simple and explicit code. The current pointer cast is subtle and relies on memory layout, which is not explicit. The suggested change makes the code safer and easier to understand. (link)

Comment thread src/persistent_merkle_tree/Node.zig Outdated
}

pub inline fn isBranchStruct(node: State) bool {
return @intFromEnum(node) & @intFromEnum(branch_struct_lazy) != 0;
Contributor

medium

Using the explicit helper functions isBranchStructLazy() and isBranchStructComputed() makes the intent clearer and aligns better with the style guide's preference for simplicity and explicitness over subtle bitwise operations (rule 32).

        return node.isBranchStructLazy() or node.isBranchStructComputed();
References
  1. The style guide recommends using simple and explicit control flow for clarity. The suggested change replaces a subtle bitwise operation with a more readable and explicit boolean expression using existing helper functions. (link)

}

/// The pool allocates and owns a clone of `ptr`; the caller retains ownership of its data.
pub fn createBranchStruct(self: *Pool, comptime T: type, ptr: *const T) Error!Id {
Contributor

medium

The style guide (rule 51) requires asserting all function arguments. The ptr argument is not checked for null. Please add std.debug.assert(ptr != null); at the beginning of this function to prevent potential null pointer dereferences.

References
  1. The style guide mandates asserting all function arguments to detect programmer errors early. This function is missing a null check for the ptr argument. (link)

@wemeetagain wemeetagain marked this pull request as ready for review March 17, 2026 16:45
@wemeetagain wemeetagain requested a review from a team as a code owner March 17, 2026 16:45
Base automatically changed from te/refactor_treeview_2 to main March 17, 2026 16:47
@twoeths
Collaborator Author

twoeths commented Mar 18, 2026

closing in favor of #247

@twoeths twoeths closed this Mar 18, 2026
@twoeths twoeths reopened this Mar 18, 2026
@twoeths twoeths force-pushed the te/container_node_struct_2 branch from e9c9929 to 6f0c12b Compare March 18, 2026 04:06
Contributor

@lodekeeper-z lodekeeper-z left a comment

Review: feat: model phase0 Validator as struct

Important PR — this is one of two competing approaches (alongside #247) for the critical BeaconStateView workstream. Both target the same problem: O(log n) per-field access on Validator tree views is too slow for process_epoch with ~1M validators. I reviewed #247 in detail; this PR deserves equal scrutiny.

Core Design

The approach stores the entire Validator struct in a heap-allocated BranchStructRef pointed to from the pool's left/right slots (pointer split across two u32 fields). A type-erased vtable (get_root, deinit) enables the pool to hash and clean up the struct without knowing its concrete type. The StructContainerTreeView then uses pool.getStructPtr() for O(1) field access instead of tree traversal.

What's good

  1. O(1) access without view instantiation. Any code that navigates to a validator node can call pool.getStructPtr(node_id, T) directly — no need to create a StructContainerTreeView. For process_epoch iterating ~1M validators, this eliminates the view allocation + tree-readback overhead per validator. This is the key advantage over #247, which requires instantiating a view for every access.

  2. Optional(T) utility is clean. Comptime-generating a struct with all fields ?T for dirty tracking is elegant and type-safe.

  3. Incremental commit() only rebuilds the node when fields were actually changed, and the changed-field detection via Optional is lightweight.

  4. CI is mostly green — core build, tests, spec tests all pass. Only bindings fail (expected, needs coordinated update).

Issues

🔴 1. isBranchStruct has a subtle bit-pattern bug

pub inline fn isBranchStruct(node: State) bool {
    return @intFromEnum(node) & @intFromEnum(branch_struct_lazy) != 0;
}

branch_struct_lazy = 0x40000000 (bit pattern 0100...). This check tests whether bit 30 is set. But branch_struct_computed = 0x50000000 (bit pattern 0101...) also has bit 30 set, so isBranchStruct correctly catches both. However, any future node type with bit 30 set (type codes 4-7) would also match. More concerning: if someone adds type 0x60000000 or 0x70000000 later, isBranchStruct would return true for non-struct-branch types.

The existing isBranch has the same pattern (& branch_lazy != 0), and it "works" because branch_lazy = 0x20000000 and branch_computed = 0x30000000 both have bit 29 set. But with 3-bit type codes, the bitwise shortcut is fragile.

Safer: @intFromEnum(node) & node_type >= @intFromEnum(branch_struct_lazy) or explicitly check both: isBranchStructLazy() or isBranchStructComputed().
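The explicit form could look roughly like this — a sketch over raw u32 state words using the type codes quoted above and the non-refcount mask implied by max_ref_count = 0x0FFFFFFF; the real State is an enum:

```zig
// Sketch; assumes the encoding described in this review:
// refcount in the low bits (max 0x0FFFFFFF), node type in the top bits,
// branch_struct_lazy = 0x40000000, branch_struct_computed = 0x50000000.
const node_type_mask: u32 = 0xF000_0000;

inline fn isBranchStruct(state: u32) bool {
    // Compare against both struct-branch type codes explicitly; a future
    // type code that happens to share bit 30 won't match by accident.
    const t = state & node_type_mask;
    return t == 0x4000_0000 or t == 0x5000_0000;
}
```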

🔴 2. max_ref_count halved — is 268M enough?

The 3-bit type code reduces max_ref_count from 0x1FFFFFFF (536M) to 0x0FFFFFFF (268M). For mainnet with ~1M validators × ~8 nodes per validator tree = ~8M nodes, max refcount would need to accommodate deep sharing (e.g., one root referenced by many views). 268M is likely still fine, but this should be documented as an intentional tradeoff. If node pools grow for checkpoint states (multiple states sharing subtrees), refcounts could increase.

🔴 3. Pointer encoding in left/right is fragile across architectures

Storing a pointer split across two u32 slots works on 64-bit, but:

  • On 32-bit, only right is used and left is zeroed. This breaks getLeftChild()/getRightChild() for any code that traverses a branch_struct node without checking noChild first.
  • The @intFromPtr/@ptrFromInt round-trip assumes the allocator returns pointers in the lower 48 bits (standard for x86-64 user space). If Zig ever uses a custom allocator mapping high memory (or tagged pointers), this silently breaks.
  • getBranchStructRefUnsafe is called from unref's cleanup path — if the pointer gets corrupted (e.g., partial write), the deinit call will crash non-deterministically.

Consider: store BranchStructRef pointers in a separate ArrayList indexed by node_id, keeping the pool's SoA layout clean. The memory overhead is one pointer per branch-struct node (vs zero with pointer encoding), but the safety and maintainability win is significant.

🟡 4. fromValue does an unsafe @ptrCast from *const Type to *const WrappedT

pub fn fromValue(pool: *Node.Pool, value: *const Type) !Node.Id {
    const wrapped_ptr: *const WrappedT = @ptrCast(value);
    return try pool.createBranchStruct(WrappedT, wrapped_ptr);
}

WrappedT wraps Type as its first (and only data) field. The @ptrCast relies on WrappedT having the same address as its first field. This is guaranteed by Zig for extern struct but not for auto layout (the compiler may reorder or pad fields). Since WrappedT uses auto layout and has function pointers via vtable, this cast is technically UB if the compiler inserts padding before .value.

Fix: explicitly construct a WrappedT from the value instead of casting:

const wrapped = WrappedT{ .value = value.* };
return try pool.createBranchStruct(WrappedT, &wrapped);

🟡 5. StructContainerTreeView.commit() replaces the root — but callers may hold stale references

When commit() creates a new branch_struct node and updates self.root, any other code holding the old root ID still points to the pre-mutation tree. This is fine for COW semantics (both the old and new trees are valid), but the self.original_value pointer now points into the new node's allocation while the old node may still be alive via other references. Since original_value is a borrow from the pool, this is correct — but subtle. A doc comment explaining the ownership transfer would help.

🟡 6. No test for pool.unref cleanup of branch_struct nodes

The test creates a view, uses it, and defers deinit. But there's no test verifying that when the last reference to a branch_struct node is dropped, the BranchStructRef.deinit is called and the allocation freed. A test with pool.unref() driving refcount to zero + checking for leaks (tight pool) would catch regressions in the cleanup path.

Devil's advocate: complexity budget

This PR adds a new node type to the pool — the most fundamental data structure in the SSZ tree. Every function that switches on node types (unref, getRoot, noChild, setNodesAtDepth, rebind) must now handle branch_struct correctly. Missing a case = silent corruption.

Compare #247: zero pool changes, view-level only, every existing pool operation works unchanged. The tradeoff is performance (view instantiation cost per access), which is real — but the blast radius of a pool bug is much larger than a view bug.

My recommendation: If this lands, add exhaustive switch statements on State.node_type (instead of bitwise checks) wherever node-type-dependent behavior exists. This way, adding a new node type forces handling at every callsite — the compiler catches it.

vs #247

| | #232 (this PR) | #247 |
|---|---|---|
| Pool changes | New node type + pointer encoding | None |
| Access pattern | O(1) via getStructPtr directly | O(1) via view (must instantiate first) |
| Blast radius | Pool-level (affects all tree ops) | View-level only |
| process_epoch perf | Better (no view alloc per validator) | Worse (view alloc + tree readback per access) |
| Composability | Pool knows about struct storage | Pool stays clean |

Both approaches are correct and well-tested. The ideal path may be landing #247 first (simpler, safer), then layering #232's pool optimization underneath once benchmarks confirm the view-instantiation cost matters. But if #232's perf advantage is significant for process_epoch, it could land directly with the safety improvements noted above.

Overall: architecturally sound, the O(1) access advantage is real, but needs the safety fixes (especially the @ptrCast and bit-pattern issues) before merging. 👍

Comment thread src/persistent_merkle_tree/Node.zig
Comment thread src/ssz/type/container.zig

branch_struct_ref.* = .{
.ptr = @ptrCast(@constCast(cloned)),
.get_root = struct {
Contributor

🟡 Pointer split across left/right — consider a side table.

Encoding a heap pointer into two u32 node fields works but couples node storage with pointer representation. Every pool operation touching left/right now needs to know that some nodes store pointers, not child IDs.

Alternative: a HashMap(Id, *BranchStructRef) or compact ArrayList indexed by node ID. Memory cost is one pointer per struct-branch node (negligible vs the struct itself), but pool operations stay clean — left/right always mean child node IDs.

Not blocking since the current approach works correctly on 64-bit, but worth considering for maintainability.

Collaborator Author

Encoding a heap pointer into two u32 node fields works but couples node storage with pointer representation

I agree from the point of view of the current/old design. With the new approach in this PR, the semantics of node storage change: a node holds either left/right child IDs or a pointer to some data.

Every pool operation touching left/right now needs to know that some nodes store pointers, not child IDs.

Good call. The only getLeft()/getRight() calls this affects are in proof generation, which is addressed in 91b030a.
For SSZ, we call getRight() to get the length of a list, and that is encoded as a regular branch node, not a branch_struct node.

}

pub fn hashTreeRootInto(self: *Self, out: *[32]u8) !void {
try self.commit();
Contributor

🟡 commit() ownership transfer is subtle.

After commit, self.original_value points into the new node's allocation (via pool.getStructPtr(new_root, T)), the old root is unref'd. If other code holds a reference to the old root, they still see the pre-mutation struct via their own getStructPtr — correct COW.

But self.original_value is a raw pointer borrowed from the pool — if the pool compacts or the node is freed by another path, it dangles. Worth a doc comment explaining that original_value is valid only while self.root's refcount > 0.

Collaborator Author

I simplified the TreeView implementation following https://github.com/ChainSafe/lodestar-z/pull/247/changes#diff-daf52008231bbeea53f1ef1e1668986abf072f4c7d57c0d8d26c0895545cc8e6R430

original_value is not tracked anymore

@twoeths
Collaborator Author

twoeths commented Mar 30, 2026

@lodekeeper-z thanks for the review and detailed comments.

  1. isBranchStruct has a subtle bit-pattern bug

Fixed.

  1. max_ref_count halved — is 268M enough?

Not a red flag. While this branch reduces max_ref_count, it also dramatically reduces the node count for the hottest structure: a Validator is represented by 1 node here versus 17 nodes on main. So as the validator count grows, this design significantly reduces the total live node population compared to main, which makes the lower per-node refcount ceiling less concerning in practice.

  1. Pointer encoding in left/right is fragile across architectures

See inline replies below.

On 32-bit, only right is used and left is zeroed. This breaks getLeftChild()/getRightChild() for any code that traverses a branch_struct node without checking noChild first.

branch_struct nodes are intentionally not traversable as normal branch nodes. The same concern applies to leaf nodes, so it is not specific to this PR.

The @intFromPtr/@ptrFromInt round-trip assumes the allocator returns pointers in the lower 48 bits (standard for x86-64 user space). If Zig ever uses a custom allocator mapping high memory (or tagged pointers), this silently breaks.

This implementation stores both the high and low bits of the pointer integer and reconstructs the full usize value, so it does not rely on lower-48-bit addressing.

getBranchStructRefUnsafe is called from unref's cleanup path — if the pointer gets corrupted (e.g., partial write), the deinit call will crash non-deterministically.

I don’t think “partial write” is a realistic failure mode here. The pointer is written once during node creation and decoded later during cleanup. If memory is corrupted by some unrelated bug then dereferencing it would of course be unsafe, but that is not specific to this encoding.

Consider: store BranchStructRef pointers in a separate ArrayList indexed by node_id, keeping the pool's SoA layout clean. The memory overhead is one pointer per branch-struct node (vs zero with pointer encoding), but the safety and maintainability win is significant.

I don’t think that tradeoff is clearly better here. Using a side table adds extra memory, an additional indirection, and worse locality for a very large pool. Reusing the existing left/right storage keeps access in-place and avoids maintaining a separate structure.

  1. fromValue does an unsafe @ptrCast from *const Type to *const WrappedT

Fixed.

  1. StructContainerTreeView.commit() replaces the root — but callers may hold stale references

Fixed as part of simplifying the TreeView implementation in #247.

  1. No test for pool.unref cleanup of branch_struct nodes

The unit test was enhanced to verify that the branch_struct root is returned to the pool/freelist after cleanup.

@lodekeeper-z
Contributor

Verified the fixes:

#1 isBranchStruct (49f2caf): Correct — explicit equality checks instead of bit masking. No more false positives from overlapping patterns.

#4 fromValue (640b420): Correct — stack copy instead of type-punning. Clean.

#3 pointer encoding: Fair points. I was thinking about it from the perspective of adding a new node type to an existing SoA pool, but you're right that the pool's SoA fields are already coupled to node representation — left/right being repurposed for struct data isn't fundamentally different from how they're used for branch children. The locality argument is compelling for the pool sizes we're dealing with (millions of validators × nodes).

Remaining: deferring to #247 for #5 (commit stale refs) and the broader TreeView simplification. Will re-review there.

@twoeths
Collaborator Author

twoeths commented Mar 30, 2026

@lodekeeper-z I'm not even sure if we should merge #247 first. Initially I was in favor of that PR because it's way simpler but it did not work (for the performance benefit) so I have to revisit this PR

I think we should disregard #247 and focus on this PR; you may request bringing over some of its good simplifications (I already brought the TreeView from there).

Even so, this PR changes the fundamental design of persistent-merkle-tree, so I'd like to get it in the best shape first before asking the team to review and reach consensus on it.

Remaining: deferring to #247 for #5 (commit stale refs) and the broader TreeView simplification. Will re-review there.

I simplified the TreeView in this PR

@lodekeeper-z
Contributor

Makes sense. If #247's simpler approach doesn't deliver the performance win, no point merging it first just to undo it later.

Happy to keep reviewing this PR as you shape it up. Let me know when you want another pass — or if there are specific parts of #247's simplifications you want me to evaluate for porting over.

@twoeths
Collaborator Author

twoeths commented Mar 30, 2026

Makes sense. If #247's simpler approach doesn't deliver the performance win, no point merging it first just to undo it later.

Happy to keep reviewing this PR as you shape it up. Let me know when you want another pass — or if there are specific parts of #247's simplifications you want me to evaluate for porting over.

@lodekeeper-z please review again. Feel free to create another PR based on this branch for any optimizations you find in #247 (or any new findings during your review).

Contributor

@lodekeeper-z lodekeeper-z left a comment


Re-reviewed after fixes. Overall the design is solid — branch_struct reduces validator node count from ~17 to 1, which is a big win for memory and traversal. Findings:

✅ Verified fixes:

  1. isBranchStruct bit-pattern (49f2caf) — correct. Explicit type comparison instead of bit tricks, necessary with the 3-bit encoding.
  2. fromValue stack copy (640b420) — correct. No longer casts stack pointer to const reference.

✅ Looks good:

  • unref() path: isBranch() returns false for branch_struct, so we don't traverse packed pointer bits as child IDs. Struct cleanup happens correctly before returning to free list.
  • Proof materialization: createSingleProof limits to 1 branch_struct hop (NestedBranchStruct error). createCompactMultiProof uses temporary_roots array for lifetime management. Both clean up materialized trees on all paths.
  • noChild() correctly blocks navigation into branch_struct left/right.
  • setBranchStructComputed: 0x40000000 | 0x50000000 = 0x50000000 ✅ (lazy→computed transition preserves refcount bits)
  • Hashing: delegates to FixedContainerType.hashTreeRoot via WrappedT.getRoot — same hash whether struct or tree-backed.
  • Serialization: delegates entirely to FixedContainerType — no divergence possible.
  • Test coverage: struct proof test verifies witness equivalence between struct and tree paths.

🟡 Minor observations (non-blocking):

  1. commit() recreates the entire branch_struct on every commit — even if only 1 of 8 fields changed. The changed bitset tracks which fields were modified, but commit() calls ST.tree.fromValue(pool, &self.value), which clones the whole struct. This is fine for now since Validator is small (121 bytes), but worth noting for future struct types.

  2. getFieldRoot leaks a node: ChildST.tree.fromValue(pool, &field_value) creates a new tree node but never unrefs it. The returned root pointer points into the pool, so it stays valid, but the node's refcount starts at 0 and it will be reclaimed on the next alloc. Currently only used in tests, so not a real leak, but it could surprise future callers.

  3. WrappedT.init clones via FixedCT.clone — for Validator this is a memcpy of 121 bytes per create. Efficient enough, just noting the allocation pattern: each createBranchStruct does 2 allocations (WrappedT + BranchStructRef).

No blocking issues found. Approve.
