
Implement basic validator custody framework (no backfill) #7578


Open

pawanjay176 wants to merge 18 commits into base: unstable

Conversation

pawanjay176
Member

Issue Addressed

Resolves #6767

Proposed Changes

This PR implements a basic version of validator custody.

  • It introduces a new CustodyContext object which contains information about the number of validators attached to the node and the custody count they contribute to the CGC.
  • The CustodyContext is added to the da_checker and has methods for returning the current CGC and the number of columns to sample at head. Note that the logic for returning the CGC previously lived in the network globals.
  • To estimate the number of validators attached, we use the beacon_committee_subscriptions endpoint. This might overestimate the number of validators actually publishing attestations from the node in multi-BN setups. We could also use the publish_attestations endpoint at a later point to get a more conservative estimate.
  • Any time there's a change in the custody_group_count due to the addition/removal of validators, the custody context sends an event on a broadcast channel (a rough sketch follows this list). The only subscriber for the channel currently lives in the network service, which simply subscribes to more subnets. Additional subscribers can be added in sync to start a backfill once the CGC changes.
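A rough sketch of how this could look, assuming a tokio broadcast channel; the type and method names below are illustrative, not necessarily the PR's exact API:

```rust
// Illustrative sketch only (names are assumptions, not the PR's actual types).
use std::sync::atomic::{AtomicU64, Ordering};
use tokio::sync::broadcast;

/// Event emitted whenever the validator-derived custody count changes.
#[derive(Clone, Debug)]
pub enum CustodyCountChanged {
    /// The CGC increased; subscribers (e.g. the network service) should
    /// subscribe to additional data column subnets.
    SamplingCountIncreased { new_custody_group_count: u64 },
}

pub struct CustodyContext {
    /// Custody group count contributed by locally attached validators.
    validator_custody_count: AtomicU64,
    /// Broadcast channel for CGC changes; later, sync could also subscribe
    /// here to trigger a backfill.
    sender: broadcast::Sender<CustodyCountChanged>,
}

impl CustodyContext {
    pub fn new() -> Self {
        let (sender, _initial_receiver) = broadcast::channel(16);
        Self {
            validator_custody_count: AtomicU64::new(0),
            sender,
        }
    }

    /// Called when the set of attached validators implies a new custody count.
    /// Only increases are handled, matching the current state of the PR.
    pub fn register_validator_custody(&self, new_count: u64) {
        let old_count = self.validator_custody_count.swap(new_count, Ordering::Relaxed);
        if new_count > old_count {
            // Ignore the error case where no subscriber is listening yet.
            let _ = self.sender.send(CustodyCountChanged::SamplingCountIncreased {
                new_custody_group_count: new_count,
            });
        }
    }

    /// Used by the network service (and potentially sync) to listen for changes.
    pub fn subscribe(&self) -> broadcast::Receiver<CustodyCountChanged> {
        self.sender.subscribe()
    }
}
```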

TODO

  • Currently, the logic only handles an increase in validator count and does not handle a decrease. We should ideally unsubscribe from subnets when the cgc has decreased.
  • Add a service in the CustodyContext that emits an event once MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS passes after updating the current cgc. This event should be picked up by a subscriber which updates the enr and metadata.
  • Add more tests

@pawanjay176 pawanjay176 requested a review from jxs as a code owner June 7, 2025 07:27
@pawanjay176 pawanjay176 added the work-in-progress and fulu labels Jun 7, 2025
Member

@dknopik dknopik left a comment

Took a look at this as an opportunity to refamiliarise myself with the DAS code and spec. I hope I looked at the correct spec version x)

Also, just noting as I found no explicit TODO or code for this: I think we also need to update the ValidatorRegistrations if the effective balance changes due to beacon chain rewards, penalties, deposits and (partial) withdrawals.

.into();
debug!(?custody_context, "Persisting custody context to store");

if let Err(e) =
Member

Is it necessary to clear the custody context here?
Thinking about scenarios where the persist function below fails: then we'd have nothing in the DB.

Also I think we should probably persist this more often, so we're crash safe? Perhaps every time there's a change?

Member Author

Yeah probably not necessary to clear it, I just copied the other persist functions. I'll fix that up.

Member Author

Also I think we should probably persist this more often, so we're crash safe? Perhaps every time there's a change?

Not sure it's necessary. It's not catastrophic if the custody context doesn't get persisted. In the case of an unclean shutdown, I think we'd have bigger issues with other core beacon chain stuff not getting persisted.

Member

Removed the clearing and now persist the custody context last.

beacon_chain
    .data_availability_checker
    .custody_context()
    .sampling_count(&beacon_chain.spec),
Member

Should this be the advertised custody group count?

Member

No, we actually want to subscribe to sample_count number of subnets, but use advertised_custody_group for the metadata and ENR.
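A minimal illustration of that split, assuming the spec's rule that a node samples at least SAMPLES_PER_SLOT custody groups; the names below are illustrative, not the PR's actual API:

```rust
/// Illustrative only: the two counts a node keeps track of.
struct CustodyCounts {
    /// CGC implied by locally attached validators (may have just increased).
    current_cgc: u64,
    /// CGC we can guarantee to serve for the whole DA window; this is the value
    /// that lags behind increases and goes into the ENR and metadata.
    advertised_cgc: u64,
}

impl CustodyCounts {
    /// Number of column groups to sample (and hence subnets to subscribe to) at head.
    fn sampling_count(&self, samples_per_slot: u64) -> u64 {
        self.current_cgc.max(samples_per_slot)
    }

    /// Value written into the ENR `cgc` field and the node's metadata.
    fn advertised_count(&self) -> u64 {
        self.advertised_cgc
    }
}
```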

@jimmygchen jimmygchen changed the title Implement basic validator custody framework Implement basic validator custody framework (no backfill) Jun 10, 2025
/// Does not handle decreasing validator counts
#[derive(Default, Debug)]
struct ValidatorRegistrations {
    /// Set of all validators registered to this node along with their effective balances
Collaborator

The effective balance at what state? The spec defines that we must use the effective balance of the latest finalized state. Do we have the guarantee that this balance is the expected one, and matches the current finalized state of this DB?

.store(new_validator_custody, Ordering::Relaxed);

let updated_cgc = self.custody_group_count(spec);
// Send the message to network only if there are more columns subnets to subscribe to
Collaborator

Isn't it quite leaky for the beacon_chain crate to be aware of the network and send it messages?

Collaborator

Instead, you can move this code to the network side, where you receive the validator custody events, and call the network after register_validators in the HTTP API.
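A hedged sketch of that suggestion, reusing the illustrative CustodyCountChanged event from the earlier sketch: the network service owns the receiver and reacts to custody events itself, so the beacon_chain crate never sends messages to the network directly.

```rust
// Illustrative sketch only; subscribe_to_column_subnets stands in for whatever the
// network service actually uses to join gossipsub subnets.
use tokio::sync::broadcast;

#[derive(Clone, Debug)]
enum CustodyCountChanged {
    SamplingCountIncreased { new_custody_group_count: u64 },
}

/// Event loop owned by the network service. The HTTP API only updates the
/// custody context after register_validators; the resulting event arrives here.
async fn custody_event_loop(mut rx: broadcast::Receiver<CustodyCountChanged>) {
    // Note: a lagged receiver returns an error and would end this loop; a real
    // implementation should handle RecvError::Lagged explicitly.
    while let Ok(event) = rx.recv().await {
        match event {
            CustodyCountChanged::SamplingCountIncreased { new_custody_group_count } => {
                subscribe_to_column_subnets(new_custody_group_count);
            }
        }
    }
}

fn subscribe_to_column_subnets(cgc: u64) {
    // Placeholder: map the CGC to data column subnets and subscribe to the new ones.
    println!("would subscribe to column subnets for cgc {cgc}");
}
```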

for custody_index in &custody_groups {
    let columns = compute_columns_for_custody_group(*custody_index, &self.spec)
        .expect("should compute custody columns for node");
    sampling_columns.extend(columns);
Collaborator

Noting that we now have two sources of truth (three items in total) that we must carefully keep synchronized:

  • NetworkGlobals::sampling_subnets
  • NetworkGlobals::sampling_columns
  • DataAvailabilityChecker::custody_context

It's fine, but having a single source of truth was the motivation for passing the cgc around.

registrations,
chain.slot().unwrap(),
&chain.spec,
);
Collaborator

In my POC I also used the builder registration, which is a much faster signal for those that use it.

/// This should trigger downstream actions like setting
/// a new cgc value in the enr and metadata fields and
/// performing any related cleanup actions.
AdvertisedCustodyCountChanged { new_custody_count: u64 },
Collaborator

Never used?

Collaborator

Some notes of mine: I came to define the AdvertisedCustodyCount as follows.

Thanks to the CGC registry updates you can construct a function CGC(time). Consider da_window and now as quantities in the same unit of time. Then let the set da_window_CGCs contain all distinct values of the CGC(time) function between now - da_window and now. Then AdvertisedCustodyCount = min(da_window_CGCs). This value represents the minimum CGC for which we can guarantee to have all expected columns in our DB over the DA window.

That is why it's important to set the registry update to an epoch in the future, to guarantee the above.
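A minimal sketch of that definition, assuming CGC updates are recorded as (effective_epoch, cgc) pairs sorted by epoch and that the window is expressed in epochs; all names are illustrative:

```rust
/// Minimum CGC that was in effect at any point during the last `da_window_epochs`
/// epochs, i.e. the largest value we can safely advertise in the ENR/metadata.
fn advertised_custody_group_count(
    // (epoch at which the CGC became effective, cgc), sorted by epoch ascending.
    cgc_updates: &[(u64, u64)],
    current_epoch: u64,
    da_window_epochs: u64,
) -> Option<u64> {
    let window_start = current_epoch.saturating_sub(da_window_epochs);
    // CGC in effect at the start of the window: the last update at or before it.
    let mut in_effect_at_start: Option<u64> = None;
    // Minimum over all updates that became effective inside the window.
    let mut min_in_window: Option<u64> = None;
    for &(epoch, cgc) in cgc_updates {
        if epoch <= window_start {
            in_effect_at_start = Some(cgc);
        } else if epoch <= current_epoch {
            // Updates scheduled for a future epoch are deliberately excluded.
            min_in_window = Some(min_in_window.map_or(cgc, |m| m.min(cgc)));
        }
    }
    match (in_effect_at_start, min_in_window) {
        (Some(a), Some(b)) => Some(a.min(b)),
        (Some(a), None) => Some(a),
        (None, b) => b,
    }
}
```

For example, with updates [(0, 4), (100, 8), (120, 16)], a current epoch of 130 and a 50-epoch window, the result is 4: CGC 4 was still in effect at epoch 80, so we cannot advertise more than that yet.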

slot: Slot,
spec: &ChainSpec,
) {
let epoch = slot.epoch(E::slots_per_epoch());
Collaborator

The benefit of tracking the epoch in the updates is to uphold the following invariant:

A given block will have the same CGC during its lifetime.

This prevents situations where we receive a block from gossip with a CGC of 8 and, by the time we import it, it turns out we are at 16, failing the import. The columns have already been gossiped (and ignored by us because we were at CGC 8), so we would have to trigger lookup sync to fetch the remaining columns and recover. It may work but it feels messy. Instead we can schedule the update for the next epoch to prevent this edge case.
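A small sketch of the scheduling idea under that assumption (a map keyed by the epoch at which each CGC becomes effective; names are illustrative): updates only take effect from the next epoch, and a lookup picks the latest update already effective at the block's epoch, so blocks of the current epoch keep the CGC they were received with.

```rust
use std::collections::BTreeMap;

#[derive(Default)]
struct CgcSchedule {
    /// effective_epoch -> cgc
    updates: BTreeMap<u64, u64>,
}

impl CgcSchedule {
    /// Record a new CGC that only takes effect from the next epoch, so blocks of the
    /// current epoch are unaffected.
    fn schedule_update(&mut self, current_epoch: u64, new_cgc: u64) {
        self.updates.insert(current_epoch + 1, new_cgc);
    }

    /// CGC that applies to a block from `epoch`: the latest update that was already
    /// effective at that epoch.
    fn cgc_at_epoch(&self, epoch: u64) -> Option<u64> {
        self.updates
            .range(..=epoch)
            .next_back()
            .map(|(_, cgc)| *cgc)
    }
}
```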
