Description
Attnets revamp
Since the launch of the beacon chain, the "backbone" of attestation subnets (attnets) has relied upon staking nodes connecting to a random set of subnets. The size of this random set is dictated by the quantity of validators attached to the node, up to the maximum of 64 (`ATTESTATION_SUBNET_COUNT`, which maps to the expected `SHARD_COUNT`). The general idea at genesis was that a node's validator requirements would scale linearly when sharding is released, so this linear subnet requirement could stand in as an "honesty" assumption for subnet backbones until sharding comes around.
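The current sizing rule can be sketched as follows (a minimal illustration; the function name is ours, but `RANDOM_SUBNETS_PER_VALIDATOR` and `ATTESTATION_SUBNET_COUNT` are the existing spec constants):

```python
ATTESTATION_SUBNET_COUNT = 64
RANDOM_SUBNETS_PER_VALIDATOR = 1

def current_random_subnet_count(num_validators: int) -> int:
    """Number of persistent random subnets a staking node subscribes to
    under the current scheme: one per attached validator, capped at the
    total subnet count. Non-staking nodes subscribe to zero."""
    return min(num_validators * RANDOM_SUBNETS_PER_VALIDATOR, ATTESTATION_SUBNET_COUNT)
```

Note that a node with zero validators has no persistent subnet duty at all under this rule, which is the root of the backbone problems described below.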
An attestation subnet backbone is required so that at each epoch, validators can quickly find and publish to their assigned subnet. If there were no notion of persistence in these subnets, then there would be no subnet to "find" in the ~6 minute window and thus no clear place to publish individual attestations before they are aggregated.
Backbone requirements:
- Subnets are relatively stable, slot-to-slot and epoch-to-epoch, allowing for reliable dissemination of messages on demand
- Subnets entry points can be found in a relatively short time (< 1 minute) and short search overhead (within a few hops of the DHT)
- (Nice to have) A method to discern whether a node is "honestly" performing the expected backbone duty
Problems
There are a few issues with the current structure:
- This likely creates overly populated subnets in actuality, thus increasing the network's total bandwidth consumption with little to no gain
- This relies on an unenforceable "honesty" of validator nodes, when the rational behavior is to turn your attnets down to 1 or even 0.
- As non-staking (user nodes) come online more and more, such (0-attnet) nodes will crowd the DHT, making the task of finding peers of particular subnets increasingly difficult. In the event that user nodes outpace staking nodes by 10-to-1 (a situation that should be applauded!), finding attestation backbones would become 10x more difficult.
Proposed solution
In an effort to solve the above issues, we propose:
- Remove random subnets per validator
- Add a single deterministic subnet per node, as a function of `node_id` and `epoch`
Rather than putting the backbone requirement on a brittle validator-honesty assumption, this puts the backbone requirement on the entire network of full nodes such that, on average, one out of every `ATTESTATION_SUBNET_COUNT` nodes will be on a particular subnet.
This means that the size of subnets becomes a function of the total number of nodes on the network, rather than on staking node count combined with validator-node density. This means that we simultaneously reduce the over population of attnets by a small number of staking nodes and ensure that even if the Ethereum network (and DHT) grows by orders of magnitude, attnets will be able to be found within a few hops.
Additionally, because the requisite subnet subscription is a function of a node's `node_id`, honesty/dishonesty wrt the attnet backbone can be deterministically assessed, allowing for downscoring and disconnection of dishonest peers.
The downside to this approach is that it puts a minimum of one attnet of load on every node rather than just on staking nodes, but in our estimation this is not a high burden wrt home node resources, and the benefit in meeting the backbone requirements far outweighs the cost.
Concrete spec mods
Remove random subscriptions
- Remove `RANDOM_SUBNETS_PER_VALIDATOR` from the validator guide
- Remove `attnets` from the ENR and MetaData in the p2p spec
- Remove the functionality described in Phase 0 attestation subnet stability
Add node-id subscription
- Create a function `compute_subnet_from_node_id(node_id) -> uint64` that takes in a `node_id` and returns a value on `[0, ATTESTATION_SUBNET_COUNT)`. Consider an `epoch` param that causes these subscriptions to slowly rotate on the order of ~1 day
- Add "MAY downscore peers that do not actively subscribe/participate in their currently assigned subnet based on `compute_subnet_from_node_id`"
- In the Lookahead section, replace the `attnets` DHT search with a `compute_subnet_from_node_id` DHT search
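A minimal sketch of such a function, assuming a simple modulo mapping and an epoch-windowed rotation (both the mapping and the `EPOCHS_PER_SUBNET_ROTATION` constant are illustrative assumptions, not a finalized spec):

```python
ATTESTATION_SUBNET_COUNT = 64
# ~1 day at 32 slots/epoch and 12s slots (256 * 32 * 12s ~= 27 hours) -- illustrative
EPOCHS_PER_SUBNET_ROTATION = 256

def compute_subnet_from_node_id(node_id: int, epoch: int) -> int:
    """Deterministically map a node_id onto [0, ATTESTATION_SUBNET_COUNT),
    rotating the assignment once per EPOCHS_PER_SUBNET_ROTATION epochs.

    Any peer can recompute this from an observed node_id, so backbone
    participation is verifiable and dishonest peers can be downscored.
    """
    rotation = epoch // EPOCHS_PER_SUBNET_ROTATION
    return (node_id + rotation) % ATTESTATION_SUBNET_COUNT
```

Because the assignment is a pure function of public information, no extra advertisement (e.g. the `attnets` ENR field) is needed to locate a subnet's backbone: a DHT walk can filter candidate peers by their `node_id` alone.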
Strategy
We'd likely want to simulate and test this change in strategy in a controlled environment before pushing this to testnets and then mainnet.
Such a controlled environment to test gossipsub at scale seems critical to a number of the network optimization investigations underway.
Protocol Labs' testground could be a good candidate. Alternatively, another simulation framework, or even spinning up 1k+-node distributed networks for ~1 day tests, could also be a viable path.
EDITED TO USE `node_id` instead of `peer_id` per discussions below