Gossipsub scoring parameters #2665

Open. AgeManning wants to merge 2 commits into dev.

Conversation

AgeManning
Contributor

Description

This introduces a set of recommended scoring parameters for the Eth 2.0 network.

A few points to note:

  • I've moved this into its own document (along with the current gossipsub v1.0 parameters) to avoid polluting the p2p spec document.
  • The values within are old values we estimated from mainnet and, to keep the document short, I've removed most explanations of their origin. I plan to do further analysis on our current testnets and mainnet to update the values that Lighthouse uses (and subsequently this document), and I encourage other client teams to modify any values they think may be off.
  • The subnet topics (and sync committee topics) do not have full scoring values, only penalties for invalid messages, as I've left these TBD.

This is designed to be a base we can work from, so edits, suggestions and feedback are welcome.

@ralexstokes
Member

ralexstokes commented Oct 13, 2021

NOTE: this re-orgs the params and also introduces substantive changes to the values, e.g. seen_ttl from #2663

@dapplion
Member

CC @tuyennhv

@djrtwo
Contributor

djrtwo commented Dec 2, 2021

@AgeManning I'd like to get this merged in. Is the PR as it stands still ready for review?

Do other clients have any input here?

@AgeManning
Contributor Author

I added this more as an initial template to work from.
The values here are the ones Lighthouse currently uses; however, we are planning on re-examining them, as I think they can be improved (they were originally designed before mainnet launch). If you're happy to treat it as an initial PR and potentially have some updates not too long in the future, then sure.

The other thing (which Alex pointed out) is that I included the change from #2663 in this PR. As that's not merged yet, we might want to change this. (Lighthouse uses the change in #2663, along with a few other clients, I believe.)

@nisdas
Contributor

nisdas commented Jan 10, 2022

> Do other clients have any input here?

My main concern is with penalizing peers based on their mesh delivery rate. When we were experimenting with parameters for running Prysm, peers were constantly penalised for not completing the required number of mesh deliveries in time. This held true across all clients, so it wasn't a client-specific issue. With a delivery window of 400ms, requiring all peers in your mesh to deliver x messages in the allotted time period is a pretty tight threshold. The natural consequence is that all your mesh peers end up being those that are geographically closer to you.

Also, in the event the network has a drop in participation (e.g. 20% of the validators are offline), all peers in your aggregate/subnet mesh are more liable to be penalised because their mesh message delivery rate falls below the expected threshold. This could lead a node to mark 'good' peers as bad and eventually ban them, which could worsen network participation even further.
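
To make the concern concrete, here is a minimal sketch of how the gossipsub v1.1 mesh message delivery penalty (P₃) behaves: once a peer's delivery counter falls below the threshold, the penalty is the square of the deficit multiplied by a negative weight. The function name, threshold, weight and delivery counts are illustrative assumptions, not the values proposed in this PR, and activation time and counter decay are left out.

```python
def mesh_delivery_penalty(deliveries: float, threshold: float, weight: float) -> float:
    """P3 contribution for one topic: weight * deficit^2 below the threshold, else 0.

    `weight` is expected to be negative. Activation time and counter decay,
    which the spec also defines, are omitted from this sketch.
    """
    if deliveries >= threshold:
        return 0.0
    deficit = threshold - deliveries
    return weight * deficit * deficit


# Hypothetical numbers, for illustration only.
THRESHOLD = 10.0
WEIGHT = -1.0

# A peer that comfortably meets the threshold under full participation.
healthy = mesh_delivery_penalty(12.0, THRESHOLD, WEIGHT)

# The same peer when 20% fewer messages exist on the network to deliver.
degraded = mesh_delivery_penalty(12.0 * 0.8, THRESHOLD, WEIGHT)

print(healthy, degraded)
```

In this sketch the peer has not changed its behaviour at all, yet it moves from no penalty to a negative score contribution purely because fewer messages were available to deliver.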

@twoeths

twoeths commented Mar 20, 2022

> Do other clients have any input here?

I have the same concern as Prysm. Right now in Lodestar we configure meshMessageDeliveriesWindow as 2000ms, and a lot of peers get penalized for not delivering enough messages in that time window. 400ms is just too tight for us considering the single-threaded nature of NodeJS, which suffers from I/O lag when the event loop is busy.

@AgeManning
Contributor Author

Yes I agree this one is tricky. We wanted to try and modify the scoring system to not use this parameter.
The point of the parameter is to prevent nodes from building up score just by replaying the messages back to us, which they can likely do in under a few 100ms. Having a very large value here I think defeats the purpose of the scoring param and we may need to look into alternative measures for handling this.

@twoeths

twoeths commented Mar 21, 2022

> Yes I agree this one is tricky. We wanted to try and modify the scoring system to not use this parameter.

👍 for this

> The point of the parameter is to prevent nodes from building up score just by replaying the messages back to us, which they can likely do in under a few 100ms

I think in all implementations, once a message from a peer has been counted, the scoring system will not increase that peer's mesh_message_deliveries again, no matter how many times the peer sends us the same message.

@AgeManning
Contributor Author

From the v1.1 Specs:

In order to compute P₃, the router maintains a counter that increments whenever a first or near-first message delivery occurs in the topic by a peer in the mesh. A near-first message delivery is a message delivery that occurs while a message has been first received and is being validated or it has been received within a configurable window of validation of first message delivery. The window is configurable but should be small (in the order of milliseconds) to avoid allowing a mesh peer to build score by simply replaying back the messages received by the current router. The parameter has a cap that applies at the time of increment.
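
A rough sketch of the counter described in that passage may help frame the trade-off. It credits a mesh peer for a first delivery, or for a near-first delivery that arrives while the message is still being validated or within the window after validation, and applies the cap at increment time. The class and method names are hypothetical, timestamps are plain millisecond numbers, and the per-peer de-duplication reflects the behaviour suggested earlier in this thread rather than anything in the quoted text.

```python
from dataclasses import dataclass, field


@dataclass
class MeshDeliveryCounter:
    """Tracks first/near-first deliveries for one topic (hypothetical helper)."""

    window_ms: float  # near-first window after validation of the first delivery
    cap: float        # cap applied at the time of increment
    validated_at: dict = field(default_factory=dict)  # msg_id -> validation time (ms)
    credited: set = field(default_factory=set)        # (peer, msg_id) pairs already counted
    counts: dict = field(default_factory=dict)        # peer -> delivery counter

    def on_validated(self, msg_id: str, now_ms: float) -> None:
        # Record when the first copy of the message finished validation.
        self.validated_at.setdefault(msg_id, now_ms)

    def on_delivery(self, peer: str, msg_id: str, now_ms: float) -> bool:
        """Return True if this delivery was credited to `peer`."""
        # Assumption: a peer is credited at most once per message.
        if (peer, msg_id) in self.credited:
            return False
        validated = self.validated_at.get(msg_id)
        # No validation time yet means the message is new or still being validated.
        if validated is not None and now_ms - validated > self.window_ms:
            return False
        self.credited.add((peer, msg_id))
        self.counts[peer] = min(self.counts.get(peer, 0.0) + 1.0, self.cap)
        return True


def replay_is_credited(window_ms: float, replay_delay_ms: float) -> bool:
    """Does a peer that echoes a message back `replay_delay_ms` after validation get credit?"""
    counter = MeshDeliveryCounter(window_ms=window_ms, cap=100.0)
    counter.on_delivery("honest", "m1", 0.0)
    counter.on_validated("m1", 50.0)
    return counter.on_delivery("replayer", "m1", 50.0 + replay_delay_ms)


print(replay_is_credited(400.0, 550.0), replay_is_credited(2000.0, 550.0))
```

Under these assumptions, a wider window credits slower honest peers but also credits a peer that only echoes messages back, which is the tension discussed above.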
