Bootstrap sometimes does not find peers #5794

Open
@brooksprumo

Description

Originally posted in Discord here: https://discord.com/channels/428295358100013066/439194979856809985/1341074193428647956

Often when I start up nodes from a blank slate (dev boxes and canaries), the node doesn't find any peers during bootstrap, then times out and halts. This happens on v2.1, v2.2, and master. If I retry enough times it eventually works, but I'm not sure why I'm seeing this behavior. Has something related to bootstrap changed? Do we need a longer default timeout? Is peer discovery via gossip different? Thanks in advance for any insights!

And @gregcusack's response: https://discord.com/channels/428295358100013066/439194979856809985/1344426304346521650

Possible reason why we are seeing longer cluster joining times:
Back in October we removed the following functionality. Say A wants to join the network and B is an entrypoint node:
Old code: A sends a PullRequest to B. B first inserts A's ContactInfo into its crds table and then sends a PullResponse back to A. This allowed node A to easily join the cluster: not only would A receive a PullResponse with a bunch of cluster data, but B would also have A's ContactInfo in its table, so B would send A's ContactInfo out to a ton of nodes. We removed B inserting A's ContactInfo into its table upon receiving A's PullRequest in PR #3317.
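
To make the difference concrete, here is a minimal toy model of the two behaviors. It is only a sketch: `ToyNode`, `handle_pull_request`, and the `insert_caller` flag are hypothetical names invented for illustration, not the actual gossip types or functions, and the response is simplified to "everything B knows except the caller".

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a gossip node; not the real implementation."""
    name: str
    crds: dict = field(default_factory=dict)  # pubkey -> contact info

    def handle_pull_request(self, caller, caller_info, insert_caller):
        # Old behavior: record the caller's ContactInfo before responding.
        # New behavior (insert_caller=False): skip this step.
        if insert_caller:
            self.crds[caller] = caller_info
        # In both cases, respond with what this node knows about the cluster.
        return {k: v for k, v in self.crds.items() if k != caller}


known = {"C": "c-addr", "D": "d-addr"}
old_b = ToyNode("B", dict(known))
new_b = ToyNode("B", dict(known))

old_resp = old_b.handle_pull_request("A", "a-addr", insert_caller=True)
new_resp = new_b.handle_pull_request("A", "a-addr", insert_caller=False)
```

In this toy model A receives the same response either way. The only difference is whether B's table now contains A, which is what previously let B advertise A to the rest of the cluster.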

Problem with new code: A tries to join the network by sending a PullRequest to B. B does not insert A's ContactInfo into its table, and sends a PullResponse back to A. However, if B is heavily loaded with PullRequests from other nodes, B may hit its outbound data budget before it sends a PullResponse to A. As a result, A has to send another PullRequest to B, and so on, until B has enough budget to send a PullResponse back to A (with staked ContactInfos). We didn't see this issue before because B would insert A's ContactInfo into its crds table regardless of whether its outbound budget was maxed out. B would then send out A's ContactInfo via PushMessage/PullResponse, resulting in other nodes pinging and reaching out to A.
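
The retry loop described above can be sketched with a small deterministic model. Everything here is an assumption made for illustration: the function name, the idea of a fixed number of responses per round, and A's queue positions are all hypothetical and do not reflect how the real outbound budget is accounted for.

```python
def attempts_until_response(queue_positions, responses_per_round):
    """Return how many PullRequests A sends before B answers one.

    queue_positions[i] is A's 0-based position among the requests B
    handles in round i (hypothetical input). B only answers the first
    `responses_per_round` requests in each round before its outbound
    budget runs out. Returns None if A is never answered, which stands
    in for a bootstrap timeout.
    """
    for attempt, position in enumerate(queue_positions, start=1):
        if position < responses_per_round:
            return attempt
    return None


# B can answer 4 requests per round; A lands behind the cutoff three times.
busy = attempts_until_response([7, 5, 9, 2], responses_per_round=4)

# A gives up after three rounds without ever being answered.
timed_out = attempts_until_response([7, 5, 9], responses_per_round=4)
```

Under the old code, A being behind the cutoff mattered less, because B had already recorded A's ContactInfo and other nodes could reach out to A on their own.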
