reduce PullRequest intensity by ~2x #6141

Merged
merged 3 commits into anza-xyz:master
May 8, 2025

Conversation

alexpyattaev
Problem

Context: #53
Gossip pull requests are inefficient and expensive. Push should be the dominant propagation mechanism in gossip.

Summary of Changes

  • Increase PullRequest interval from 200ms to 500ms.
  • Make it tunable for further reduction in the future.
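As a back-of-the-envelope check, the change can be sketched as follows (a hypothetical sketch; the constant names below are illustrative, not the actual agave identifiers): with the gossip loop ticking every 100ms and pulls gated to every 5th round, the pull-request rate drops from 5/s under the old 200ms timer to 2/s, a ~2.5x reduction.

```rust
// Illustrative constants (not the actual agave names).
const GOSSIP_ROUND_MILLIS: u64 = 100; // gossip loop period
const PULL_REQUEST_PERIOD: u64 = 5; // pull every 5 rounds => every 500ms

/// How many pull requests fire per second when pulls are gated to
/// every `period_rounds`-th gossip round.
fn pulls_per_second(period_rounds: u64) -> u64 {
    let rounds_per_second = 1000 / GOSSIP_ROUND_MILLIS; // 10
    (0..rounds_per_second)
        .filter(|&round| round % period_rounds == 0)
        .count() as u64
}

fn main() {
    let old = 1000 / 200; // old 200ms timer => 5 pulls/s
    let new = pulls_per_second(PULL_REQUEST_PERIOD); // 2 pulls/s
    println!("old: {old}/s, new: {new}/s, reduction: {:.1}x", old as f64 / new as f64);
}
```

This is also why the review below suggests retitling to "~2.5x": 500ms / 200ms = 2.5.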

@alexpyattaev alexpyattaev requested a review from gregcusack May 7, 2025 14:17
@codecov-commenter

codecov-commenter commented May 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.8%. Comparing base (79be8c9) to head (d562b30).
Report is 10 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #6141     +/-   ##
=========================================
- Coverage    82.9%    82.8%   -0.1%     
=========================================
  Files         842      842             
  Lines      377482   377485      +3     
=========================================
- Hits       312973   312816    -157     
- Misses      64509    64669    +160     


@gregcusack gregcusack left a comment


We already have a built-in timer that runs the gossip loop every 100ms. Can we use something like what's proposed here: solana-labs@a789973? You can't really get 500ms, but you could get 400 or 600. I'm good with reducing this frequency, but we really need to pay attention to propagation times once this change is running on testnet, to see whether push messages stop propagating as well as we'd like.

@alexpyattaev
Author

We already have a built-in timer that runs the gossip loop every 100ms. Can we use something like what's proposed here: solana-labs@a789973? You can't really get 500ms, but you could get 400 or 600.

That logic does not fit into the existing code structure, unfortunately. Also, if we are delayed processing gossip for some reason, we'd be issuing pulls every 500 ms rather than every 5 iterations. Also, it can never be 400 ms, only 500 or 600 (both of which would be perfectly acceptable if this actually works).

I'm good with reducing this frequency, but we really need to pay attention to propagation times once this change is running on testnet, to see whether push messages stop propagating as well as we'd like.

Totally agree. I've already got this running on a testnet node: td3n5NGhP7JKWrL638gzau3NY7mF4K3ztZww3GkpywJ

@gregcusack

That logic does not fit into the existing code structure, unfortunately.

Why not? The gossip loop runs at most once every 100ms. We could just do:

 for round in 0u64.. {
     // ...
     (round % 3) == 0, // every 300ms, or `% 4` for every 400ms
     // ...
 }

and then just remove the generate_pull_requests flag.
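The suggestion above can be fleshed out as a self-contained toy (the function and constant names here are hypothetical, not the actual run_gossip() code):

```rust
// Hypothetical sketch of gating pull requests on the gossip round counter;
// names are illustrative, not the actual agave run_gossip() code.
const PULL_REQUEST_PERIOD: u64 = 5; // with a 100ms loop, pulls every 500ms

/// Returns true when this gossip round should issue pull requests.
fn should_pull(round: u64) -> bool {
    round % PULL_REQUEST_PERIOD == 0
}

fn main() {
    let schedule: Vec<bool> = (0..10).map(should_pull).collect();
    // Pulls fire on rounds 0 and 5; all other rounds are push-only.
    println!("{schedule:?}");
}
```

The appeal of this shape is that pulls are counted in gossip rounds rather than wall-clock time, so a slow loop iteration delays pulls along with everything else instead of racing a separate timer.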

if we are delayed processing gossip for some reason, we'd be issuing pulls every 500 ms rather than every 5 iterations

What do you mean by this? Pull requests will be delayed anyway if this loop is slow to run.

@alexpyattaev
Author

OK, fair. I can do the timing in "gossip rounds". In the end it does not really matter.

@alexpyattaev alexpyattaev requested a review from gregcusack May 7, 2025 20:19
gregcusack
gregcusack previously approved these changes May 8, 2025

@gregcusack gregcusack left a comment


Thank you for doing this. Let's keep an eye on metrics as this gets rolled out. Maybe update the title to say "reduce PullRequest intensity by ~2.5x" to make that clear.

@@ -1502,6 +1511,7 @@ impl ClusterInfo {
&sender,
generate_pull_requests,
);


nit: can remove this

Author


Fixed in d562b30

@@ -1494,6 +1500,9 @@ impl ClusterInfo {
.map(EpochSpecs::current_epoch_staked_nodes)
.cloned()
.unwrap_or_default();

// Make pull requests every PULL_REQUEST_PERIOD rounds
let generate_pull_requests = gossip_round % PULL_REQUEST_PERIOD == 0;


nit: could just inline this into run_gossip() as gossip_round % PULL_REQUEST_PERIOD == 0

Author


Yeah, I started with it, but this way it's more clear what is going on. More LoC, but easier to follow.

Author


On a fresh look, you are right: it is better inlined.


@gregcusack gregcusack left a comment


LGTM! Let's keep an eye on metrics.
Thank you!

@alexpyattaev alexpyattaev merged commit 9c24787 into anza-xyz:master May 8, 2025
47 checks passed
@alexpyattaev alexpyattaev deleted the gossip_pr branch May 8, 2025 15:44
3 participants