-
Notifications
You must be signed in to change notification settings - Fork 453
Finally introduce sane unified scheduler shutdown #5866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@@ -644,6 +651,12 @@ impl ForkGraph for BankForks { | |||
} | |||
} | |||
|
|||
impl Drop for BankForks { | |||
fn drop(&mut self) { | |||
error!("BankForks::drop(): successfully dropped"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to self: now that, ProgramCache's BankForks circular-referenc is gone. maybe i can wire prepare_to_drop
from here to exercise the shutdown code-path (esp in tests!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is done: 163015a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit biased toward my solution in #5826 because I think have the strong circular references adds quite a bit of room for us to introduce a similar bug in the future whereas weak references make that harder to do.
However I think most of these changes make sense in addition to my PR, and give us stronger confidence in shutdown logic. AFAICT these two PRs are not conflicting, please let me know if you see a reason why this would not be the case.
drop(self.cluster_info); | ||
|
||
self.poh_service.join().expect("poh_service"); | ||
macro_rules! join_then_log { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a separate change to assist in debugging thread-joining issues. Probably a good change to make, but not related to fixing this issue afiact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. another piggy-bucked salvation from the ancient pr #593 by procrastinating me.
runtime/src/bank_forks.rs
Outdated
@@ -259,6 +259,13 @@ impl BankForks { | |||
bank | |||
} | |||
|
|||
pub fn prepare_to_drop(&mut self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just do a custom drop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is done: #5866 (comment)
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5866 +/- ##
=======================================
Coverage 83.0% 83.0%
=======================================
Files 828 828
Lines 375691 375715 +24
=======================================
+ Hits 311874 311921 +47
+ Misses 63817 63794 -23 🚀 New features to boost your workflow:
|
First of all, thanks for working on unified scheduler code. To be clear, i have a strong preference regarding strong circular references and relaxing it by weak ones. but I'm not planning to push my idea to the team so hard. So, if i cannot change your mind with this comment, I'm fine to merge #5826. it's written after good understanding of unified scheduler and it's indeed a good solution. I'm quite happy to see these kinds of prs from someone other than me.. :) And it reduces my work load for now. however, I might revisit it in the future, as i said at bottom of this wall of text (due to real perf reasons)... xD Maybe, the difference of approaches in the two prs boils down to different coding stances. I agree that a strong circular references should be avoided if possible and that it's error-prone. However, at the same time, i think this downside can be under control with strict coding of full of Fyi, this bug is known (at least by me!) since start, and i still don't think local-cluster test (#5435) and production On the other hand, I agree weak circular references does indeed fix the pressing problem (if any!). However, i think it clouds system responsibilities. Namely, guarantee of Lastly, there's a rationale for strong circular references other than introduction of strict runtime invariant: performance. For
Yeah, we can mix the two technically. however, i think merging either one should be enough. I think we should stick to one philosophy and be consistent with it. |
Problem
I was lazy. There was wip shutdown impl buried in an ancient pr..
Summary of Changes
An alternative approach to #5826
fyi: salvaged from #593