-
Notifications
You must be signed in to change notification settings - Fork 156
Load membership from storage in MembershipCoordinator before triggering catchup
#3789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3d22457 to
c1e0659
Compare
MembershipCoordinator before triggering catchup
ss-es
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partial review on the hotshot side, I'll keep looking later
| pub async fn compute_drb_result<TYPES: NodeType>( | ||
| drb_input: DrbInput, | ||
| store_drb_progress: StoreDrbProgressFn, | ||
| load_drb_progress: LoadDrbProgressFn, | ||
| storage: Arc<dyn EpochStateStorage<TYPES>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not the biggest deal, but personally I think this is a downgrade in code structure -- this is the core DRB logic and now we need a full TestStorage implementation to test it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function is only used in epoch coordinator, and coordinator already has storage now so I don't see the point of having callbacks?
| #[async_trait] | ||
| pub trait Storage<TYPES: NodeType>: Send + Sync + Clone + 'static { | ||
| pub trait Storage<TYPES: NodeType>: | ||
| Send + Sync + Clone + 'static + EpochStateStorage<TYPES> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if Storage requires EpochStateStorage, why do we have a separate trait? why not just combine them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storage is not Object Safe. Passing storage to Coordinator would require a generic parameter and I wanted to avoid that
| &self, | ||
| epoch: TYPES::Epoch, | ||
| need_randomized: bool, | ||
| ) -> Result<bool> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not clear from the name + return type what the semantics of this function are. I think this could really use a comment explaining what it does and what it returns
also I really think Result<bool> is a bad return type, it's very hard to reason about. I think it would be better to make a custom enum or something here for the return type of insert_from_storage
| async fn insert_from_storage( | ||
| &self, | ||
| epoch: TYPES::Epoch, | ||
| need_randomized: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an enum would make help clarity here, e.g. NeedsRandomized and SkipRandomized
| if self | ||
| .insert_from_storage(epoch, false) | ||
| .await | ||
| .inspect_err(|e| { | ||
| let _ = warn!("loading from storage for epoch {epoch:?} failed: {e}"); | ||
| })? | ||
| { | ||
| return Ok(ret_val); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct. if we error from storage we should continue, not return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also say it's not actually important that insert_from_storage is told whether we need_randomized. It should always query the DRB and load it if it exists. We can check the result afterwards, since has_randomized_stake_table() costs nothing to call.
imo, it would be clearer if:
insert_from_storageshould be calledcatchup_from_storage, returning()- after
self.catchup_from_storage, perform themembership.read().await.has_stake_tableorhas_randomized_stake_table()check again to decide whether to return early
| if self | ||
| .insert_from_storage(epoch, true) | ||
| .await | ||
| .inspect_err(|e| { | ||
| let _ = warn!("loading from storage for {epoch:?} failed: {e}"); | ||
| })? | ||
| { | ||
| return Ok(ret_val); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar comment here
Currently the
MembershipCoordinatorwill trigger a catchup request if the stake table is not present in memory. We do store the stake tables inadd_epoch_root().This PR adds fetching from storage if the stake table is not found in memory before triggering a catchup request. It adds a new traitEpochStateStoragein hotshot which contains all the storage methods related to membership including the DRB store/load. This allows us to remove alll the DRB callbacks from the MembershipCoordinator as well.This also removes
membership.reload_stake(RECENT_STAKE_TABLES_LIMIT).await;because we already reload stake tables when loading the consensus data at startup