[store] Introduce ChainStoreUpdateAdapter and EpochStoreUpdateAdapter and use in epoch_sync #12866
base: master
Conversation
Codecov Report
Attention: Patch coverage is …

```
@@            Coverage Diff             @@
##           master   #12866      +/-   ##
==========================================
+ Coverage   69.88%   70.36%    +0.48%
==========================================
  Files         853      853
  Lines      173553   174098      +545
  Branches   173553   174098      +545
==========================================
+ Hits       121281   122508     +1227
+ Misses      47091    46381      -710
- Partials     5181     5209       +28
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
```rust
update.merge(store_update);
update.commit()?;
```
While the change above looks like a simple rewrite, it is definitely NOT trivial! There are a couple of things happening here that are different.

We are no longer using `chain_store_update` but instead the standard `store_update`. Originally we relied on functions like `save_block_header_no_update_tree`, `force_save_header_head`, and `save_final_head` in chain store update, along with `update.commit()?`, which internally calls `update.finalize()`.

With the new code, we go directly to the store layer and make changes there instead of relying on the surprising implicit behavior of `update.finalize()`.

We also need to save the new blocks into the store, i.e. call `chain_store_update.commit()?;`, before actually reading from the store right below, because we are no longer relying on the non-committed, temporary values in the `update.get_block_header(...)` part of the code.

I had to print out all the DB ops from the old code and the new code and compare them to check that we are doing things properly. This is well tested manually.

One small thing to note: as part of the finalize behavior in the old chain store update code, we were calling `crate::state_sync::update_sync_hashes(...)` on all the blocks we were updating, but this isn't required as part of epoch sync. cc @marcelo-gonzalez.

One of those surprising things that's a byproduct of using the heavy hammer of chain store update... :(
> `update_sync_hashes`

Could you explain why? We likely want to sync state just after we finish epoch sync, so we need to make sure `update_sync_hashes` is called for some blocks; I just don't understand which ones.
I ran the original code in debug mode, and we are calling this function below for `last_header`:

```rust
fn on_new_epoch(store_update: &mut StoreUpdate, header: &BlockHeader) -> Result<(), Error> {
    let num_new_chunks = vec![0u8; header.chunk_mask().len()];
    store_update.set_ser(DBCol::StateSyncNewChunks, header.hash().as_ref(), &num_new_chunks)?;
    Ok(())
}
```
Here, `update_sync_hashes` only triggers at the epoch boundary, which in epoch sync happens at `last_header`, i.e. `proof.current_epoch.first_block_header_in_epoch`.

The key we are adding is in `DBCol::StateSyncNewChunks`, and the column description says:

```rust
/// Stores a mapping from a block's hash to a list (indexed by ShardIndex like a header's chunk_mask())
/// of the number of new chunks up to that block after the first block in the epoch. Used in calculating
/// the right "sync_hash" for state sync after the StateSyncHashUpdate protocol feature is enabled.
/// - *Rows*: `CryptoHash`
/// - *Column type*: `Vec<u8>`
StateSyncNewChunks,
```

This makes me believe it's not super important, and it would eventually be updated with the next block processing via a different code path anyway.
However, in case further convincing is required: after epoch sync we do block sync for the current epoch anyway, so it doesn't make sense to think about state_sync related things for that epoch; they would get reset with the next epoch's processing.

@marcelo-gonzalez could you please check if this argument makes sense?
> I had to print out all the DB ops from the old code and the new code and compare them to check if we are doing things properly. This is tested well manually.

Just to reiterate: even though this code path is different, I manually tested this by looking at the actual store_update DB ops, and the only difference was the one `DBCol::StateSyncNewChunks` entry described above.
Yeah, so in the previous code, where we call `save_block_header_no_update_tree` on those headers in epoch sync, we first call it on `proof.current_epoch.first_block_header_in_epoch`, and like the name says, it's the first block in an epoch. So `update_sync_hashes()` will just set all zeros in the new-chunks column for that block hash, since it sees it's the first block in an epoch.

Then every other block header we call `save_block_header_no_update_tree()` on is in the previous epoch, and none of the other headers in that epoch have been saved before all this. This has us return early from `update_sync_hashes()` without doing anything, because of this check here. Note the comment about epoch sync there.

So tl;dr, only `proof.current_epoch.first_block_header_in_epoch` actually does something when we call `update_sync_hashes()`, and the others can safely be skipped. Actually, there are several other comments in that file about epoch sync possibly meaning that some stuff has not been processed before. It would definitely simplify things if we could make this assumption: if `update_sync_hashes()` is called on some header whose prev header is in the same epoch, then it has already been called on that prev header. Right now, if that isn't the case, we just silently return and assume it's because of epoch sync, but it would be good to be able to WARN in that case that something is wrong.
So if I understand correctly, this PR does make a behavior change in that `update_sync_hashes()` will no longer be called on `proof.current_epoch.first_block_header_in_epoch`, but it's OK because that block is not going to be in the current epoch; we header sync until we get to the current epoch, and then `update_sync_hashes()` works as normal when we get to the new epoch.
^ Yes, that was my reasoning, and I wanted your confirmation!
```rust
self.store_update.set_ser(DBCol::BlockMisc, FINAL_HEAD_KEY, final_head).unwrap();
}
```

```rust
/// This function is normally clubbed with set_block_header_only
```
But then we can just unite both methods.
Yes, I was thinking of doing that, but thought of leaving the primitives. I have plans to eventually integrate functions like these into a helper module like `update_block` or similar, but thought of leaving them as is for now. I can follow up in a different PR.
```rust
update.merge(store_update);
update.commit()?;
```
Looks like `save_block_header_no_update_tree`, `force_save_header_head`, and `save_final_head` can be removed now.
Thanks for taking the care to check everything @shreyan-gupta!

I love the direction this is going. When writing the epoch sync's `apply_proof` code, I didn't really know what I was doing. I was basically populating whatever field I found necessary, using whatever method I could find that would populate those fields, and verifying that the resulting DB state appeared to be consistent (no crashes, chain progresses, can help bootstrap another epoch-synced node, etc.). I wasn't even aware of the sync hash behavior; I'm very glad we're moving away from that kind of implicit behavior.

We have Python-based integration tests that cover epoch sync, but if you could try epoch syncing a mainnet node (it should be easy to do that now that epoch sync has been released), this would be good to go for sure.
```rust
Self { store_update: StoreUpdateHolder::Reference(store_update) }
}
```

```rust
/// USE THIS FUNCTION WITH CARE; proceed only if you know what you're doing.
```
nit: down the road nobody is going to "know what they're doing", and it will not be clear what "use with care" means. I suggest warning specifically about what can go wrong if you use it directly. (You already mentioned one thing, but maybe there are more, and maybe the one you mentioned could be explained a bit more.)
This PR helps make progress on a couple of things:
...
We set up the basis for work to simplify the chain store update... (how?)
This is because in my very long-standing draft PR we try to establish that chain store update does not need to depend on a "read your own writes" philosophy. Unfortunately, the one place where this was causing a hindrance was the epoch sync code, due to some initialization complications and some unnecessary state sync code coming into play. With the now-simplified basic setup of the store after epoch_sync, we no longer take the complex dependency on chain store and the chain store update finalize function!