
[store] Introduce ChainStoreUpdateAdapter and EpochStoreUpdateAdapter and use in epoch_sync #12866

Open
wants to merge 2 commits into base: master

Conversation

shreyan-gupta
Contributor

This PR helps make progress on a couple of things:

  • We introduce a basic set of functions in ChainStoreUpdateAdapter
  • We introduce the EpochStoreUpdateAdapter
  • We use these two adapters in the epoch_sync code
  • We set up the basis for work to simplify the chain store update

...
We set up the basis for work to simplify the chain store update... (how??)

This is because, in my long-standing draft PR, we try to establish that chain store update does not need to depend on the "read your own write" philosophy.

Unfortunately, the only place where this was causing a hindrance was the epoch sync code, due to some initialization complications and some unnecessary state sync code coming into play.

With the now-simplified basic setup of the store after epoch_sync, we no longer take the complex dependency on the chain store and the chain store update's finalize function!!!
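
To make the direction concrete, here is a tiny illustrative toy of the adapter idea (names and signatures are simplified stand-ins, not necessarily the exact API introduced in this PR): the adapter borrows a plain StoreUpdate and exposes typed setters, so epoch_sync can stage its writes directly and commit them once, without going through ChainStoreUpdate and its finalize step.

// Illustrative toy only; the real adapters wrap nearcore's StoreUpdate and DBCol.
struct StoreUpdate {
    ops: Vec<(&'static str, Vec<u8>, Vec<u8>)>, // (column, key, value)
}

struct ChainStoreUpdateAdapter<'a> {
    store_update: &'a mut StoreUpdate,
}

impl<'a> ChainStoreUpdateAdapter<'a> {
    fn set_block_header_only(&mut self, hash: &[u8], header_bytes: &[u8]) {
        // In the real code this serializes the header into the block header column.
        self.store_update.ops.push(("BlockHeader", hash.to_vec(), header_bytes.to_vec()));
    }

    fn set_final_head(&mut self, head_bytes: &[u8]) {
        // Direct counterpart of the old save_final_head, written straight to the store.
        self.store_update.ops.push(("BlockMisc", b"FINAL_HEAD".to_vec(), head_bytes.to_vec()));
    }
}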


codecov bot commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 79.23077% with 27 lines in your changes missing coverage. Please review.

Project coverage is 70.36%. Comparing base (6e4b731) to head (f750de1).
Report is 2 commits behind head on master.

Files with missing lines                 Patch %   Lines
core/store/src/adapter/epoch_store.rs    34.37%    21 Missing ⚠️
chain/client/src/sync/epoch.rs           90.00%    0 Missing and 3 partials ⚠️
core/store/src/adapter/chain_store.rs    95.16%    3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12866      +/-   ##
==========================================
+ Coverage   69.88%   70.36%   +0.48%     
==========================================
  Files         853      853              
  Lines      173553   174098     +545     
  Branches   173553   174098     +545     
==========================================
+ Hits       121281   122508    +1227     
+ Misses      47091    46381     -710     
- Partials     5181     5209      +28     
Flag Coverage Δ
backward-compatibility 0.35% <0.00%> (?)
db-migration 0.35% <0.00%> (?)
genesis-check 1.41% <0.00%> (?)
linux 70.06% <79.23%> (+0.17%) ⬆️
linux-nightly 70.00% <79.23%> (?)
pytests 1.72% <0.00%> (?)
sanity-checks 1.52% <0.00%> (?)
unittests 70.19% <79.23%> (+0.31%) ⬆️
upgradability 0.35% <0.00%> (?)

Flags with carried forward coverage won't be shown.



update.merge(store_update);
update.commit()?;
Contributor Author

While the change above looks like a simple rewrite, it is definitely NOT trivial!! There are a couple of things happening here that are different.

We are no longer using chain_store_update but instead using the standard store_update.

Originally we relied on functions like save_block_header_no_update_tree, force_save_header_head and save_final_head in chain store update, along with update.commit()?, which internally calls update.finalize().

(!!) With the new code, we are directly making changes to the store layer instead of relying on the weird, implicit behavior of update.finalize().

We also need to save the new blocks into the store, i.e. call chain_store_update.commit()?, before actually reading from the store right below, since we are no longer relying on the non-committed, temporary values in the update.get_block_header(...) part of the code.
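
Schematically (continuing the toy adapter sketched in the PR description above; the actual calls in this PR differ in detail), the important change is the ordering: stage writes on a plain store update, commit, and only then read back from the store.

use std::collections::HashMap;

// Minimal toy store that only sees data once the update is committed,
// which is exactly why the new code commits before reading.
struct Store {
    data: HashMap<(&'static str, Vec<u8>), Vec<u8>>,
}

impl Store {
    fn commit(&mut self, update: StoreUpdate) {
        for (col, key, value) in update.ops {
            self.data.insert((col, key), value);
        }
    }

    fn get(&self, col: &'static str, key: &[u8]) -> Option<Vec<u8>> {
        self.data.get(&(col, key.to_vec())).cloned()
    }
}

fn save_headers_then_read(store: &mut Store, headers: &[(Vec<u8>, Vec<u8>)]) -> Option<Vec<u8>> {
    let mut update = StoreUpdate { ops: Vec::new() };
    {
        let mut chain = ChainStoreUpdateAdapter { store_update: &mut update };
        for (hash, bytes) in headers {
            chain.set_block_header_only(hash, bytes);
        }
    }
    // Commit first: with no ChainStoreUpdate caching "your own writes",
    // reads only ever see data that has actually been persisted.
    store.commit(update);
    let (last_hash, _) = headers.last()?;
    store.get("BlockHeader", last_hash)
}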

I had to print out all the DB ops from the old code and the new code and compare them to check that we are doing things properly. This has been tested manually.

One tiny thing to note: as part of the finalize behavior in the old chain store update code, we were calling crate::state_sync::update_sync_hashes(...) on all the blocks we were updating, but this isn't required as part of epoch sync. cc @marcelo-gonzalez.

One of those crazy things that's a byproduct of using the heavy hammer of chain store update... :(

Member

update_sync_hashes

Could you explain why? We likely want to sync state just after we finish epoch sync, so we need to make sure update_sync_hashes is called for some blocks; I just don't understand which ones.

Contributor Author

I ran the original code in debug mode, and we call the function below for last_header:

fn on_new_epoch(store_update: &mut StoreUpdate, header: &BlockHeader) -> Result<(), Error> {
    let num_new_chunks = vec![0u8; header.chunk_mask().len()];
    store_update.set_ser(DBCol::StateSyncNewChunks, header.hash().as_ref(), &num_new_chunks)?;
    Ok(())
}

Here, update_sync_hashes only triggers at the epoch boundary, which happens in epoch sync at last_header, i.e. proof.current_epoch.first_block_header_in_epoch.

Here, the key we are adding is DBCol::StateSyncNewChunks, and the column description says something like this:

    /// Stores a mapping from a block's hash to a list (indexed by ShardIndex like a header's chunk_mask())
    /// of the number of new chunks up to that block after the first block in the epoch. Used in calculating
    /// the right "sync_hash" for state sync after the StateSyncHashUpdate protocol feature is enabled.
    /// - *Rows*: `CryptoHash`
    /// - *Column type*: `Vec<u8>`
    StateSyncNewChunks,

This makes me believe it's not super important and would eventually be updated during the next block's processing via a different code path anyway.

However, in case further convincing is required: after epoch sync we do block sync for the current epoch anyway, so it doesn't make sense to think about state_sync related things for that epoch; they would get reset when the next epoch is processed.

@marcelo-gonzalez could you please check if this argument makes sense?

Contributor Author

I had to print out all the DB ops from the old code and the new code and compare them to check if we are doing things properly. This is tested well manually.

Just to reiterate: even though this code path is different, I manually tested it by looking at the actual store_update DB ops, and the only difference was the one DBCol::StateSyncNewChunks entry described above.

Contributor
@marcelo-gonzalez Feb 4, 2025

Yeah, so in the previous code where we call save_block_header_no_update_tree on those headers in epoch sync, we first call it on proof.current_epoch.first_block_header_in_epoch, and like the name says, it's the first block in an epoch. So update_sync_hashes() will just set all zeros in the new-chunks column for that block hash, since it sees it's the first block in an epoch.

Then every other block header we're calling save_block_header_no_update_tree() on is in the previous epoch, and none of the other headers in that epoch have been saved before all this. This has us return early from update_sync_hashes() without doing anything because of this check here. Note the comment about epoch sync there.

So tl;dr, only proof.current_epoch.first_block_header_in_epoch actually does something when we call update_sync_hashes(), and the others can safely be skipped. Actually, there are several other comments in that file about epoch sync possibly meaning that some stuff has not been processed before. It would definitely simplify things if we could make this assumption: if update_sync_hashes() is called on some header and its prev header is in the same epoch, then it has already been called on that prev header. Right now, if that isn't the case, we just silently return and assume it's because of epoch sync, but it would be good to be able to WARN in that case that something is wrong.
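
For anyone following along, here is a rough toy model of that control flow (simplified types and names, not the real update_sync_hashes): the first block of an epoch seeds all-zero counters, a block whose prev header was never recorded returns early and silently, and everything else carries the counters forward.

use std::collections::HashMap;

// Toy model only: maps block hash -> per-shard new-chunk counters.
struct Header {
    hash: u64,
    prev_hash: u64,
    is_first_in_epoch: bool,
    num_shards: usize,
}

fn update_sync_hashes_toy(new_chunks: &mut HashMap<u64, Vec<u8>>, header: &Header) {
    if header.is_first_in_epoch {
        // First block of an epoch: seed all-zero per-shard counters.
        new_chunks.insert(header.hash, vec![0u8; header.num_shards]);
        return;
    }
    // Prev header never recorded (e.g. headers saved out of band by epoch sync):
    // silently return, which is the early exit discussed above. This is the spot
    // where a WARN would help if the cause turns out not to be epoch sync.
    let Some(prev) = new_chunks.get(&header.prev_hash).cloned() else {
        return;
    };
    // Otherwise carry the counters forward (the real code also bumps them per
    // shard based on the block's chunk mask).
    new_chunks.insert(header.hash, prev);
}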

Contributor

So if I understand correctly, this PR does make a behavior change where update_sync_hashes() will no longer be called on proof.current_epoch.first_block_header_in_epoch, but that's OK because that block is not going to be in the current epoch: we header sync until we get to the current epoch, and then update_sync_hashes() works as normal once we reach the new epoch.

Contributor Author

^ Yes, that was my reasoning, and I wanted your confirmation!

self.store_update.set_ser(DBCol::BlockMisc, FINAL_HEAD_KEY, final_head).unwrap();
}

/// This function is normally clubbed with set_block_header_only
Member

But then we can just unite both methods.

Contributor Author

Yes, I was thinking of doing that, but decided to keep the primitives for now. I have plans to eventually integrate functions like these into a helper like update_block or similar; I'll leave them as is for the moment.

I can follow up in a different PR.
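
As a sketch of that follow-up (the name is hypothetical, and it builds on the illustrative adapter toy near the top of this page, not the PR's actual API), the two primitives could be folded into one helper so callers cannot pair them incorrectly:

impl<'a> ChainStoreUpdateAdapter<'a> {
    // Hypothetical combined helper: writes the header plus the bookkeeping
    // that is "normally clubbed" with it, in a single call.
    fn update_block_header(&mut self, hash: &[u8], header_bytes: &[u8]) {
        self.set_block_header_only(hash, header_bytes);
        // e.g. also advance the header head here, so the two writes can
        // never get out of sync.
        self.store_update.ops.push(("BlockMisc", b"HEADER_HEAD".to_vec(), hash.to_vec()));
    }
}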



Contributor
@staffik left a comment

Looks like save_block_header_no_update_tree, force_save_header_head and save_final_head can be removed now.

Contributor
@robin-near left a comment

Thanks for taking the care to check everything, @shreyan-gupta!

I love the direction this is going. When writing the epoch sync's apply_proof code, I didn't really know what I was doing. I was basically populating whatever fields I found necessary, using whatever methods I could find that would populate those fields, and verifying that the resulting DB state appeared to be consistent (no crashes, the chain progresses, it can help bootstrap another epoch-synced node, etc.). I wasn't even aware of the sync hash behavior; I'm very glad we're moving away from that kind of implicit behavior.

We have Python-based integration tests that exercise epoch sync, but if you could try epoch syncing a mainnet node (it should be easy to do now that epoch sync has been released), this would be good to go for sure.

Self { store_update: StoreUpdateHolder::Reference(store_update) }
}

/// USE THIS FUNCTION WITH CARE; proceed only if you know what you're doing.
Contributor

nit: down the road, nobody is going to "know what they're doing", and it will not be clear what "use with care" means. I suggest warning specifically about what can go wrong if you use it directly. (You already mentioned one thing, but maybe there are more, and maybe what you mentioned can be explained a bit more.)
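
For example (purely suggested wording; the concrete hazards depend on what the function actually does, which isn't shown in this excerpt, so the bullet points below are placeholders to illustrate the shape of the warning):

/// <one-line summary of what the function does>
///
/// Caution: this bypasses the higher-level adapter methods. In particular
/// (illustrative hazards, to be replaced with the real ones):
/// - writes staged here are not visible to reads until the owning
///   `StoreUpdate` is committed, and
/// - dropping the owning update without committing silently discards them.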
