floresta-chain: new optimized chainstore #251


Merged
merged 1 commit on May 15, 2025

Conversation

Davidson-Souza
Member

The current chainstore is based on kv, but it has a few problems:

  • When we flush, we get a huge heap spike
  • We are paying a 2 to 3 times overhead on headers
  • Retrieving headers during IBD gets slow if we flush early

This commit introduces a bare-bones, ad hoc store that consists of two parts:

  • An open-addressing, file-backed and memory-mapped hash map that keeps the relation block_hash -> block_height
  • A flat file that contains the serialized block headers, in ascending order

To recover a header given the block height, we simply use pointer arithmetic inside the flat file. If we need to look a header up by its block hash, we query the map first, then find the header inside the flat file. This has several advantages: we don't need explicit flushes (the OS flushes at fixed intervals), flushes are asynchronous (the OS performs them), we get caching for free (mmap-ed pages stay in memory while we need them), and our cache reacts to system constraints, because the kernel always knows how much memory we still have.
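As a concrete illustration of the pointer-arithmetic lookup, here is a minimal sketch using plain file I/O instead of mmap. The names `write_header`/`read_header` and the 80-byte record size are assumptions for the sketch; the real store keeps extra per-record data (height and hash) alongside each header.

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};

// A serialized Bitcoin block header is exactly 80 bytes, so record `i`
// always lives at byte offset `i * RECORD_SIZE` in the flat file.
const RECORD_SIZE: u64 = 80;

/// Store the header for `height` at its computed offset.
fn write_header(file: &mut File, height: u64, header: &[u8; 80]) -> std::io::Result<()> {
    file.seek(SeekFrom::Start(height * RECORD_SIZE))?;
    file.write_all(header)
}

/// Recover a header by height with pure pointer arithmetic: no index lookup.
fn read_header(file: &mut File, height: u64) -> std::io::Result<[u8; 80]> {
    let mut header = [0u8; 80];
    file.seek(SeekFrom::Start(height * RECORD_SIZE))?;
    file.read_exact(&mut header)?;
    Ok(header)
}
```

Seeking past the end of the file before writing simply zero-fills the gap on most filesystems, so headers written slightly out of order still land at the right offsets.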

@JoseSK999
Contributor

My understanding is that kv buckets also don't need explicit flushes. The data saved in a bucket is flushed to disk by the OS periodically, and it's kept in the inner sled pagecache (so we should have fast access), which is configured with a capacity of 100MB in KvChainStore::new.

@Davidson-Souza
Member Author

That was my understanding too, but for some reason, even before using the cache (second part of #169) it didn't really flush on its own, and if we had an unclean shutdown, we would lose our progress (or a good part of it). It would also become increasingly more CPU and IO heavy as we made progress, I suspect it's due to the block locator getting bigger as our chain grows.

But the biggest problem for me, that I couldn't find an alternative, was the heap spike. It would always crash on my phone, before or after #169. With this PR, it runs fine!

@JoseSK999
Contributor

JoseSK999 commented Oct 8, 2024

It would also become increasingly more CPU and IO heavy as we made progress, I suspect it's due to the block locator getting bigger as our chain grows.

Isn't that the expected behavior of a node in IBD, as it moves from old empty blocks to more recent ones?

Also was the OS not flushing on its own on desktop or on mobile?

@Davidson-Souza
Member Author

Isn't that the expected behavior of a node in IBD, as it moves from old empty blocks to more recent ones?

Not this much, at least not for headers. They are small and have a constant size.

Also was the OS not flushing on its own on desktop or on mobile?

Both. At least on my setup.

@Davidson-Souza Davidson-Souza force-pushed the new-chainstore branch 7 times, most recently from 4f43d68 to 9961e8c on January 2, 2025 22:05
@Davidson-Souza Davidson-Souza changed the title [WIP] floresta-chain: new optimized chainstore floresta-chain: new optimized chainstore Jan 3, 2025
@Davidson-Souza Davidson-Souza marked this pull request as ready for review January 3, 2025 00:26
Contributor

@jaoleal jaoleal left a comment


Nice changes, here's my superficial review... I still haven't finished.

You're right, this needs a lot of testing and review. It looks like a nice job!

@Davidson-Souza Davidson-Souza force-pushed the new-chainstore branch 2 times, most recently from 9f33428 to 66f70e6 on February 11, 2025 16:19
Contributor

@JoseSK999 JoseSK999 left a comment


Some initial review, looks good so far! I still have to check flat_chain_store.rs

Contributor

@JoseSK999 JoseSK999 left a comment


Mainly nits, but I think I have found an issue with the update_block_index method impl, as explained below.

@Davidson-Souza Davidson-Souza force-pushed the new-chainstore branch 3 times, most recently from 5175e30 to 0597417 on March 5, 2025 17:54
@lucad70
Contributor

lucad70 commented Mar 21, 2025

I have started a review aiming to measure the heap usage on both master and on this PR. However, the tool I am using stops recording as soon as I send a kill signal to florestad (I am using Mac's Instruments; for reference, access this link).

This is a snapshot showing both builds (this PR and master).
[Screenshot: pr_analysis_start_of_node]

I wanted to give more detailed info on shutdown; while I figure out how to do that, here are a few review points:

  • On both builds there is a spike at startup (I'll also look into why)
  • This PR also improves heap usage overall during startup and regular operation. Check the "total bytes" column to see the decrease from 5.59 GiB to 5.02 GiB in heap allocations, roughly a 10% improvement.

@Davidson-Souza
Member Author

I have started a review aiming to measure the heap usage on both master and on this PR. However, the tool I am using stops recording as soon as I send a kill signal to florestad (I am using Mac's Instruments; for reference, access this link).

This is a snapshot showing both builds (this PR and master).
[Screenshot: pr_analysis_start_of_node]

I wanted to give more detailed info on shutdown; while I figure out how to do that, here are a few review points:

  • On both builds there is a spike at startup (I'll also look into why)
  • This PR also improves heap usage overall during startup and regular operation. Check the "total bytes" column to see the decrease from 5.59 GiB to 5.02 GiB in heap allocations, roughly a 10% improvement.

Hey @lucad70 . Are you building floresta with the --experimental-db feature?

@lucad70
Contributor

lucad70 commented Mar 24, 2025

Hey @lucad70 . Are you building floresta with the --experimental-db feature?

I am not. Should I be building it using the --experimental-db? Maybe I missed it while reading the documentation.

@lucad70
Contributor

lucad70 commented Mar 24, 2025

Hey @lucad70 . Are you building floresta with the --experimental-db feature?

I am not. Should I be building it using the --experimental-db? Maybe I missed it while reading the documentation.

Okay, I have re-read here and I am now testing on my machine. I am also fixing a few mistakes I made while profiling:

  • I used the debug build; I should use release instead
  • I did not use the experimental-db feature
  • I only did one run for each attempt
  • I did not make sure my computer had no other programs running

I am now addressing these issues and will return with more helpful data ASAP.

Contributor

@JoseSK999 JoseSK999 left a comment


A few additional suggestions. I still have to review more deeply some things

Contributor

@JoseSK999 JoseSK999 left a comment


A couple of other suggestions, I'm closer to a final review

@Davidson-Souza
Member Author

Updated with this diff

diff --git a/crates/floresta-chain/src/pruned_utreexo/flat_chain_store.rs b/crates/floresta-chain/src/pruned_utreexo/flat_chain_store.rs
index be41a88..d7ec29c 100644
--- a/crates/floresta-chain/src/pruned_utreexo/flat_chain_store.rs
+++ b/crates/floresta-chain/src/pruned_utreexo/flat_chain_store.rs
@@ -22,7 +22,7 @@
 //!
 //! We want to keep the load factor for the hash map as low as possible, while also avoiding
 //! re-hashing. So we pick a fairly big initial space to it, say 10 million buckets. Each bucket is
-//! 4 bytes long, so we have 40 MiB of map. Each [DiskBlockHeader] is 116 bytes long (80 bytes for
+//! 4 bytes long, so we have 40 MiB of map. Each [DiskBlockHeaderAndHash] is 116 bytes long (80 bytes for
 //! header + 4 for height + 32 for hash), so the maximum size for it, assuming 2.5 million headers
 //! (good for the next ~30 years), is 330 MiB. The smallest device I know that can run floresta
 //! has ~250 MiB of RAM, so we could  fit almost everything in memory. Given that newer blocks are
@@ -99,7 +99,6 @@ const UTREEXO_ACC_SIZE: usize = 32 * 64 + 8;
 
 #[derive(Clone)]
 /// Configuration for our flat chain store. See each field for more information
-
 pub struct FlatChainStoreConfig {
     /// The index map size, in buckets
     ///
@@ -137,7 +136,7 @@ pub struct FlatChainStoreConfig {
     /// The size of the fork headers file map, in headers
     ///
     /// This store keeps headers that are not in our main chain, but may be needed sometime. The
-    /// default value is having space for 1000 blocks.
+    /// default value is having space for 10,000 blocks.
     ///
     /// When computing the size in bytes, we will round the number to the nearest power of 2, minus
     /// 1. This lets us do some optimizations like use & instead of %, and use << instead of *.
@@ -346,7 +345,7 @@ pub struct FlatChainStore {
 /// and easy to read. It holds the memory map for the index, and the size of the index
 /// in buckets. It also keeps track of how many buckets are occupied, so we can re-hash
 /// the map when it gets full.
-pub struct BlockIndex {
+struct BlockIndex {
     /// The actual memory map for the index
     index_map: MmapMut,
 
@@ -367,7 +366,7 @@ impl BlockIndex {
     /// This function should only be called by [FlatChainStore::new], and it should never be called
     /// directly. It creates a new block index, given a memory map for the index, the size of the
     /// index in buckets, and how many buckets are occupied.
-    pub(super) fn new(index_map: MmapMut, index_size: usize, occupied: usize) -> Self {
+    fn new(index_map: MmapMut, index_size: usize, occupied: usize) -> Self {
         Self {
             index_map,
             index_size,
@@ -380,7 +379,7 @@ impl BlockIndex {
     /// If we have enough changes that we don't want to lose, we should flush the index map to
     /// disk. This will make sure that the index map is persisted, and we can recover it in case
     /// of a crash.
-    pub(super) fn flush(&self) -> Result<(), FlatChainstoreError> {
+    fn flush(&self) -> Result<(), FlatChainstoreError> {
         self.index_map.flush()?;
 
         Ok(())
@@ -732,7 +731,7 @@ impl FlatChainStore {
             return Err(FlatChainstoreError::NotFound);
         }
 
-        Ok(metadata.acc[0..size].iter().copied().collect())
+        Ok(metadata.acc[0..size].to_vec())
     }
 
     unsafe fn do_save_height(&self, best_block: BestChain) -> Result<(), FlatChainstoreError> {
@@ -868,7 +867,6 @@ impl FlatChainStore {
     }
 
     #[inline(always)]
-    #[must_use]
     #[doc(hidden)]
     fn get_cache_mut(&self) -> Result<MutexGuard<CacheType>, PoisonError<MutexGuard<CacheType>>> {
         self.cache.lock()
diff --git a/florestad/src/florestad.rs b/florestad/src/florestad.rs
index 0d5748c..6a1edfa 100644
--- a/florestad/src/florestad.rs
+++ b/florestad/src/florestad.rs
@@ -706,7 +706,7 @@ impl Florestad {
     #[cfg(feature = "experimental-db")]
     fn load_chain_store(data_dir: String) -> ChainStore {
         let config = FlatChainStoreConfig::new(data_dir + "/chaindata");
-        ChainStore::new(config.clone()).expect("failure while creating chainstate")
+        ChainStore::new(config).expect("failure while creating chainstate")
     }
 
     #[cfg(not(feature = "experimental-db"))]
@@ -715,7 +715,7 @@ impl Florestad {
         network: Network,
         assume_valid: Option<bitcoin::BlockHash>,
     ) -> ChainState<ChainStore<'static>> {
-        let db = ChainStore::new(data_dir.to_string()).expect("Could not read db");
+        let db = ChainStore::new(data_dir.clone()).expect("Could not read db");
         let assume_valid =
             assume_valid.map_or(AssumeValidArg::Hardcoded, AssumeValidArg::UserInput);
 
@@ -723,7 +723,7 @@ impl Florestad {
             Ok(chainstate) => chainstate,
             Err(err) => match err {
                 BlockchainError::ChainNotInitialized => {
-                    let db = ChainStore::new(data_dir.to_string()).expect("Could not read db");
+                    let db = ChainStore::new(data_dir).expect("Could not read db");
 
                     ChainState::<ChainStore>::new(db, network.into(), assume_valid)
                 }
@@ -746,7 +746,7 @@ impl Florestad {
             Ok(chainstate) => chainstate,
             Err(err) => match err {
                 BlockchainError::ChainNotInitialized => {
-                    let db = Self::load_chain_store(data_dir.clone());
+                    let db = Self::load_chain_store(data_dir);
 
                     ChainState::<ChainStore>::new(db, network.into(), assume_valid)
                 }
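The doc comment touched by this diff notes that the map size is rounded so that `&` can replace `%` and `<<` can replace `*`. A standalone sketch of that trick, with a hypothetical `bucket_index` helper: when the bucket count is a power of two, `capacity - 1` is an all-ones bit mask, so masking the hash gives the same result as the modulo, without a division.

```rust
/// Map a hash to a bucket without a division: for power-of-two capacities,
/// `hash & (capacity - 1)` equals `hash % capacity`.
fn bucket_index(hash: u64, capacity: u64) -> u64 {
    debug_assert!(capacity.is_power_of_two());
    hash & (capacity - 1)
}
```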

Contributor

@JoseSK999 JoseSK999 left a comment


Mainly changes for readability and clarity, and some suggested comments that I believe are important. My next review should be the last one.

Contributor

@JoseSK999 JoseSK999 left a comment


We need to update, persist and correctly load the BlockIndex occupied count.

@Davidson-Souza
Member Author

Davidson-Souza commented Apr 17, 2025

@JoseSK999 I think the problem here is the API for ChainStore. It should mark methods like save_header as taking a mutable reference to self. Otherwise we end up in a situation where users can always call them in a thread-unsafe way. If we want to expose those types and allow users to implement their own chainstate, we must make sure this is not possible. For this PR, though, I think we can just leave it as is for now.

For the map size, I'm using the fields inside metadata as you suggested. I'm also initializing the Metadata struct when we create a new chainstore

@Davidson-Souza
Member Author

Rebased

@JoseSK999
Contributor

@Davidson-Souza I see what you mean. Even if we don't unsafe impl Sync, FlatChainStore is already Sync since the MmapMut type is Sync. So currently we can have multiple threads with references to the store and do concurrent writes and reads (scary). Changing the signature for the mutating methods seems the proper fix (for another PR).
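The fix being deferred here can be sketched with a toy Store type (hypothetical, not the real ChainStore API): once mutating methods take `&mut self`, the borrow checker forbids two live writers, so callers must wrap the store in a Mutex or RwLock before sharing it across threads.

```rust
struct Store {
    headers: Vec<[u8; 80]>,
}

impl Store {
    // `&mut self` is an exclusive borrow: two threads can never call this
    // concurrently on the same store without external synchronization.
    fn save_header(&mut self, header: [u8; 80]) {
        self.headers.push(header);
    }

    // `&self` is a shared borrow: concurrent reads remain allowed.
    fn header_at(&self, height: usize) -> Option<&[u8; 80]> {
        self.headers.get(height)
    }
}
```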

Contributor

@JoseSK999 JoseSK999 left a comment


Final comments!

@Davidson-Souza Davidson-Souza force-pushed the new-chainstore branch 2 times, most recently from 48a620d to ff692d2 on May 13, 2025 20:42
@Davidson-Souza
Member Author

Rebased and updated with @JoseSK999's comments.

Contributor

@JoseSK999 JoseSK999 left a comment


Some remaining things and we are done

@Davidson-Souza Davidson-Souza force-pushed the new-chainstore branch 2 times, most recently from 2ed1330 to 7022b15 on May 15, 2025 15:28
@Davidson-Souza
Member Author

@JoseSK999 done!

The current chainstore is based on `kv`, but it has a few problems:
  - When we flush, we get a huge heap spike
  - We are paying a 2 to 3 times overhead on headers
  - Retrieving headers during IBD gets slow if we flush early

This commit introduces a bare-bones, ad hoc store that consists of two
parts:
  - An open-addressing, file-backed and memory-mapped hash map that
    keeps the relation block_hash -> block_height
  - A flat file that contains the serialized block headers, in
    ascending order
  - An LRU cache to avoid going through the map every time

To recover a header, given the block height, we simply use pointer
arithmetic inside the flat file. If we need to look a header up by its
block hash, we query the map first, then find the header inside the
flat file. This has the advantage of not needing explicit flushes (the
OS will flush at fixed intervals), flushes are async (the OS will do
them), we get caching for free (mmap-ed pages will stay in memory while
we need them) and our cache can react to system constraints, because
the kernel will always know how much memory we still have.
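The LRU cache mentioned in the commit message can be sketched as follows. This is an illustrative toy, not the actual implementation: recency updates are O(n) here, while real LRU caches use an intrusive doubly linked list.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Toy LRU cache: a capacity-bounded map that evicts the least
/// recently used key once the capacity is exceeded.
struct LruCache<K: Hash + Eq + Clone, V> {
    map: HashMap<K, V>,
    order: Vec<K>, // front = least recently used, back = most recently used
    capacity: usize,
}

impl<K: Hash + Eq + Clone, V> LruCache<K, V> {
    fn new(capacity: usize) -> Self {
        Self { map: HashMap::new(), order: Vec::new(), capacity }
    }

    fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            // Move the key to the back: it is now the most recently used.
            self.order.retain(|k| k != key);
            self.order.push(key.clone());
        }
        self.map.get(key)
    }

    fn put(&mut self, key: K, value: V) {
        if self.map.insert(key.clone(), value).is_none() && self.map.len() > self.capacity {
            // A new key pushed us over capacity: evict the least recently used.
            let lru = self.order.remove(0);
            self.map.remove(&lru);
        }
        self.order.retain(|k| *k != key);
        self.order.push(key);
    }
}
```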
Contributor

@JoseSK999 JoseSK999 left a comment


ACK 40baf4b 🥳

@Davidson-Souza Davidson-Souza merged commit 46b5830 into vinteumorg:master May 15, 2025
10 checks passed
@Davidson-Souza
Member Author

Thanks @JoseSK999, @lucad70 and @jaoleal for the valuable review 🫡
