
history-expiry: RocksDB BlobDB GC tuning CLI Config #8599


Merged
merged 12 commits into from
May 13, 2025

Conversation

siladu

@siladu siladu commented May 8, 2025

This PR introduces three new options:

  • --Xplugin-rocksdb-blockchain-blob-garbage-collection-enabled is a custom Besu option that enables GC for the BLOCKCHAIN column family. Enabling it leads to some space amplification even before any pruning is performed.
  • --Xplugin-rocksdb-blob-garbage-collection-age-cutoff configures the RocksDB option blob_garbage_collection_age_cutoff.
  • --Xplugin-rocksdb-blob-garbage-collection-force-threshold configures the RocksDB option blob_garbage_collection_force_threshold.

The RocksDB options apply to all column families that have BlobDB and GC enabled: currently just TRIE_LOG_STORAGE and, if configured, BLOCKCHAIN.

If non-default settings are used, these are displayed in the config overview:

# Experimental BlobDB BLOCKCHAIN Garbage Collection enabled                                        #
# Experimental BlobDB GC age cutoff: 0.5; force threshold: 0.1                                   #

From https://github.com/facebook/rocksdb/wiki/BlobDB#column-family-options:

  • blob_garbage_collection_age_cutoff: the cutoff that the GC logic uses to determine which blob files should be considered “old.” For example, the default value of 0.25 signals to RocksDB that blobs residing in the oldest 25% of blob files should be relocated by GC. This parameter can be tuned to adjust the trade-off between write amplification and space amplification.
  • blob_garbage_collection_force_threshold: if the ratio of garbage in the oldest blob files exceeds this threshold, targeted compactions are scheduled in order to force garbage collecting the blob files in question, assuming they are all eligible based on the value of blob_garbage_collection_age_cutoff above. This can help reduce space amplification in the case of skewed workloads where the affected files would not otherwise be picked up for compaction. This option is currently only supported with leveled compactions.
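To make the interaction between the two options concrete, here is a minimal sketch of the rule as described in the quoted wiki text. It is a simplified model, not RocksDB's implementation: the `BlobFile` class, the function names, and the byte counts are hypothetical, and the files are assumed to be ordered oldest first.

```python
from dataclasses import dataclass


@dataclass
class BlobFile:
    """Hypothetical stand-in for one blob file's size accounting."""
    total_bytes: int
    garbage_bytes: int


def eligible_files(files, age_cutoff):
    """Return the oldest `age_cutoff` fraction of files (ordered oldest first)."""
    count = int(len(files) * age_cutoff)
    return files[:count]


def should_force_gc(files, age_cutoff, force_threshold):
    """True when the garbage ratio across the eligible files exceeds the threshold."""
    batch = eligible_files(files, age_cutoff)
    total = sum(f.total_bytes for f in batch)
    if total == 0:
        return False
    garbage = sum(f.garbage_bytes for f in batch)
    return garbage / total > force_threshold


# Four equally sized files, oldest first, with decreasing amounts of garbage.
files = [BlobFile(100, 60), BlobFile(100, 40), BlobFile(100, 5), BlobFile(100, 0)]

# "Middle ground" style settings: oldest half is 50% garbage, above 0.1.
print(should_force_gc(files, age_cutoff=0.5, force_threshold=0.1))   # True

# Default style settings: a ratio can never exceed 1, so nothing is forced.
print(should_force_gc(files, age_cutoff=0.25, force_threshold=1.0))  # False
```

In this model a force threshold of 1 never triggers a forced collection, so garbage is only removed when the affected files happen to be picked up by regular compaction.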

The idea is to prune with the storage prune-premerge-blocks subcommand, then temporarily enable these three options with the recommended settings so that Besu reclaims the space, and disable them afterwards, since BLOCKCHAIN GC is not required once the pruned space has been reclaimed:

--Xplugin-rocksdb-blockchain-blob-garbage-collection-enabled=true
--Xplugin-rocksdb-blob-garbage-collection-age-cutoff=0.5
--Xplugin-rocksdb-blob-garbage-collection-force-threshold=0.1
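For anyone scripting the temporary post-prune run, a small helper along these lines could assemble the three options. This is only a sketch: the function name is hypothetical, and the 0 to 1 range check is an assumption based on both RocksDB options being fractions, not something this PR is confirmed to enforce.

```python
def blockchain_gc_flags(age_cutoff=0.5, force_threshold=0.1):
    """Build the three experimental options for a temporary post-prune run."""
    # Assumption: both values are fractions, so reject anything outside [0, 1].
    for name, value in (("age_cutoff", age_cutoff),
                        ("force_threshold", force_threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return [
        "--Xplugin-rocksdb-blockchain-blob-garbage-collection-enabled=true",
        f"--Xplugin-rocksdb-blob-garbage-collection-age-cutoff={age_cutoff}",
        f"--Xplugin-rocksdb-blob-garbage-collection-force-threshold={force_threshold}",
    ]


# Print the options one per line, ready to append to an existing Besu command.
print("\n".join(blockchain_gc_flags()))
```

Once the space has been reclaimed, the node would be restarted without these options.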

Next step is to bundle these settings together in a convenience option e.g. --history-expiry-prune


Testing

See #8599 (comment)

TODO

This PR can merge as is, but to confirm the best settings I would like to do more testing with a "less aggressive but all files" setting:

blob_garbage_collection_age_cutoff = 1
blob_garbage_collection_force_threshold = 0.1

and maybe the settings from #8480.


siladu added 9 commits May 8, 2025 15:45
--Xplugin-rocksdb-blob-garbage-collection-age-cutoff configures blob_garbage_collection_age_cutoff
--Xplugin-rocksdb-blob-garbage-collection-force-threshold configures blob_garbage_collection_force_threshold

Signed-off-by: Simon Dudley <[email protected]>
…onfigure BLOCKCHAIN BlobDB GC specifically

Signed-off-by: Simon Dudley <[email protected]>
Rather than hardcoding RocksDB defaults in Besu

Signed-off-by: Simon Dudley <[email protected]>
Signed-off-by: Simon Dudley <[email protected]>
Signed-off-by: Simon Dudley <[email protected]>
Signed-off-by: Simon Dudley <[email protected]>
Signed-off-by: Simon Dudley <[email protected]>
@siladu

siladu commented May 9, 2025

Testing

Three configurations were tested:

| Description | blob_garbage_collection_age_cutoff | blob_garbage_collection_force_threshold |
| --- | --- | --- |
| RocksDB Defaults | 0.25 | 1 |
| Aggressive GC | 1.0 | 0.01 |
| Middle ground | 0.5 | 0.1 |

The "Middle ground" configuration proved the most effective: it reclaimed a large amount of space without too much initial space amplification, and took the shortest time (~8 hours) to do so.

"RocksDB Defaults" didn't reduce the disk space overall.
"Aggressive GC" increased space amplification a lot and took ~12 hours to reclaim space, though it reclaimed slightly more than the middle ground.

Results

How the different settings affect disk usage after the prune and, if left enabled, what the end state is.

| Node | Initial | Max Size | After GC BLOCKCHAIN | End state (days after pruning) |
| --- | --- | --- | --- | --- |
| RocksDB Defaults | - | - | - | - |
| dev-elc-bu-lh-mainnet-simon-4444-prune-fast-gc-defaults | 1.42 TiB | 1.44 TiB | n/a | Blob file count: 34358, total size: 896.6 GB, garbage size: 357.9 GB, space amp: 1.7 |
| dev-elc-bu-tk-mainnet-simon-4444-prune-fast-gc-defaults | 1.45 TiB | 1.48 TiB | n/a | Blob file count: 34356, total size: 896.7 GB, garbage size: 358.0 GB, space amp: 1.7 |
| Aggressive GC | - | - | - | - |
| dev-elc-bu-lh-mainnet-simon-4444-prune-fast-gc-cutoff1-force001 | 1.61 TiB | 1.93 TiB | 1.08 TiB | Blob file count: 2183, total size: 539.1 GB, garbage size: 0.4 GB, space amp: 1.0 |
| dev-elc-bu-tk-mainnet-simon-4444-prune-fast-gc-cutoff1-force001* | 1.70 TiB | 1.94 TiB | n/a* | Blob file count: 34285, total size: 1379.1 GB, garbage size: 806.9 GB, space amp: 2.4* |
| Middle ground | - | - | - | - |
| dev-elc-bu-lh-mainnet-simon-4444-prune-fast-gc-cutoff05-force01 | 1.42 TiB | 1.54 TiB | 1.15 TiB | Blob file count: 17683, total size: 599.3 GB, garbage size: 3.6 GB, space amp: 1.0 |
| dev-elc-bu-tk-mainnet-simon-4444-prune-fast-gc-cutoff05-force01 | 1.45 TiB | 1.56 TiB | 1.14 TiB | Blob file count: 17527, total size: 579.9 GB, garbage size: 41.3 GB, space amp: 1.1 |
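The trade-off in the table can be read off with a little arithmetic. The sketch below takes the Initial, Max Size and After GC values (in TiB) for the three nodes with a valid After GC figure and derives the temporary overshoot and the space freed from the peak. The short keys and the function name are hypothetical labels, not names used in the PR.

```python
# (initial, max_size, after_gc) in TiB, copied from the results table.
results = {
    "lh aggressive": (1.61, 1.93, 1.08),
    "lh middle": (1.42, 1.54, 1.15),
    "tk middle": (1.45, 1.56, 1.14),
}


def summarize(initial, max_size, after_gc):
    """Return (overshoot, freed_from_peak) in TiB, rounded to two decimals."""
    overshoot = round(max_size - initial, 2)       # extra disk needed during GC
    freed_from_peak = round(max_size - after_gc, 2)  # drop from peak to end of GC
    return overshoot, freed_from_peak


for node, values in results.items():
    overshoot, freed = summarize(*values)
    print(f"{node}: overshoot {overshoot} TiB, freed from peak {freed} TiB")
```

Note that these totals include both EL and CL data, so the differences are indicative rather than exact BLOCKCHAIN column family figures.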

Notes:

  • Nodes were launched from a snapshot.
  • Total size includes EL + CL data.
  • The RocksDB default settings didn't result in a disk usage drop, possibly because force_threshold = 1 requires a whole blob file to be full of garbage?
  • The presence of the BlobDB GC settings had an impact even before the prune subcommand was run, which is why Aggressive GC has a higher initial value. The actual initial value for all nodes, before GC and prune, is closer to 1.4 TiB.
  • The end state is pretty stable. I assume the "garbage size" in the log has been identified but is not eligible for removal under the settings.
  • Something unrelated went wrong with dev-elc-bu-tk-mainnet-simon-4444-prune-fast-gc-cutoff1-force001 (marked * in the table), so it's not a valid result.
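The "space amp" figures in the End state column are consistent with total size divided by live (non-garbage) size. That formula is inferred from the logged numbers rather than taken from the RocksDB source, so treat this as a sanity check, not a definition:

```python
def space_amp(total_gb, garbage_gb):
    """Space amplification as total blob size over live (non-garbage) size."""
    live_gb = total_gb - garbage_gb
    return total_gb / live_gb


# (total size, garbage size, logged space amp) from the End state column.
logged = [
    (896.6, 357.9, 1.7),
    (896.7, 358.0, 1.7),
    (539.1, 0.4, 1.0),
    (1379.1, 806.9, 2.4),
    (599.3, 3.6, 1.0),
    (579.9, 41.3, 1.1),
]

for total_gb, garbage_gb, reported in logged:
    computed = round(space_amp(total_gb, garbage_gb), 1)
    print(f"total {total_gb} GB, garbage {garbage_gb} GB: "
          f"computed {computed}, logged {reported}")
```

Under this reading, the defaults run still carries roughly 358 GB of garbage on top of about 539 GB of live blob data, which is close to the total size reached by the runs where GC completed.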

RocksDB Defaults

dev-elc-bu-lh-mainnet-simon-4444-prune-fast-gc-defaults
Screenshot 2025-05-09 at 2 42 09 pm

dev-elc-bu-tk-mainnet-simon-4444-prune-fast-gc-defaults
Screenshot 2025-05-09 at 2 43 11 pm


Aggressive GC

dev-elc-bu-lh-mainnet-simon-4444-prune-fast-gc-cutoff1-force001
Screenshot 2025-05-09 at 2 41 54 pm

dev-elc-bu-tk-mainnet-simon-4444-prune-fast-gc-cutoff1-force001
Screenshot 2025-05-09 at 2 42 56 pm


Middle ground

dev-elc-bu-lh-mainnet-simon-4444-prune-fast-gc-cutoff05-force01
Screenshot 2025-05-09 at 2 41 20 pm

dev-elc-bu-tk-mainnet-simon-4444-prune-fast-gc-cutoff05-force01
Screenshot 2025-05-09 at 2 42 31 pm

@siladu siladu marked this pull request as ready for review May 9, 2025 05:40
@siladu siladu changed the title RocksDB BlobDB GC tuning CLI Config history-expiry: RocksDB BlobDB GC tuning CLI Config May 9, 2025

/**
* The Blob garbage collection age cutoff. The fraction of file age to be considered eligible for
* GC; e.g. 0.25 = oldest 25% of files eligible; e.g. 1 = all files eligible When unspecified, use

Suggested change
* GC; e.g. 0.25 = oldest 25% of files eligible; e.g. 1 = all files eligible When unspecified, use
* GC; e.g. 0.25 = oldest 25% of files eligible; e.g. 1 = all files eligible. When unspecified, use

@siladu siladu enabled auto-merge (squash) May 13, 2025 20:13
@siladu siladu merged commit 07138e4 into hyperledger:main May 13, 2025
48 checks passed
@siladu siladu deleted the blockchain-gc branch May 13, 2025 20:32
@siladu siladu added the history reduce disk reqs thru history mgmt label May 20, 2025
@siladu siladu moved this to Done in History Expiry May 20, 2025
Labels
history reduce disk reqs thru history mgmt
Projects
Status: Done