
8337 modify chain data pruner #8506

Merged

Conversation

Matilda-Clerke (Contributor)

PR description

Add an online option to prune pre-merge blocks from the database

Fixed Issue(s)

#8337

    names = {PRE_MERGE_PRUNING_ENABLED_FLAG},
    description =
        "Enable the chain pruner to actively prune pre-merge blocks, but not headers (default: ${DEFAULT-VALUE})")
    private final Boolean preMergePruningEnabled = Boolean.FALSE;
@garyschulte (Contributor) commented on Apr 24, 2025:

It would be nice to see chain pruning made contingent on the sync mode in the same way trie log pruning is: a typical node would want to prune pre-merge history, but a full-sync node would not.

The network pre-merge cutoff could be an empty Optional for all networks except those that have, at release time, entered the proto-4444 window. This way chain pruning could always be enabled, and the presence or absence of a pruning cutoff is what causes the behavior change.
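
A minimal sketch of that idea, assuming hypothetical names (the Optional-valued cutoff and the prunePreMergeBlocks method are illustrations, not this PR's actual API):

    import java.util.Optional;

    // Sketch only: an empty Optional means the network has no pre-merge cutoff yet,
    // so chain pruning can stay enabled everywhere and simply no-op where there is
    // no cutoff to act on.
    Optional<Long> preMergeCutoff =
        switch (network) {
          case MAINNET -> Optional.of(MAINNET_MERGE_BLOCK_NUMBER);
          case SEPOLIA -> Optional.of(SEPOLIA_MERGE_BLOCK_NUMBER);
          default -> Optional.empty(); // not yet inside the proto-4444 window
        };
    preMergeCutoff.ifPresent(chainDataPruner::prunePreMergeBlocks);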

Matilda-Clerke (Author) replied:

While it would be nice to enable pruning by default, I think users might be upset by their stored history being deleted unexpectedly.

@garyschulte (Contributor) left a comment:

I was originally thinking of a one-off purge of pre-merge blocks and receipts, but this approach makes sense in light of the future EIP-4444 moving window.

@pinges (Contributor) left a comment:

Some comments

    new ChainDataPrunerStorage(new InMemoryKeyValueStorage()),
    0,
    ChainDataPruner.Mode.CHAIN_PRUNING,
Contributor:

No tests added for pre-merge pruning?

Matilda-Clerke (Author) replied:

Not yet, but I can add some once we're happy with the code.

Matilda-Clerke and others added 3 commits April 28, 2025 11:15
    blockchainStorage, () -> blockchain.removeObserver(chainDataPrunerObserverId.get()));
    chainDataPrunerObserverId.set(blockchain.observeBlockAdded(chainDataPruner));
    if (chainPrunerConfiguration.chainPruningEnabled()) {
      LOG.info(
Contributor:

Good user feedback.

    switch (network) {
      case MAINNET -> MAINNET_MERGE_BLOCK_NUMBER;
      case SEPOLIA -> SEPOLIA_MERGE_BLOCK_NUMBER;
      default -> 0;
Contributor:

If a network is Cancun at genesis, like Hoodi is, then we don't do any PRE_MERGE_PRUNING, which makes sense. We would start CHAIN_PRUNING once we got to block 256. However, EIP-6110 doesn't activate until Pectra does. Might we have an inconsistency with respect to deposits in the interim between them?

Contributor:

I also prefer using the checkpoint at genesis for pruning configuration, as it aligns with the approach taken in #8582.

@Gabriel-Trintinalia (Contributor) left a comment:

I believe this PR shares some similarities with #8582.

The 8582 PR includes a flag for saving only headers to the database, which we could consolidate with the flags in this PR.

Additionally, 8582 utilises the checkpoint at genesis to establish the header cutoff. I suggest we adopt the same approach here, rather than introducing new constants to NetworkName or other locations.
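
A hedged sketch of that direction; getCheckpointBlockNumber() is a hypothetical accessor standing in for however the genesis checkpoint is actually exposed, not a verified Besu API:

    import java.util.Optional;

    // Sketch only: derive the pruning cutoff from the checkpoint configured in the
    // genesis file rather than from per-network constants in NetworkName.
    Optional<Long> checkpointCutoff =
        genesisConfigOptions.getCheckpointBlockNumber(); // hypothetical accessor
    long pruneBelow = checkpointCutoff.orElse(0L); // no checkpoint -> nothing to prune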

Signed-off-by: Matilda Clerke <[email protected]>
@ahamlat (Contributor) left a comment:

This is not part of the PR, but more of an improvement: make the chain pruner thread pool expose the same metric names as other thread pools. To include these metrics in the Besu Full dashboard, we need to change

MonitoredExecutors.newBoundedThreadPool(
    ChainDataPruner.class.getSimpleName(),
    1,
    1,
    ChainDataPruner.MAX_PRUNING_THREAD_QUEUE_SIZE,
    metricsSystem));

to

MonitoredExecutors.newBoundedThreadPool(
    EthScheduler.class.getSimpleName() + "-ChainDataPruner",
    1,
    1,
    ChainDataPruner.MAX_PRUNING_THREAD_QUEUE_SIZE,
    metricsSystem));

One other solution is to integrate the thread pool into EthScheduler. This relates to the formula we're using to display the metrics of monitored thread pools, which is currently something like:

/besu_executors_ethscheduler_([^_]*)_.*/
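
For illustration, assuming the pool name is lower-cased and '-' becomes '_' in the exported metric name (an assumption about MonitoredExecutors, not verified behaviour):

    import java.util.Locale;

    // Hedged sketch of the assumed metric-name derivation.
    String poolName = EthScheduler.class.getSimpleName() + "-ChainDataPruner";
    String metricPrefix =
        "besu_executors_" + poolName.toLowerCase(Locale.ROOT).replace('-', '_');
    // -> "besu_executors_ethscheduler_chaindatapruner", so any metric under this
    //    prefix matches /besu_executors_ethscheduler_([^_]*)_.*/ with
    //    group(1) = "chaindatapruner"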

@macfarla macfarla added the history reduce disk reqs thru history mgmt label May 23, 2025
@macfarla macfarla dismissed ahamlat’s stale review May 23, 2025 02:48
        EthScheduler.class.getSimpleName() + "-ChainDataPruner",

This change has been made.

@macfarla (Contributor) left a comment:

needs a changelog

@Matilda-Clerke (Author):

Latest testing, including enabling database garbage collection, shows fairly minimal impact on newPayload and FCU.

@siladu siladu moved this to In review in History Expiry May 24, 2025
@Matilda-Clerke Matilda-Clerke force-pushed the 8337-modify-chain-data-pruner branch from 4edd2f1 to 5769dea on May 26, 2025 23:49
@Matilda-Clerke Matilda-Clerke requested a review from siladu May 26, 2025 23:50
@Matilda-Clerke (Author):

Some interesting findings from the latest test run (with the double-execution issue fixed): all batch sizes eventually cause some increase in newPayload execution time above baseline, but a lower batch size spreads the impact over a longer period, making it less pronounced at any given time.
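
For context, a minimal sketch of the batched-pruning shape being measured here (assumed structure with hypothetical storage calls, not the PR's actual code); each batch briefly competes with block processing, so a smaller batch size trades shorter individual stalls for a longer total pruning window:

    // Sketch only: prune in fixed-size batches so each unit of work stays small.
    long next = oldestStoredBlockNumber;
    while (next < preMergeCutoff) {
      long batchEnd = Math.min(next + batchSize, preMergeCutoff);
      for (long block = next; block < batchEnd; block++) {
        storage.removeBlockBody(block); // hypothetical storage API
        storage.removeReceipts(block);  // hypothetical storage API
      }
      next = batchEnd;
    }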

@siladu (Contributor) left a comment:

LGTM 🚢

@Matilda-Clerke Matilda-Clerke merged commit 1e084c7 into hyperledger:main May 29, 2025
48 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in History Expiry May 29, 2025

Labels: history reduce disk reqs thru history mgmt
Project: History Expiry (Done)
8 participants