Closed
Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
The allowed pruning height is calculated incorrectly when a node operator changes the snapshot-interval from one value (A) to a larger value (B). This issue can cause two problems:
- A height is pruned while the snapshot at that height is still being processed.
  - Code: https://github.com/cosmos/cosmos-sdk/blob/v0.50.11/store/pruning/manager.go#L130-L136
  - The pruning logic assumes that it can safely delete all states up to `pruneSnapshotHeights[0] + snapshotInterval - 1`. However, this assumption fails when the snapshot interval is changed, potentially deleting state that a snapshot still in progress depends on.
  - Example scenario:
    - Block 10: The node operator sets `snapshot-interval` = 10 and `pruning-keep-recent` = "5", so the node creates a snapshot at block 10.
    - Block 15: The operator changes `snapshot-interval` to 20.
    - Block 20: The node creates a new snapshot at block 20.
    - Block 26: The pruning logic now allows pruning up to block 29 (10 + 20 - 1 = 29), which `pruning-keep-recent` limits to block 20. The state at height 20 is deleted, which breaks the snapshot at block 20 if it has not fully finished yet.
- The pruning height gets stuck at the previous snapshot height.
  - Code: https://github.com/cosmos/cosmos-sdk/blob/v0.50.11/store/pruning/manager.go#L83-L89
  - The function only updates `pruneSnapshotHeights` if the next snapshot is at `previousSnapshotHeight + snapshotInterval`.
  - If the interval changes, this condition fails, meaning `pruneSnapshotHeights` does not shift forward.
  - As a result, the first value in `pruneSnapshotHeights` gets stuck at an old height, and the node continues using it to determine which heights to prune up to (same code section as Problem 1).
  - Note: This also happens when a snapshot at some height fails or is skipped.
  - Example scenario:
    - Block 0: The operator sets `snapshot-interval` = 10.
    - Block 10: A snapshot is created. `pruneSnapshotHeights` = [10].
    - Block 15: The operator changes `snapshot-interval` to 20.
    - Block 20: A new snapshot is created. `pruneSnapshotHeights` = [10, 20].
      - The list does not shift because 20 (`pruneSnapshotHeights[1]`) is not equal to 10 (`pruneSnapshotHeights[0]`) + 20 (`snapshotInterval`).
    - Block 40: Another snapshot is created. `pruneSnapshotHeights` = [10, 20, 40].
    - After that, pruning gets stuck: `pruneSnapshotHeights` remains [10, 20, 40, …], but pruning only happens up to `pruneSnapshotHeights[0] + snapshotInterval - 1` = 29. The node never prunes blocks beyond height 29, leading to unexpected storage growth.
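The bookkeeping behind the second problem can be sketched in a few lines of Go. This is a simplified model written from the description above, not the SDK's code: `recordSnapshot` is a hypothetical helper, and it assumes snapshot heights arrive in increasing order. It appends the new height and then drops leading entries only while consecutive heights differ by exactly the current interval.

```go
package main

import "fmt"

// recordSnapshot is a hypothetical model of the bookkeeping described in
// this issue. It appends the finished snapshot height, then discards
// leading entries as long as each next height equals the previous height
// plus the current interval. Heights are assumed to arrive in increasing
// order.
func recordSnapshot(heights []int64, height, interval int64) []int64 {
	heights = append(heights, height)
	k := 1
	for ; k < len(heights); k++ {
		if heights[k] != heights[k-1]+interval {
			break // gap does not match the interval: stop shifting
		}
	}
	return heights[k-1:]
}

func main() {
	// Interval changed from 10 to 20 after the first snapshot.
	changed := recordSnapshot(nil, 10, 10)
	changed = recordSnapshot(changed, 20, 20)
	changed = recordSnapshot(changed, 40, 20)
	fmt.Println("interval changed:  ", changed) // [10 20 40]
	fmt.Println("prune cap:         ", changed[0]+20-1)

	// Interval kept at 10: the list shifts forward as expected.
	unchanged := recordSnapshot(nil, 10, 10)
	unchanged = recordSnapshot(unchanged, 20, 10)
	fmt.Println("interval unchanged:", unchanged) // [20]
}
```

In this model the first entry stays at 10 once the interval changes, so the cap `heights[0] + interval - 1` stays at 29 no matter how many later snapshots are recorded.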
Cosmos SDK Version
v0.50+ with store v1
How to reproduce?
- Install `simd` from Cosmos SDK v0.50.11.
- Configure the node (`~/.simapp/config/app.toml`):
  - pruning = "custom"
  - pruning-keep-recent = "5"
  - pruning-interval = "10"
  - snapshot-interval = 10
- Start the node and let it run. (The first snapshot will be created at block 10.)
- At block 15, stop the node and update `snapshot-interval` in `app.toml` to 20.
- Start the node again.
- At block 26, the node will attempt to prune the state at block 20 (since pruning-keep-recent = 5). If the snapshot at block 20 is still in progress, pruning deletes the state before the snapshot completes. (Problem 1)
- After block 29, the node stops pruning as it is now limited by the first snapshot height (pruneSnapshotHeights[0]). (Problem 2)
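The expected outcome of these steps can be checked with a small Go model of the prune-height calculation. This is a sketch under assumptions, not the SDK implementation: `pruneHeightAt` is a hypothetical helper, the keep-recent bound is assumed to be `current - 1 - keepRecent` (which matches the "limited to 20 at block 26" figure above), and `pruning-interval` is ignored.

```go
package main

import "fmt"

// pruneHeightAt is a hypothetical model of the calculation described in
// this issue: the prunable height is the smaller of the keep-recent bound
// and the snapshot bound firstSnapshot + interval - 1.
func pruneHeightAt(current, keepRecent, firstSnapshot, interval int64) int64 {
	byKeepRecent := current - 1 - keepRecent // assumed form of the keep-recent bound
	bySnapshot := firstSnapshot + interval - 1
	if bySnapshot < byKeepRecent {
		return bySnapshot
	}
	return byKeepRecent
}

func main() {
	// Values from the reproduction steps: first snapshot stuck at 10,
	// interval raised to 20, pruning-keep-recent = 5.
	const keepRecent, firstSnapshot, interval = 5, 10, 20
	for _, h := range []int64{26, 35, 36, 50, 100} {
		fmt.Printf("block %d: prune up to %d\n",
			h, pruneHeightAt(h, keepRecent, firstSnapshot, interval))
	}
}
```

With these inputs the model returns 20 at block 26 (Problem 1) and never more than 29 at any later block (Problem 2). With the original interval of 10, the same call at block 26 returns 19, so the state at height 20 would have been kept.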