
[Bug]: Wrong calculation of allowed pruning height when changing snapshot-interval #23638

@RogerKSI


Is there an existing issue for this?

  • I have searched the existing issues

What happened?

The allowed pruning height is calculated incorrectly when a node operator changes snapshot-interval from one value (A) to a larger value (B). This can cause two problems:

  1. Pruning a height while the snapshot at that height is still being processed.
  • Code: https://github.com/cosmos/cosmos-sdk/blob/v0.50.11/store/pruning/manager.go#L130-L136
  • The pruning logic assumes that it can safely delete all state up to pruneSnapshotHeights[0] + snapshotInterval - 1. This assumption fails when the snapshot interval is changed, so state that an in-progress snapshot still depends on can be deleted.
  • Example Scenario:
    • Block 10: The node operator sets snapshot-interval = 10 and pruning-keep-recent = "5", so the node creates a snapshot at block 10.
    • Block 15: The operator changes snapshot-interval to 20.
    • Block 20: The node creates a new snapshot at block 20.
    • Block 26: The snapshot rule now allows pruning up to block 29 (10 + 20 - 1 = 29), so the limit actually applied is 20, set by pruning-keep-recent. The state at height 20 is deleted, which breaks the snapshot at block 20 if it has not finished yet.
  2. The pruning height gets stuck at an old snapshot height.
  • Code: https://github.com/cosmos/cosmos-sdk/blob/v0.50.11/store/pruning/manager.go#L83-L89
    • The function only advances pruneSnapshotHeights (drops the older entries) when each next snapshot height equals the previous snapshot height + snapshotInterval.
    • If the interval changes, this condition fails, so pruneSnapshotHeights does not shift forward.
    • As a result, the first value in pruneSnapshotHeights gets stuck at an old height, and the node keeps using it to determine how far it may prune (the same code section as Problem 1).
    • Note: This also happens when the snapshot at some height fails or is skipped.
  • Example Scenario:
    • Block 0: The operator sets snapshot-interval = 10.
    • Block 10: A snapshot is created. pruneSnapshotHeights = [10].
    • Block 15: The operator changes the snapshot-interval to 20.
    • Block 20: A new snapshot is created. pruneSnapshotHeights = [10, 20].
      • The list does not shift forward because 20 (pruneSnapshotHeights[1]) is not equal to 10 (pruneSnapshotHeights[0]) + 20 (snapshotInterval).
    • Block 40: Another snapshot is created. pruneSnapshotHeights = [10, 20, 40].
    • After that, pruning gets stuck:
      • pruneSnapshotHeights remains [10, 20, 40, …], so pruning only ever reaches pruneSnapshotHeights[0] + snapshotInterval - 1 = 29. The node never prunes blocks beyond height 29, which leads to unexpected storage growth.
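A minimal Go sketch of the bookkeeping behind Problem 2, assuming it follows the linked manager.go lines: after each snapshot the new height is appended, the list is sorted, and leading entries are dropped only while consecutive heights differ by exactly snapshotInterval. trimSnapshotHeights is a hypothetical name used for illustration, not the SDK's API.

```go
package main

import (
	"fmt"
	"sort"
)

// trimSnapshotHeights is a hypothetical, simplified model of how the pruning
// manager updates pruneSnapshotHeights after a snapshot. It appends the new
// height, sorts the list, and drops leading entries only while consecutive
// heights differ by exactly snapshotInterval.
func trimSnapshotHeights(heights []int64, newHeight, snapshotInterval int64) []int64 {
	heights = append(heights, newHeight)
	sort.Slice(heights, func(i, j int) bool { return heights[i] < heights[j] })

	k := 1
	for ; k < len(heights); k++ {
		if heights[k] != heights[k-1]+snapshotInterval {
			break
		}
	}
	// Keep everything from the last entry of the contiguous run onward.
	return heights[k-1:]
}

func main() {
	// Interval never changes: the list shifts forward as expected.
	steady := trimSnapshotHeights([]int64{10}, 20, 10)
	fmt.Println(steady) // [20]

	// Interval changed from 10 to 20 after the snapshot at block 10.
	stuck := trimSnapshotHeights([]int64{10}, 20, 20) // 20 != 10 + 20
	stuck = trimSnapshotHeights(stuck, 40, 20)        // 20 != 10 + 20 again
	fmt.Println(stuck) // [10 20 40]

	// The first entry keeps limiting pruning: 10 + 20 - 1 = 29.
	fmt.Println(stuck[0] + 20 - 1) // 29
}
```

With a constant interval the list collapses to the newest snapshot, while a changed interval breaks the contiguity check on the very first pair, so the old height 10 is never dropped.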

Cosmos SDK Version

v0.50+ with store v1

How to reproduce?

  1. Install simd from Cosmos SDK v0.50.11
  2. Configure the node (~/.simapp/config/app.toml)
    • pruning = "custom"
    • pruning-keep-recent = "5"
    • pruning-interval = "10"
    • snapshot-interval = 10
  3. Start the node and let it run. (The first snapshot will be created at block 10.)
  4. At block 15, stop the node and update snapshot-interval in app.toml to 20.
  5. Start the node again.
  6. At block 26, the node will attempt to prune the state at block 20 (since pruning-keep-recent = 5). If the snapshot at block 20 is still in progress, pruning deletes that state before the snapshot completes. (Problem 1)
  7. Once pruning reaches block 29, the node stops pruning any further, because the limit is tied to the first snapshot height (pruneSnapshotHeights[0]). (Problem 2)
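The limit hit in steps 6 and 7 can be sketched as follows. allowedPruneHeight is a hypothetical helper that assumes the limit is the smaller of height - 1 - pruning-keep-recent and pruneSnapshotHeights[0] + snapshotInterval - 1, matching the arithmetic in the example scenarios. It leaves out other checks the real pruning manager performs, such as the pruning strategy and pruning-interval.

```go
package main

import "fmt"

// allowedPruneHeight is a hypothetical, simplified model of the pruning limit
// described in this issue. It ignores the pruning strategy and
// pruning-interval checks that the real manager also applies.
//   - keepRecent caps pruning at height - 1 - keepRecent.
//   - The snapshot rule caps pruning at firstSnapshot + snapshotInterval - 1,
//     where firstSnapshot stands for pruneSnapshotHeights[0].
func allowedPruneHeight(height, keepRecent, snapshotInterval, firstSnapshot int64) int64 {
	pruneHeight := height - 1 - keepRecent
	snapshotLimit := firstSnapshot + snapshotInterval - 1
	if snapshotLimit < pruneHeight {
		return snapshotLimit
	}
	return pruneHeight
}

func main() {
	const keepRecent, newInterval = 5, 20

	// Step 6: pruneSnapshotHeights[0] is 10, so the snapshot rule allows up
	// to 29 and keepRecent is the only effective cap.
	h := allowedPruneHeight(26, keepRecent, newInterval, 10)
	fmt.Println(h, h >= 20) // 20 true: state at block 20 may be pruned

	// Step 7: pruneSnapshotHeights[0] stays at 10, so the limit never moves
	// past 29 however far the chain advances.
	for _, height := range []int64{40, 100, 1000} {
		fmt.Println(height, allowedPruneHeight(height, keepRecent, newInterval, 10))
	}
}
```

Each height in the loop prints a limit of 29, which is the storage growth described in Problem 2.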
