Prevent database corruption on ungraceful shutdown #2267

@preston-evans98

Background

Currently, ungraceful shutdowns during DB commit can leave the database in an inconsistent state.

Design

To prevent this, we need to make minor changes to the rollup's startup sequence, the commit-flag enum, and the commit flow.

enum CommitFlag {
    Successful { height: u64 },
    ComittingKernelNomt { height: u64 },
    ComittingUserNomt { height: u64 },
    ComittingRocksDB { height: u64 },
}

On startup, check the commit flag:

  1. If the flag is Successful, start from height + 1.
  2. If ComittingKernelNomt, check whether the commit finished by comparing the NOMT root to the root stored in the live state table. If they match, no rollback is required. Otherwise, roll back kernel NOMT. Start from height.
  3. If ComittingUserNomt, check whether the commit finished by comparing the NOMT root to the root stored in the live state table. If the roots match, the commit didn't succeed and we're safe. Start from height.
    1. If the roots do not match, NOMT is out of sync with the flat state. To recover, we can either
      1. Enable rollback for user NOMT (not ideal), or
      2. Skip the consistency check (and update, which would be a no-op anyway) on the next commit.
    2. Note that witness generation will be impossible until the next commit. So, when we enable proving, commit the witness before updating the NOMT instance.
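A rough sketch of the startup check above, as a pure decision function. The `Roots`, `Recovery`, and `StartupPlan` types and the `plan_startup` name are made up for illustration and are not existing rollup APIs. The sketch assumes the kernel and user NOMT roots can each be compared against a root read from the live state table, and it picks option 2 (skip the check on the next commit) for the user NOMT mismatch. The issue doesn't say what to do for ComittingRocksDB on startup, so that case is left undecided here. Variant spelling follows the enum above.

```rust
/// Commit flag as proposed in this issue (spelling kept as written there).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CommitFlag {
    Successful { height: u64 },
    ComittingKernelNomt { height: u64 },
    ComittingUserNomt { height: u64 },
    ComittingRocksDB { height: u64 },
}

/// Hypothetical 32-byte state root.
type Root = [u8; 32];

/// Hypothetical snapshot of the roots read at startup.
struct Roots {
    kernel_nomt: Root,
    kernel_live: Root, // kernel root stored in the live state table
    user_nomt: Root,
    user_live: Root, // user root stored in the live state table
}

/// Hypothetical repair action to take before resuming.
#[derive(Debug, PartialEq, Eq)]
enum Recovery {
    /// Nothing to repair.
    Nothing,
    /// Kernel NOMT is ahead of the flat state: roll it back.
    RollBackKernelNomt,
    /// User NOMT is ahead of the flat state: skip the consistency
    /// check (and the no-op update) on the next commit.
    SkipUserNomtCheckOnNextCommit,
}

#[derive(Debug, PartialEq, Eq)]
struct StartupPlan {
    start_height: u64,
    recovery: Recovery,
}

/// Returns `None` for the case the issue leaves unspecified.
fn plan_startup(flag: CommitFlag, roots: &Roots) -> Option<StartupPlan> {
    match flag {
        CommitFlag::Successful { height } => Some(StartupPlan {
            start_height: height + 1,
            recovery: Recovery::Nothing,
        }),
        CommitFlag::ComittingKernelNomt { height } => Some(StartupPlan {
            start_height: height,
            recovery: if roots.kernel_nomt == roots.kernel_live {
                Recovery::Nothing
            } else {
                Recovery::RollBackKernelNomt
            },
        }),
        CommitFlag::ComittingUserNomt { height } => Some(StartupPlan {
            start_height: height,
            recovery: if roots.user_nomt == roots.user_live {
                Recovery::Nothing
            } else {
                Recovery::SkipUserNomtCheckOnNextCommit
            },
        }),
        // Not covered by the startup steps in this issue.
        CommitFlag::ComittingRocksDB { .. } => None,
    }
}
```

Keeping the decision separate from the actual DB calls makes each crash point easy to unit test without touching NOMT or RocksDB.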

On commit:

  1. Commit ledger and accessory state. These can't be read during block execution, so they should be safe even if they're not fully consistent; they'll get overwritten with the same data when we process the block again. (TODO: Verify this is true of accessory state; it might be used in the sequencer for state root calcs etc.)
  2. Update flag to ComittingKernelNomt
  3. Commit kernel NOMT.
    1. NOMT claims to provide crash fault tolerance through a WAL, so this is atomic
  4. Update flag to ComittingUserNomt
  5. Commit user NOMT
  6. Update flag to ComittingRocksDB
  7. Commit all kernel/user DBs as a unit (remove "other"; it only contains a singleton, which can be put in the main DBs):
    1. Commit archival first. If this fails, it'll get overwritten with identical values on the next run
    2. Commit all flat values atomically.
  8. Update flag to Successful
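
A minimal sketch of the commit ordering above. The `Storage` trait, its method names, and the `Recorder` mock are all hypothetical stand-ins for the real ledger, NOMT, and RocksDB handles. The only point is the interleaving of flag updates and commits, so that a crash at any step leaves the flag naming the store that may be half-written.

```rust
/// Commit flag as proposed in this issue (spelling kept as written there).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CommitFlag {
    Successful { height: u64 },
    ComittingKernelNomt { height: u64 },
    ComittingUserNomt { height: u64 },
    ComittingRocksDB { height: u64 },
}

/// Hypothetical storage interface; every method is an assumption.
trait Storage {
    type Error;
    fn set_flag(&mut self, flag: CommitFlag) -> Result<(), Self::Error>;
    fn commit_ledger_and_accessory(&mut self) -> Result<(), Self::Error>;
    fn commit_kernel_nomt(&mut self) -> Result<(), Self::Error>;
    fn commit_user_nomt(&mut self) -> Result<(), Self::Error>;
    fn commit_archival(&mut self) -> Result<(), Self::Error>;
    fn commit_flat_state(&mut self) -> Result<(), Self::Error>;
}

/// Runs the commit steps in the order listed in the issue.
fn commit_block<S: Storage>(s: &mut S, height: u64) -> Result<(), S::Error> {
    s.commit_ledger_and_accessory()?; // step 1
    s.set_flag(CommitFlag::ComittingKernelNomt { height })?; // step 2
    s.commit_kernel_nomt()?; // step 3
    s.set_flag(CommitFlag::ComittingUserNomt { height })?; // step 4
    s.commit_user_nomt()?; // step 5
    s.set_flag(CommitFlag::ComittingRocksDB { height })?; // step 6
    s.commit_archival()?; // step 7.1: safe to redo on the next run
    s.commit_flat_state()?; // step 7.2: must be atomic
    s.set_flag(CommitFlag::Successful { height }) // step 8
}

/// In-memory mock that records completed steps and can simulate a crash.
#[derive(Default)]
struct Recorder {
    log: Vec<String>,
    flag: Option<CommitFlag>,
    fail_at: Option<&'static str>,
}

impl Recorder {
    fn step(&mut self, name: &'static str) -> Result<(), String> {
        if self.fail_at == Some(name) {
            return Err(format!("crash during {name}"));
        }
        self.log.push(name.to_string());
        Ok(())
    }
}

impl Storage for Recorder {
    type Error = String;
    fn set_flag(&mut self, flag: CommitFlag) -> Result<(), String> {
        self.flag = Some(flag);
        Ok(())
    }
    fn commit_ledger_and_accessory(&mut self) -> Result<(), String> {
        self.step("ledger+accessory")
    }
    fn commit_kernel_nomt(&mut self) -> Result<(), String> {
        self.step("kernel_nomt")
    }
    fn commit_user_nomt(&mut self) -> Result<(), String> {
        self.step("user_nomt")
    }
    fn commit_archival(&mut self) -> Result<(), String> {
        self.step("archival")
    }
    fn commit_flat_state(&mut self) -> Result<(), String> {
        self.step("flat_state")
    }
}
```

With the mock, setting `fail_at: Some("user_nomt")` and calling `commit_block(&mut rec, 7)` returns an error and leaves `rec.flag` at `ComittingUserNomt { height: 7 }`, which is the state the startup check then has to handle.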
