Member

@bkolad bkolad commented Dec 29, 2025

Description

This PR adds tracking for DB commit status.

Previously, the CommitFlag mechanism saved the CommitStatus to a file, but it had two major limitations:

  1. It did not cover all possible database shutdown scenarios.
  2. The flag was written after committing a specific DB. It is now written before the commit, which simplifies error handling.

The CommitStatus is now written to a file in the DbGroup::commit method, and recovery is performed in NomtStateDb::validate_commit_flag_and_rollback_if_necssesary.
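A minimal sketch of the write-before-commit ordering described above, not the PR's actual code. The `CommitFlag` wrapper, the two-variant `CommitStatus`, the on-disk text format, the `commit_with_flag` helper, and the choice to treat a missing file as `Success` are all assumptions made for illustration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical, simplified stand-in for the PR's `CommitStatus`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitStatus {
    Success,
    CommitInProgress,
}

/// Hypothetical flag-file wrapper; the real `CommitFlag` API may differ.
pub struct CommitFlag {
    path: PathBuf,
}

impl CommitFlag {
    pub fn new(path: impl AsRef<Path>) -> Self {
        Self { path: path.as_ref().to_path_buf() }
    }

    /// Persists the status as a short text marker (assumed format).
    pub fn write_status(&self, status: CommitStatus) -> io::Result<()> {
        let text = match status {
            CommitStatus::Success => "success",
            CommitStatus::CommitInProgress => "in_progress",
        };
        fs::write(&self.path, text)
    }

    /// Reads the status back. Assumption: a missing file means no commit
    /// was ever in flight, so it is reported as `Success`.
    pub fn read_status(&self) -> io::Result<CommitStatus> {
        let raw = match fs::read_to_string(&self.path) {
            Ok(s) => s,
            Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(CommitStatus::Success),
            Err(e) => return Err(e),
        };
        match raw.trim() {
            "success" => Ok(CommitStatus::Success),
            "in_progress" => Ok(CommitStatus::CommitInProgress),
            other => Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("unknown commit status: {}", other),
            )),
        }
    }
}

/// Records the in-progress status first, then runs the commit, then records
/// `Success`. If the commit fails or the process dies in between, the flag
/// still says `CommitInProgress` on the next startup.
pub fn commit_with_flag<F>(flag: &CommitFlag, commit: F) -> io::Result<()>
where
    F: FnOnce() -> io::Result<()>,
{
    flag.write_status(CommitStatus::CommitInProgress)?;
    commit()?;
    flag.write_status(CommitStatus::Success)
}
```

With this ordering, a status other than `Success` at startup is the only signal recovery needs: it means a commit may have been interrupted.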

Unit tests will be included in a follow-up PR.

  • I have updated CHANGELOG.md with a new entry if my PR makes any breaking changes or fixes a bug. If my PR removes a feature or changes its behavior, I provide help for users on how to migrate to the new behavior.
  • I have carefully reviewed all my Cargo.toml changes before opening the PRs. (Are all new dependencies necessary? Is any module dependency leaked into the full-node (hint: it shouldn't)?)

@bkolad bkolad force-pushed the blaze/fix_db_commit branch from 1662beb to d804845 Compare December 29, 2025 10:57
@bkolad bkolad changed the title Blaze/fix db commit sov-db: Track DB commit status Dec 29, 2025
@bkolad bkolad force-pushed the blaze/fix_db_commit branch 3 times, most recently from 0ea1d6e to 811a9ba Compare December 29, 2025 19:04
Base automatically changed from blaze/atomic_db_1 to dev December 29, 2025 19:06
@bkolad bkolad force-pushed the blaze/fix_db_commit branch from 811a9ba to 600ba4a Compare December 29, 2025 19:10
@bkolad bkolad marked this pull request as ready for review December 30, 2025 13:38

pub fn log_reset_instruction(&self) {
tracing::error!(
"To reset commit flag, please remove commit flag file: `rm {}`",
Member

This was useful: a user who knows they just want to remove the flag could copy the command and run it. Why did we remove it?

Member Author

Why would we ever want to reset the flag manually? The flow has changed: we now save the file before committing to Nomt.

debug_assert_eq!(self.commit_flag.read_status()?, in_progress_commit_status);
commit_flag.save_commit_status(&CommitStatus::CommittingKernelNomt(
self.kernel.root().into_inner(),
))?;
Member

What happens if NOMT has committed successfully, but writing the flag has failed?

Member Author

We set the flag before committing to Nomt, so if saving the flag fails, we won't attempt the Nomt commit.
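To illustrate the point, here is a small sketch in which a failed flag write short-circuits before the commit step is ever called. The `CommitError` type and both function names are hypothetical, and closures stand in for the real flag and Nomt calls.

```rust
use std::cell::Cell;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum CommitError {
    FlagWrite,
    Nomt,
}

/// Saves the flag first. The `?` returns early on failure, so a failed
/// save never reaches the commit closure.
pub fn flag_then_commit(
    save_flag: impl FnOnce() -> Result<(), CommitError>,
    commit_nomt: impl FnOnce() -> Result<(), CommitError>,
) -> Result<(), CommitError> {
    save_flag()?;
    commit_nomt()
}

/// Simulates a failing flag write and reports both the result and whether
/// the commit closure was invoked.
pub fn simulate_failed_flag_write() -> (Result<(), CommitError>, bool) {
    let attempted = Cell::new(false);
    let result = flag_then_commit(
        || Err(CommitError::FlagWrite),
        || {
            attempted.set(true);
            Ok(())
        },
    );
    (result, attempted.get())
}
```

In this sketch the "committed but flag missing" state cannot arise from a flag-write failure, because the commit is only reachable after the write has returned `Ok`.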

tracing::warn!(
"Detected in-progress commit {commit_status:?}. Rolling back kernel & user DBs."
);
// User & Kernel commit was sucefull but we don't see `Success`. We rollback both User & Kernek.
Member

Suggested change
- // User & Kernel commit was sucefull but we don't see `Success`. We rollback both User & Kernek.
+ // User & Kernel commit was successful but we don't see `Success`. We rollback both User & Kernel.

Member Author

fixed

Member

@citizen-stig citizen-stig left a comment

Overall looks good; I left a couple of comments.

let flat_state = FlatStateDb::new(path, state_cache_size, separate_archival_state)?;

// After validation override the commit flag to Success.
commit_flag.write_status(&CommitStatus::Success)?;
Member

What happens if we committed the live-db successfully, but failed to write the commit status `Success`?

Member Author

@bkolad bkolad Dec 30, 2025

We discussed this with Preston while you were on holiday:

  1. We will roll back both the user and kernel DBs so they remain in a consistent state.
  2. The flat-db and ledger will temporarily be ahead of Nomt.
  3. On startup, the rollup will start from the latest height committed to Nomt. After processing the first block, we will overwrite the flat-db/ledger.
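The recovery flow in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `VersionedDb` is a hypothetical in-memory stand-in for a Nomt-backed DB, `CommittedWithoutSuccess` is a made-up variant covering only the case discussed here, and the real `CommitStatus` has other variants (for example `CommittingKernelNomt`).

```rust
/// Hypothetical in-memory stand-in for a Nomt-backed DB: a stack of committed roots.
#[derive(Debug, Default)]
pub struct VersionedDb {
    roots: Vec<u64>,
}

impl VersionedDb {
    pub fn commit(&mut self, root: u64) {
        self.roots.push(root);
    }

    /// Number of commits currently retained.
    pub fn height(&self) -> usize {
        self.roots.len()
    }

    /// Drops the last `n` commits; errors if fewer than `n` exist.
    pub fn rollback(&mut self, n: usize) -> Result<(), String> {
        if n > self.roots.len() {
            return Err(format!("cannot roll back {} of {} commits", n, self.roots.len()));
        }
        self.roots.truncate(self.roots.len() - n);
        Ok(())
    }
}

/// Simplified status for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitStatus {
    /// Hypothetical variant: both DBs committed, but `Success` was never recorded.
    CommittedWithoutSuccess,
    Success,
}

/// Startup recovery: if the last commit was not confirmed, roll back both
/// DBs by one commit so they stay consistent with each other. Returns the
/// status the caller should persist afterwards.
pub fn recover(
    status: CommitStatus,
    user: &mut VersionedDb,
    kernel: &mut VersionedDb,
) -> Result<CommitStatus, String> {
    match status {
        CommitStatus::CommittedWithoutSuccess => {
            kernel.rollback(1)?;
            user.rollback(1)?;
        }
        CommitStatus::Success => {}
    }
    Ok(CommitStatus::Success)
}
```

After `recover` returns, the caller would resume from `kernel.height()`, which corresponds to starting from the latest height committed to Nomt; the flat-db and ledger catch up once the next block is processed.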

@bkolad bkolad force-pushed the blaze/fix_db_commit branch 2 times, most recently from 1c158f2 to af72874 Compare December 30, 2025 16:57
?commit_status,
"Detected in-progress commit. Rolling back kernel & user DBs."
);
// User & Kernel commit was sucefull but we don't see `Success`. We rollback both User & Kernel.
Member

Suggested change
- // User & Kernel commit was sucefull but we don't see `Success`. We rollback both User & Kernel.
+ // User & Kernel commit was successful but we don't see `Success`. We rollback both User & Kernel.

self.user.rollback(1)?;
}
CommitStatus::Success => {}
}
Member

Do we need to update the flag to `Success` at the end, so it is consistent on the next restart?

Member

Meaning, if it accidentally shuts down immediately before starting a new block.

Member Author

Added the save in DbGroup::new

Ok(Self { user, kernel })
}

pub(crate) fn validate_commit_flag_and_rollback_if_necssesary(
Member

Suggested change
- pub(crate) fn validate_commit_flag_and_rollback_if_necssesary(
+ pub(crate) fn validate_commit_flag_and_rollback_if_necessesary(

Member Author

fixed

Member Author

fixed

3 participants