Skip to content

Default RocksDB exit flush to WAL-only, add opt-in full flush#12047

Open
asdacap wants to merge 1 commit into
masterfrom
update-rocksdb-wal-flush
Open

Default RocksDB exit flush to WAL-only, add opt-in full flush#12047
asdacap wants to merge 1 commit into
masterfrom
update-rocksdb-wal-flush

Conversation

@asdacap

@asdacap asdacap commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Changes

On shutdown, DbOnTheRocks.Dispose() performed a full flush of each RocksDB instance — flushing the WAL and materializing the memtable into SST files. The memtable can be tens or hundreds of MB, so this flush can be slow enough that a container orchestrator's shutdown grace period SIGKILLs Nethermind mid-flush.

Flushing only the WAL is sufficient for durability: WAL-backed writes are fsynced and replayed on the next start, so the memtable does not need to be materialized into SST at exit time.

  • Default exit flush is now WAL-only (fast, durable shutdown).
  • Added a new FullFlushOnExit DB config option (default false) to opt back into the old full memtable flush on exit.
  • FlushOnExit (default true) remains the master switch; FullFlushOnExit only escalates to a full flush when FlushOnExit is also enabled.
FlushOnExit FullFlushOnExit exit flush
true (default) false (default) WAL only — new default
true true full flush (WAL + memtable)
false (any) no flush

Why it's safe (no data loss): WAL-backed writes are recovered via WAL replay. A few sync/pruning paths issue WAL-disabled writes (WriteFlags.DisableWAL — Era import, snap sync, full pruning); those remain protected by RocksDB's avoid_flush_during_shutdown option, which defaults to false and flushes memtables on close whenever unpersisted data exists. This PR intentionally does not set avoid_flush_during_shutdown.

No native plumbing was needed — InnerFlush(bool onlyWal) already supported a WAL-only mode; only the Dispose() path's flush mode changed. The public Flush(bool onlyWal = false) API and all explicit callers are unchanged.

Types of changes

What types of changes does your code introduce?

  • New feature (a non-breaking change that adds functionality)
  • Optimization

Testing

Requires testing

  • Yes

If yes, did you write tests?

  • Yes

Notes on testing

Added a parameterized regression test in PerTableDbConfigTests covering the new FullFlushOnExit option (default false and explicit overrides). The native flush behavior in Dispose() is not unit-testable without a real RocksDB instance and is left to manual/integration verification.

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes

The default behavior on shutdown changed from a full flush to a WAL-only flush. Operators who relied on the old full-flush-on-exit behavior can restore it by setting FullFlushOnExit = true.

🤖 Generated with Claude Code

On shutdown DbOnTheRocks.Dispose() did a full flush (WAL + memtable). The
memtable can be hundreds of MB, so the flush can be slow enough that a
container orchestrator SIGKILLs Nethermind mid-flush. Flushing only the WAL
is sufficient for durability since WAL-backed writes are replayed on restart.

Make the default exit flush WAL-only and add a new FullFlushOnExit option
(default false) to opt back into the full memtable flush. WAL-disabled sync
writes remain protected by RocksDB's avoid_flush_during_shutdown default, so
no data is lost.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@claude

claude Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Claude finished @asdacap's task in 6m 35s —— View job


Review

  • Gather context (PR description, changed files)
  • Read changed source files
  • Analyze correctness and safety of WAL-only flush change
  • Check config wiring across all config layers
  • Review tests
  • Post findings

The approach is correct and the implementation is clean. The core insight — that rocksdb_flush_wal is sufficient for WAL-backed writes while DB::Close() (with avoid_flush_during_shutdown=false) handles any residual DisableWAL memtable data — is sound. Config wiring is consistent across all four layers (IDbConfig, DbConfig, IRocksDbConfig, PerTableDbConfig, AdjustedRocksdbConfig). NSubstitute mocks in existing tests will auto-implement FullFlushOnExit with default false, so no test breakage.

Findings

Medium

  • Missing code comment for DisableWAL safety invariant (DbOnTheRocks.cs:1503) — The WAL-only flush path relies on RocksDB's avoid_flush_during_shutdown=false default to flush DisableWAL memtable data during DB::Close(). This is documented in the PR description but not at the call site. If someone later adds avoid_flush_during_shutdown=true to any DB's RocksDbOptions, DisableWAL writes (snap sync, Era import, pruning) would be silently lost. (Acknowledged in PR description with correct rationale — inline suggestion posted.)

Low

  • FullFlushOnExit silently no-ops when FlushOnExit=false (IDbConfig.cs:37) — No doc comment or [ConfigItem] description explains this dependency. An operator setting FullFlushOnExit=true alongside FlushOnExit=false gets no diagnostic. (Inline suggestion posted.)

  • No test for FlushOnExit=false, FullFlushOnExit=true — The config-reading test is good, but the actual dispose interaction (that FullFlushOnExit is ignored when FlushOnExit=false) isn't covered. Low priority since the dispose path requires a real RocksDB instance and the PR description notes this.

Verdict

The change is safe and the logic is correct. The medium finding has been explicitly acknowledged in the PR description with correct rationale (avoid_flush_during_shutdown=false is the default and this PR intentionally does not change it). Mergeable after addressing the inline suggestions, or at minimum adding the code comment at the Dispose call site.

_reader.Dispose();

if (_perTableDbConfig.FlushOnExit) InnerFlush(false);
if (_perTableDbConfig.FlushOnExit) InnerFlush(onlyWal: !_perTableDbConfig.FullFlushOnExit);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium – missing safety invariant comment

The WAL-only flush path (FullFlushOnExit=false) is safe for normal WAL-backed writes because rocksdb_flush_wal syncs the WAL and RocksDB replays it on next start. For DisableWAL writes (Era import, snap sync, full pruning) the safety guarantee comes from RocksDB's default avoid_flush_during_shutdown=false, which causes DB::Close() (invoked inside ReleaseUnmanagedResources) to flush any remaining memtable data that is not covered by the WAL.

This is a subtle invariant: if avoid_flush_during_shutdown=true is ever set in RocksDbOptions on any DB, DisableWAL writes could be silently lost. A short comment here would prevent a future regression:

Suggested change
if (_perTableDbConfig.FlushOnExit) InnerFlush(onlyWal: !_perTableDbConfig.FullFlushOnExit);
// WAL-only flush (default): rocksdb_flush_wal syncs WAL to disk; WAL-backed writes recover via replay on
// next start. DisableWAL writes (snap sync, Era import, pruning) are protected by RocksDB's default
// avoid_flush_during_shutdown=false, which flushes all remaining memtable data during DB::Close() below.
// Do NOT set avoid_flush_during_shutdown=true in RocksDbOptions without also enabling FullFlushOnExit.
if (_perTableDbConfig.FlushOnExit) InnerFlush(onlyWal: !_perTableDbConfig.FullFlushOnExit);

bool EnableFileWarmer { get; set; }
double CompressibilityHint { get; set; }
bool FlushOnExit { get; set; }
bool FullFlushOnExit { get; set; }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low – undocumented dependency on FlushOnExit

FullFlushOnExit=true is silently a no-op when FlushOnExit=false (the outer guard in Dispose). Given that both flags can be set independently in a JSON config file, an operator who sets FullFlushOnExit=true while FlushOnExit=false will get unexpected behavior with no diagnostic. A [ConfigItem] or XML doc noting the dependency would help:

Suggested change
bool FullFlushOnExit { get; set; }
/// <summary>When true, the full memtable flush (WAL + SST materialization) is performed on exit instead of WAL-only flush. Only has effect when <see cref="FlushOnExit"/> is also true.</summary>
bool FullFlushOnExit { get; set; }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant