Default RocksDB exit flush to WAL-only, add opt-in full flush #12047
asdacap wants to merge 1 commit into
On shutdown, DbOnTheRocks.Dispose() did a full flush (WAL + memtable). The memtable can be hundreds of MB, so the flush can be slow enough that a container orchestrator SIGKILLs Nethermind mid-flush. Flushing only the WAL is sufficient for durability, since WAL-backed writes are replayed on restart.
Make the default exit flush WAL-only and add a new FullFlushOnExit option (default false) to opt back into the full memtable flush. WAL-disabled sync writes remain protected by RocksDB's avoid_flush_during_shutdown default, so no data is lost.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude finished @asdacap's task in 6m 35s
Review
The approach is correct and the implementation is clean. The core insight, that …
Findings
Medium: missing safety invariant comment
Low: undocumented dependency on FlushOnExit
Verdict
The change is safe and the logic is correct. The medium finding has been explicitly acknowledged in the PR description with correct rationale ( …
  _reader.Dispose();

- if (_perTableDbConfig.FlushOnExit) InnerFlush(false);
+ if (_perTableDbConfig.FlushOnExit) InnerFlush(onlyWal: !_perTableDbConfig.FullFlushOnExit);
Medium – missing safety invariant comment
The WAL-only flush path (FullFlushOnExit=false) is safe for normal WAL-backed writes because rocksdb_flush_wal syncs the WAL and RocksDB replays it on next start. For DisableWAL writes (Era import, snap sync, full pruning) the safety guarantee comes from RocksDB's default avoid_flush_during_shutdown=false, which causes DB::Close() (invoked inside ReleaseUnmanagedResources) to flush any remaining memtable data that is not covered by the WAL.
This is a subtle invariant: if avoid_flush_during_shutdown=true is ever set in RocksDbOptions on any DB, DisableWAL writes could be silently lost. A short comment here would prevent a future regression:
- if (_perTableDbConfig.FlushOnExit) InnerFlush(onlyWal: !_perTableDbConfig.FullFlushOnExit);
+ // WAL-only flush (default): rocksdb_flush_wal syncs WAL to disk; WAL-backed writes recover via replay on
+ // next start. DisableWAL writes (snap sync, Era import, pruning) are protected by RocksDB's default
+ // avoid_flush_during_shutdown=false, which flushes all remaining memtable data during DB::Close() below.
+ // Do NOT set avoid_flush_during_shutdown=true in RocksDbOptions without also enabling FullFlushOnExit.
+ if (_perTableDbConfig.FlushOnExit) InnerFlush(onlyWal: !_perTableDbConfig.FullFlushOnExit);
  bool EnableFileWarmer { get; set; }
  double CompressibilityHint { get; set; }
  bool FlushOnExit { get; set; }
+ bool FullFlushOnExit { get; set; }
Low – undocumented dependency on FlushOnExit
FullFlushOnExit=true is silently a no-op when FlushOnExit=false (the outer guard in Dispose). Given that both flags can be set independently in a JSON config file, an operator who sets FullFlushOnExit=true while FlushOnExit=false will get unexpected behavior with no diagnostic. A [ConfigItem] or XML doc noting the dependency would help:
- bool FullFlushOnExit { get; set; }
+ /// <summary>When true, the full memtable flush (WAL + SST materialization) is performed on exit instead of WAL-only flush. Only has effect when <see cref="FlushOnExit"/> is also true.</summary>
+ bool FullFlushOnExit { get; set; }
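The interaction between the two flags can be sketched as a small standalone C# program. This is an illustration only, not Nethermind code: `ResolveExitFlush` and `IsFullFlushIgnored` are hypothetical helper names, and the string labels are made up for readability. The branching mirrors the `Dispose()` guard quoted above, and the second helper shows the kind of check that could back a diagnostic for the silent no-op combination.

```csharp
using System;

// Hypothetical helper (not part of Nethermind): mirrors the guard
//   if (FlushOnExit) InnerFlush(onlyWal: !FullFlushOnExit);
// and names the resulting exit behavior.
static string ResolveExitFlush(bool flushOnExit, bool fullFlushOnExit)
{
    if (!flushOnExit) return "none";          // outer guard: no explicit flush at all
    return fullFlushOnExit ? "full" : "wal-only";
}

// Hypothetical check for the combination the review calls a silent no-op:
// FullFlushOnExit=true while FlushOnExit=false.
static bool IsFullFlushIgnored(bool flushOnExit, bool fullFlushOnExit)
{
    return !flushOnExit && fullFlushOnExit;
}

// Print the behavior for every flag combination.
foreach (var (flushOnExit, fullFlushOnExit) in new[]
         {
             (true, false),   // defaults
             (true, true),
             (false, false),
             (false, true),
         })
{
    string mode = ResolveExitFlush(flushOnExit, fullFlushOnExit);
    string note = IsFullFlushIgnored(flushOnExit, fullFlushOnExit)
        ? " (FullFlushOnExit has no effect)"
        : "";
    Console.WriteLine($"FlushOnExit={flushOnExit}, FullFlushOnExit={fullFlushOnExit} -> {mode}{note}");
}
```

A check like `IsFullFlushIgnored` could run once at config load time and log a warning, which would address the missing diagnostic without changing the flush behavior itself.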
Changes
On shutdown, DbOnTheRocks.Dispose() performed a full flush of each RocksDB instance: flushing the WAL and materializing the memtable into SST files. The memtable can be tens or hundreds of MB, so this flush can be slow enough that a container orchestrator SIGKILLs Nethermind mid-flush once its shutdown grace period expires.
Flushing only the WAL is sufficient for durability: WAL-backed writes are fsynced and replayed on the next start, so the memtable does not need to be materialized into SST at exit time.
This PR adds a FullFlushOnExit DB config option (default false) to opt back into the old full memtable flush on exit. FlushOnExit (default true) remains the master switch; FullFlushOnExit only escalates to a full flush when FlushOnExit is also enabled.
FlushOnExit     | FullFlushOnExit  | Flush on exit
true (default)  | false (default)  | WAL only
true            | true             | full (WAL + memtable)
false           | any              | none
Why it's safe (no data loss): WAL-backed writes are recovered via WAL replay. A few sync/pruning paths issue WAL-disabled writes (WriteFlags.DisableWAL: Era import, snap sync, full pruning); those remain protected by RocksDB's avoid_flush_during_shutdown option, which defaults to false and flushes memtables on close whenever unpersisted data exists. This PR intentionally does not set avoid_flush_during_shutdown.
No native plumbing was needed: InnerFlush(bool onlyWal) already supported a WAL-only mode; only the Dispose() path's flush mode changed. The public Flush(bool onlyWal = false) API and all explicit callers are unchanged.
Types of changes
What types of changes does your code introduce?
Testing
Requires testing
If yes, did you write tests?
Notes on testing
Added a parameterized regression test in PerTableDbConfigTests covering the new FullFlushOnExit option (default false and explicit overrides). The native flush behavior in Dispose() is not unit-testable without a real RocksDB instance and is left to manual/integration verification.
Documentation
Requires documentation update
Requires explanation in Release Notes
The default behavior on shutdown changed from a full flush to a WAL-only flush. Operators who relied on the old full-flush-on-exit behavior can restore it by setting FullFlushOnExit = true.
🤖 Generated with Claude Code