NoKV's flush path converts immutable memtables into L0 SST files, then advances the manifest WAL checkpoint and reclaims obsolete WAL segments. The task scheduler is in lsm/flush; SST persistence and manifest install are in lsm/builder.go and lsm/levels.go.
- Persistence: materialize immutable memtables into SST files.
- Ordering: publish SST metadata to manifest only after the SST is durably installed (strict mode).
- Cleanup: remove WAL segments once checkpoint and raft constraints allow removal.
- Observability: export queue/build/release timing through flush metrics.
flowchart LR
Active[Active MemTable]
Immutable[Immutable MemTable]
FlushQ[flush.Manager queue]
Build[StageBuild]
Install[StageInstall]
Release[StageRelease]
Active -->|threshold reached| Immutable --> FlushQ
FlushQ --> Build --> Install --> Release --> Active
- StagePrepare: flush.Manager.Submit enqueues task and records wait-start time.
- StageBuild: worker pulls task via flush.Manager.Next, builds SST (levelManager.flush -> openTable -> tableBuilder.flush).
- StageInstall: after SST + manifest edits succeed, worker marks install complete (flush.Manager.Update(..., StageInstall, ...)).
- StageRelease: worker removes immutable from in-memory list, closes memtable, records release metrics, and completes task.
Flush uses two write modes controlled by Options.ManifestSync:
- Fast path (ManifestSync=false)
  - Writes SST directly to final filename with O_CREATE|O_EXCL.
  - No temp file/rename step.
  - Highest throughput, weaker crash-consistency guarantees.
- Strict path (ManifestSync=true)
  - Writes to "<table>.tmp.<pid>.<ns>".
  - tmp.Sync() to persist SST bytes.
  - RenameNoReplace(tmp, final) installs file atomically. If unsupported by platform/filesystem, returns vfs.ErrRenameNoReplaceUnsupported.
  - SyncDir(workdir) is called before manifest edit so directory entry is durable.
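To make the difference between the two modes concrete, here is a sketch of both write paths using only the Go standard library. It is not NoKV's code: writeFast, writeStrict, syncDir, and demo are hypothetical names, the temp name is assumed to be the final path plus the suffix, and os.Link stands in for RenameNoReplace because a hard link also fails when the target name already exists. Directory fsync is assumed to behave as on POSIX systems.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// writeFast creates the final file directly. O_EXCL makes the open fail if
// the name already exists. No fsync is issued in this sketch.
func writeFast(final string, data []byte) error {
	f, err := os.OpenFile(final, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// syncDir fsyncs a directory so a newly created entry is durable.
func syncDir(dir string) error {
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync()
}

// writeStrict writes a temp file, syncs it, installs it under the final name
// without replacing an existing file, then syncs the parent directory.
func writeStrict(final string, data []byte) error {
	tmp := fmt.Sprintf("%s.tmp.%d.%d", final, os.Getpid(), time.Now().UnixNano())
	f, err := os.OpenFile(tmp, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
	if err != nil {
		return err
	}
	defer os.Remove(tmp) // best-effort cleanup of the temp name on every exit path
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	// Stand-in for RenameNoReplace: os.Link fails if final already exists.
	if err := os.Link(tmp, final); err != nil {
		return err
	}
	return syncDir(filepath.Dir(final))
}

// demo exercises both paths in a scratch directory and checks that neither
// overwrites an existing file.
func demo() error {
	dir, err := os.MkdirTemp("", "flush-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	fast := filepath.Join(dir, "000001.sst")
	strict := filepath.Join(dir, "000002.sst")
	if err := writeFast(fast, []byte("a")); err != nil {
		return err
	}
	if err := writeStrict(strict, []byte("b")); err != nil {
		return err
	}
	if writeFast(fast, []byte("x")) == nil || writeStrict(strict, []byte("x")) == nil {
		return fmt.Errorf("existing SST was overwritten")
	}
	return nil
}

func main() {
	fmt.Println("demo error:", demo())
}
```

In the sketch, the fast path skips every sync call, so a crash can leave a file whose bytes or directory entry never reached disk. The strict path pays for one file sync and one directory sync per SST.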
This is the durability ordering used by current code.
- lsm.Set/lsm.SetBatch detects walSize + estimate > MemTableSize and rotates memtable.
- Rotated memtable is submitted to flush.Manager (lsm.submitFlush).
- Worker executes levelManager.flush(mt):
  - iterates memtable entries,
  - builds SST via tableBuilder,
  - prepares manifest edits: EditAddFile + EditLogPointer.
- In strict mode, SyncDir runs before manifest.LogEdits(...).
- On successful manifest commit, table is added to L0 and wal.RemoveSegment runs when allowed.
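The ordering above can be sketched as a sequence of hooks that stops at the first failure. Everything named here (shouldRotate, flushSteps, runFlush, trace, errManifest) is hypothetical and only mirrors the steps listed; the real code paths are levelManager.flush and manifest.LogEdits, whose signatures are not shown in this document.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errManifest simulates a failed manifest commit in the demo.
var errManifest = errors.New("manifest write failed")

// shouldRotate mirrors the rotation check quoted in the list above.
func shouldRotate(walSize, estimate, memTableSize int64) bool {
	return walSize+estimate > memTableSize
}

// flushSteps holds hypothetical hooks, one per step in the list above.
type flushSteps struct {
	buildSST  func() error
	syncDir   func() error
	logEdits  func() error
	addToL0   func()
	removeWAL func()
}

// runFlush applies the steps in durability order. A manifest failure stops
// the sequence before the table is published or any WAL segment is removed.
func runFlush(strict bool, s flushSteps) error {
	if err := s.buildSST(); err != nil {
		return err
	}
	if strict {
		if err := s.syncDir(); err != nil {
			return err
		}
	}
	if err := s.logEdits(); err != nil {
		return err
	}
	s.addToL0()
	s.removeWAL()
	return nil
}

// trace runs the sequence with recording hooks and returns the observed order.
func trace(strict bool, manifestErr error) (string, error) {
	var order []string
	note := func(name string, err error) func() error {
		return func() error { order = append(order, name); return err }
	}
	err := runFlush(strict, flushSteps{
		buildSST:  note("build", nil),
		syncDir:   note("syncdir", nil),
		logEdits:  note("manifest", manifestErr),
		addToL0:   func() { order = append(order, "l0") },
		removeWAL: func() { order = append(order, "wal") },
	})
	return strings.Join(order, ","), err
}

func main() {
	fmt.Println(shouldRotate(60, 10, 64))
	fmt.Println(trace(true, nil))
	fmt.Println(trace(true, errManifest))
}
```

When the manifest step fails in this sketch, the trace ends at "manifest", so the WAL segment is kept and replay can still recover the data.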
- Startup rebuild (levelManager.build) validates manifest SST entries against disk.
- Missing or unreadable SSTs are treated as stale and removed from manifest via EditDeleteFile, allowing startup to continue.
- Temp SST names are only used in strict mode and are created in WorkDir with suffix .tmp.<pid>.<ns> (not a dedicated tmp/ directory).
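The startup check can be illustrated with a small sketch that compares a list of manifest file names against a directory. It is deliberately simplified: staleEntries and demo are hypothetical helpers, the file names are made up, and the sketch only tests whether a file can be opened, whereas the text above also treats unreadable (for example corrupt) SSTs as stale.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// staleEntries returns the manifest file names that cannot be opened in dir.
// Only openability is checked here; content validation is out of scope.
func staleEntries(dir string, manifestFiles []string) []string {
	var stale []string
	for _, name := range manifestFiles {
		f, err := os.Open(filepath.Join(dir, name))
		if err != nil {
			stale = append(stale, name) // candidate for an EditDeleteFile edit
			continue
		}
		f.Close()
	}
	return stale
}

// demo creates one SST on disk while the manifest lists two.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "rebuild-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "000001.sst"), []byte("data"), 0o644); err != nil {
		return nil, err
	}
	return staleEntries(dir, []string{"000001.sst", "000002.sst"}), nil
}

func main() {
	stale, err := demo()
	fmt.Println(stale, err)
}
```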
flush.Manager.Stats() feeds StatsSnapshot.Flush:
- pending, queue, active
- wait/build/release totals, counts, last, max
- completed

Use:
- nokv stats --workdir <dir> to inspect flush backlog and latency.
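The per-stage totals, counts, last, and max values listed above are enough to derive an average latency. The sketch below shows one way such a counter could be kept; stageTiming, observe, and mean are hypothetical names and do not describe the actual fields of StatsSnapshot.Flush.

```go
package main

import (
	"fmt"
	"time"
)

// stageTiming accumulates the four values named in the list above for one
// stage (wait, build, or release).
type stageTiming struct {
	total time.Duration
	count int64
	last  time.Duration
	max   time.Duration
}

// observe records one completed measurement.
func (s *stageTiming) observe(d time.Duration) {
	s.total += d
	s.count++
	s.last = d
	if d > s.max {
		s.max = d
	}
}

// mean derives the average from total and count, returning 0 when empty.
func (s *stageTiming) mean() time.Duration {
	if s.count == 0 {
		return 0
	}
	return s.total / time.Duration(s.count)
}

func main() {
	var wait stageTiming
	for _, ms := range []int{10, 30, 20} {
		wait.observe(time.Duration(ms) * time.Millisecond)
	}
	// Prints: 60ms 3 20ms 30ms 20ms
	fmt.Println(wait.total, wait.count, wait.last, wait.max, wait.mean())
}
```

A max that sits far above the mean for the wait stage would suggest occasional queue stalls rather than a uniformly slow flush.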
- lsm/flush/manager_test.go: queue/stage transitions and timing counters.
- db_test.go::TestRecoveryWALReplayRestoresData: replay still restores data after crash before flush completion.
- db_test.go::TestRecoveryCleansMissingSSTFromManifest and db_test.go::TestRecoveryCleansCorruptSSTFromManifest: stale manifest SST cleanup on startup.
See also recovery.md, memtable.md, and wal.md.