Skip to content

Commit d6eae40

Browse files
committed
fix: resolve full-codebase audit findings (L0 range-del get, trivial move)
Full /x-overhaul audit across all src/ subsystems (10 parallel reviews). Nine subsystems were clean; three findings fixed: - HIGH db: get() could resurrect a range-deleted key. A single memtable flush split into multiple L0 SSTs numbers files by user key, not sequence, and a range tombstone is written only to the file holding its start key. The L0 get() loop scanned newest-first and early- returned on the first matching point entry, before visiting the lower-numbered file carrying the covering tombstone -- so get() returned a stale value while iter() correctly returned nothing. The L0 branch now pre-scans all tombstone-bearing L0 files before any point lookup, matching the L1+ and iterator paths. - LOW compaction: trivial-move was permanently unreachable because the always-on lazy-delete wrapper makes options.compaction_filter always Some. Added a defaulted CompactionFilter::is_noop(); LazyDeleteFilter reports no-op when no user filter is set and no dead keys are registered, so single-file Ln->Ln+1 moves skip the rewrite again. - LOW types: module-header doc now documents the trailer bit-inversion, matching the struct doc and the actual encoding. make fmt + clippy -D warnings clean; full debug + release suites pass.
1 parent d837fee commit d6eae40

6 files changed

Lines changed: 50 additions & 10 deletions

File tree

Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
[package]
22
name = "mmdb"
3-
version = "3.3.0"
3+
version = "3.3.1"
44
edition = "2024"
55
description = "The storage engine behind vsdb — a pure-Rust LSM-Tree key-value store"
66
license = "MIT"

docs/audit.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -55,9 +55,9 @@
5555
- **What**: Comment uses "Safety:" but no unsafe block is involved.
5656
- **Reason**: Comment is a design note, not a safety justification. Cosmetic only.
5757

58-
### [LOW] db: point-lookup `get` path scans all L1+ files with range deletions
58+
### [LOW] db: point-lookup `get` path scans all L0 + L1+ files with range deletions
5959
- **Where**: src/db.rs (get path), src/iterator/
60-
- **What**: The point-lookup `get` path iterates ALL L1+ files that have range deletions to check for covering tombstones — O(files_with_range_dels). The iterator path bounds this by checking `smallest_key > upper_bound` to skip files entirely past the range, but the point-lookup path cannot apply the same optimization since range tombstones can extend past a file's largest key.
60+
- **What**: The point-lookup `get` path iterates ALL L0 and L1+ files that have range deletions to check for covering tombstones — O(files_with_range_dels). The iterator path bounds the L1+ portion by checking `smallest_key > upper_bound` to skip files entirely past the range, but the point-lookup path cannot apply the same optimization since range tombstones can extend past a file's largest key. The L0 branch must pre-scan every tombstone-bearing L0 file before any point lookup because a single split flush numbers its files by key, not sequence, so a tombstone can live in a different L0 file than the keys it covers.
6161
- **Reason**: This is an inherent limitation of range tombstones combined with point lookups. The cost is bounded by the number of tombstone-bearing files (typically small in practice), and the alternative (maintaining a separate tombstone index) would add write-path complexity disproportionate to the benefit.
6262

6363
### [LOW] types: `InternalKey` panics on malformed encoded data

src/compaction/leveled.rs

Lines changed: 9 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1027,11 +1027,17 @@ impl LeveledCompaction {
10271027

10281028
// Trivial move optimization: if there's exactly one input file and no
10291029
// overlap with the next level, just move the metadata without rewriting.
1030-
// Skip the optimisation when a compaction filter is installed, because
1031-
// the filter needs to see every key-value pair to decide on removals.
1030+
// Skip the optimisation when the compaction filter could remove or
1031+
// change entries — it needs to see every key-value pair. A filter that
1032+
// reports itself a no-op (e.g. lazy-delete with no user filter and no
1033+
// registered dead keys) still permits the move.
10321034
if task.input_files_level.len() == 1
10331035
&& task.input_files_next.is_empty()
1034-
&& ctx.options.compaction_filter.is_none()
1036+
&& ctx
1037+
.options
1038+
.compaction_filter
1039+
.as_ref()
1040+
.is_none_or(|f| f.is_noop())
10351041
{
10361042
let tf = &task.input_files_level[0];
10371043
let mut edit = VersionEdit::new();

src/db.rs

Lines changed: 25 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -245,6 +245,14 @@ impl CompactionFilter for LazyDeleteFilter {
245245
}
246246
CompactionFilterDecision::Keep
247247
}
248+
249+
fn is_noop(&self) -> bool {
250+
// No-op only when there is no user filter to consult AND no key is
251+
// currently registered for lazy deletion. A dead key registered after a
252+
// trivial move is still removed by a later compaction that re-examines
253+
// the relocated file, so allowing the move here is safe.
254+
self.user_filter.as_ref().is_none_or(|f| f.is_noop()) && self.dead_keys.read().is_empty()
255+
}
248256
}
249257

250258
impl DB {
@@ -959,13 +967,27 @@ impl DB {
959967
return Ok(None);
960968
}
961969

962-
// 4. L0 SST files (newest first, may overlap)
963-
// Check range tombstones in each file and accumulate max_tomb_seq.
964-
for tf in version.level_files(0) {
970+
// 4. L0 SST files (newest first, may overlap).
971+
// A single memtable flush can be split into several L0 files cut at
972+
// user-key boundaries; those files are numbered in ascending key order,
973+
// and a range tombstone is written only to the file holding its start
974+
// key (it is not fragmented into later split files). A tombstone in a
975+
// lower-numbered file can therefore cover point keys living in a
976+
// higher-numbered file from the same flush. Because L0 is scanned
977+
// newest-first (descending file number), that point key would be found
978+
// and returned before the tombstone-bearing file is ever visited. So
979+
// accumulate covering tombstone sequences across ALL L0 files before
980+
// doing any point lookup, mirroring the L1+ branch and the iterator
981+
// path. (File number does not track sequence order within one split
982+
// flush, so the per-file early-out is unsound here.)
983+
let l0_files = version.level_files(0);
984+
for tf in l0_files {
965985
if tf.meta.has_range_deletions {
966986
let file_tomb_seq = tf.reader.max_covering_tombstone_seq(key, seq).c(d!())?;
967987
max_tomb_seq = max_tomb_seq.max(file_tomb_seq);
968988
}
989+
}
990+
for tf in l0_files {
969991
if let Some((result, entry_seq)) = tf.reader.get_internal_with_seq(key, seq).c(d!())? {
970992
if max_tomb_seq > entry_seq {
971993
return Ok(None);

src/options.rs

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -359,6 +359,17 @@ pub enum CompactionFilterDecision {
359359
/// During compaction, each key-value pair is passed to the filter for a decision.
360360
pub trait CompactionFilter: Send + Sync {
361361
fn filter(&self, level: usize, key: &[u8], value: &[u8]) -> CompactionFilterDecision;
362+
363+
/// Returns `true` if this filter never removes or changes any entry, i.e.
364+
/// [`filter`](Self::filter) always returns [`CompactionFilterDecision::Keep`].
365+
///
366+
/// When `true`, the engine may apply the trivial-move optimization
367+
/// (relocating a single SST to the next level without rewriting it) instead
368+
/// of streaming every entry through the filter. The default is `false`
369+
/// (conservative — the filter is assumed to be active).
370+
fn is_noop(&self) -> bool {
371+
false
372+
}
362373
}
363374

364375
impl fmt::Debug for dyn CompactionFilter {

src/types.rs

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,8 @@
11
//! Core data types: InternalKey, ValueType, SequenceNumber.
22
//!
33
//! Encoding follows RocksDB convention:
4-
//! InternalKey = \[user_key bytes\]\[packed: seq << 8 | type\] (last 8 bytes)
4+
//! InternalKey = \[user_key bytes\]\[trailer: !(seq << 8 | type) as big-endian\] (last 8 bytes)
5+
//! The trailer is bit-inverted so a higher seq yields smaller bytes.
56
//! Sort order: user_key ASC, sequence DESC, value_type DESC
67
78
use std::cmp::Ordering;

0 commit comments

Comments
 (0)