Summary:
Investigation of a production SST corruption revealed that a single hardware
bit flip (bit 32) in a value Slice's size_ field during compaction caused a
5.6MB value to be treated as 4.3GB. The corruption was silent because:
- BlockBuilder's varint encoding uses static_cast<uint32_t>(value.size()),
which truncated the corrupted size back to the correct value
- buffer_.append(value.data(), value.size()) used the full 64-bit corrupted
size, appending 4GB of adjacent heap memory into the block
- The block checksum was computed over the corrupted data, so it matched
- paranoid_file_checks was disabled
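
The arithmetic behind the silent truncation can be sketched in isolation. The helper names below (FlipBit, EncodedLength, CheckSilentTruncation) and the exact 5,600,000-byte size are illustrative assumptions, not RocksDB code; only the static_cast<uint32_t> narrowing is taken from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Flip one bit of a 64-bit size, as a hardware fault might (hypothetical helper).
constexpr uint64_t FlipBit(uint64_t size, unsigned bit) {
  return size ^ (uint64_t{1} << bit);
}

// The narrowing cast that the varint length encoding applies, per the summary.
constexpr uint32_t EncodedLength(uint64_t size) {
  return static_cast<uint32_t>(size);
}

// Illustrative check: the encoded length looks right, the appended byte count does not.
inline void CheckSilentTruncation() {
  constexpr uint64_t kRealSize = 5600000;  // ~5.6MB, illustrative value
  constexpr uint64_t kCorrupted = FlipBit(kRealSize, 32);
  // Setting bit 32 adds 2^32 bytes, so append() would copy ~4.3GB.
  assert(kCorrupted == kRealSize + 4294967296ULL);
  // The cast drops bit 32, so the length written to the block is the real one.
  assert(EncodedLength(kCorrupted) == kRealSize);
}
```

Because the encoded length and the appended byte count disagree, the block header looks valid while the payload carries roughly 4GB of unrelated memory.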
This change adds 2 layers of defense:
1. **Value size uint32 truncation guard**: In BlockBasedTableBuilder::Add(),
detect when value.size() exceeds uint32_t range before passing it to
BlockBuilder where the varint encoding would silently truncate it.
Returns Status::Corruption so the flush/compaction can abort gracefully
without crashing the process. This applies to both flush and compaction
paths since all key-value pairs flow through BlockBasedTableBuilder::Add().
2. **Compaction output/input size ratio check**: New option
max_compaction_output_to_input_ratio (default: 10, 0 to disable) in
MutableCFOptions. After compaction, if total output size exceeds this ratio
times total input size, return Status::Corruption. This catches cases where
corrupted values inflate the output file far beyond what input data justifies.
This check applies to compaction only (not flush), since flush writes from a
memtable and does not have input files for comparison. However, flush output
is still protected by guard #1.
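
A minimal standalone sketch of the two checks follows. SketchStatus, CheckValueSize and CheckOutputRatio are hypothetical stand-ins so the example compiles without RocksDB, and an integer ratio is assumed; the actual patch returns Status::Corruption from BlockBasedTableBuilder::Add() and from the compaction path, and may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical stand-in for rocksdb::Status, so the sketch compiles alone.
struct SketchStatus {
  bool ok;
  std::string msg;
};

// Guard 1: reject sizes that the uint32 varint length cannot represent.
inline SketchStatus CheckValueSize(uint64_t value_size) {
  if (value_size > std::numeric_limits<uint32_t>::max()) {
    return {false, "Corruption: value size exceeds uint32_t range"};
  }
  return {true, ""};
}

// Guard 2: reject output larger than max_ratio * input. A ratio of 0 disables it.
inline SketchStatus CheckOutputRatio(uint64_t input_bytes,
                                     uint64_t output_bytes,
                                     uint64_t max_ratio) {
  if (max_ratio == 0) {
    return {true, ""};  // check disabled
  }
  // If the limit itself would overflow uint64_t, no output size can exceed it.
  if (input_bytes > std::numeric_limits<uint64_t>::max() / max_ratio) {
    return {true, ""};
  }
  if (output_bytes > max_ratio * input_bytes) {
    return {false, "Corruption: compaction output/input ratio exceeded"};
  }
  return {true, ""};
}
```

In this sketch the size check runs once per key-value pair, while the ratio check runs once per compaction after the output size is known.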
Additionally, both the flush loop (db/builder.cc) and the compaction output
loop (db/compaction/compaction_outputs.cc) now check builder->status() after
each Add() call. Previously, a corruption error set inside the builder was
only surfaced when Finish() was called, causing the loop to continue
iterating through potentially millions of keys doing wasted work.
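
The early exit can be illustrated with a toy builder. SketchBuilder and WriteAll are hypothetical names used only for this sketch; the real loops live in db/builder.cc and db/compaction/compaction_outputs.cc and check builder->status().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical minimal builder whose error state is sticky once set.
class SketchBuilder {
 public:
  void Add(uint64_t value_size) {
    if (!ok_) return;  // already failed, ignore further input
    if (value_size > std::numeric_limits<uint32_t>::max()) {
      ok_ = false;  // record the corruption instead of writing
      return;
    }
    ++entries_;
  }
  bool ok() const { return ok_; }
  std::size_t entries() const { return entries_; }

 private:
  bool ok_ = true;
  std::size_t entries_ = 0;
};

// Feeds sizes to the builder and stops at the first failure.
// Returns how many Add() calls were made.
inline std::size_t WriteAll(SketchBuilder& builder,
                            const std::vector<uint64_t>& sizes) {
  std::size_t calls = 0;
  for (uint64_t size : sizes) {
    builder.Add(size);
    ++calls;
    if (!builder.ok()) break;  // surface the error now, not at Finish()
  }
  return calls;
}
```

Without the check inside the loop, every remaining entry would still be iterated even though the builder has already failed.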
Test Plan:
Unit test