Commit 3d604a5

Fix data race in MemTable::GetBloomFilter() on ARM weak memory model
Use release-acquire memory ordering instead of relaxed to ensure DynamicBloom object is fully initialized before pointer publication. Problem: - On ARM (weak memory model), using relaxed ordering allows the pointer to be published before DynamicBloom::data_ is initialized - Readers may see non-null pointer but access uninitialized data_, causing crashes Solution: - Use memory_order_acquire for load and memory_order_release for store - This establishes synchronizes-with relationship, guaranteeing readers only see fully initialized objects Performance impact: - The acquire load executes on every call (hot path in Get/Add operations) - Overhead is minimal: typically 0-2 cycles on x86 (same as relaxed), ~1-3 cycles on ARM (may add memory barrier) - This overhead is <1% of total latency compared to subsequent bloom filter operations (hash computation + memory access) Signed-off-by: hhwyt <hhwyt1@gmail.com>
1 parent 2b04d05 commit 3d604a5

1 file changed: +15 −2 lines


db/memtable.h

Lines changed: 15 additions & 2 deletions
@@ -719,7 +719,20 @@ class MemTable {

   inline DynamicBloom* GetBloomFilter() {
     if (needs_bloom_filter_) {
-      auto ptr = bloom_filter_ptr_.load(std::memory_order_relaxed);
+      // Uses release-acquire to prevent data race on ARM weak memory model.
+      // Without it: Thread1 may publish ptr before DynamicBloom::data_ is
+      // initialized, Thread2 may see non-null ptr but access uninitialized
+      // data_ (crash). With release-acquire: store(release) ensures all prior
+      // writes are visible, load(acquire) ensures we see them, guaranteeing
+      // fully initialized object.
+      // Performance: This load() executes on every read/write call (hot path).
+      // On modern CPUs, acquire load has minimal overhead vs relaxed: typically
+      // 0-2 cycles on x86 (both compile to same instruction), and ~1-3 cycles
+      // on ARM (may add a memory barrier). This overhead is negligible compared
+      // to subsequent bloom filter operations (MayContain/Add) which involve
+      // hash computation and memory access, making the acquire cost <1% of
+      // total latency.
+      auto ptr = bloom_filter_ptr_.load(std::memory_order_acquire);
       if (UNLIKELY(ptr == nullptr)) {
         std::lock_guard<SpinMutex> guard(bloom_filter_mutex_);
         if (bloom_filter_ == nullptr) {
@@ -729,7 +742,7 @@
           moptions_.memtable_huge_page_size, logger_));
         }
         ptr = bloom_filter_.get();
-        bloom_filter_ptr_.store(ptr, std::memory_order_relaxed);
+        bloom_filter_ptr_.store(ptr, std::memory_order_release);
       }
       return ptr;
     }

0 commit comments
