Commit 3d604a5

Fix data race in MemTable::GetBloomFilter() on ARM weak memory model
Use release-acquire memory ordering instead of relaxed to ensure DynamicBloom object is fully initialized before pointer publication. Problem: - On ARM (weak memory model), using relaxed ordering allows the pointer to be published before DynamicBloom::data_ is initialized - Readers may see non-null pointer but access uninitialized data_, causing crashes Solution: - Use memory_order_acquire for load and memory_order_release for store - This establishes synchronizes-with relationship, guaranteeing readers only see fully initialized objects Performance impact: - The acquire load executes on every call (hot path in Get/Add operations) - Overhead is minimal: typically 0-2 cycles on x86 (same as relaxed), ~1-3 cycles on ARM (may add memory barrier) - This overhead is <1% of total latency compared to subsequent bloom filter operations (hash computation + memory access) Signed-off-by: hhwyt <hhwyt1@gmail.com>
1 parent 2b04d05 commit 3d604a5

1 file changed: +15 −2 lines


db/memtable.h

Lines changed: 15 additions & 2 deletions
@@ -719,7 +719,20 @@ class MemTable {

   inline DynamicBloom* GetBloomFilter() {
     if (needs_bloom_filter_) {
-      auto ptr = bloom_filter_ptr_.load(std::memory_order_relaxed);
+      // Uses release-acquire to prevent data race on ARM weak memory model.
+      // Without it: Thread1 may publish ptr before DynamicBloom::data_ is
+      // initialized, Thread2 may see non-null ptr but access uninitialized
+      // data_ (crash). With release-acquire: store(release) ensures all prior
+      // writes are visible, load(acquire) ensures we see them, guaranteeing
+      // fully initialized object.
+      // Performance: This load() executes on every read/write call (hot path).
+      // On modern CPUs, acquire load has minimal overhead vs relaxed: typically
+      // 0-2 cycles on x86 (both compile to same instruction), and ~1-3 cycles
+      // on ARM (may add a memory barrier). This overhead is negligible compared
+      // to subsequent bloom filter operations (MayContain/Add) which involve
+      // hash computation and memory access, making the acquire cost <1% of
+      // total latency.
+      auto ptr = bloom_filter_ptr_.load(std::memory_order_acquire);
       if (UNLIKELY(ptr == nullptr)) {
         std::lock_guard<SpinMutex> guard(bloom_filter_mutex_);
         if (bloom_filter_ == nullptr) {
@@ -729,7 +742,7 @@
           moptions_.memtable_huge_page_size, logger_));
         }
         ptr = bloom_filter_.get();
-        bloom_filter_ptr_.store(ptr, std::memory_order_relaxed);
+        bloom_filter_ptr_.store(ptr, std::memory_order_release);
       }
       return ptr;
     }

0 commit comments
