Summary:
X-link: facebookresearch/FBGEMM#1281
**Context:**
Since we expose KVTensor to external surfaces (e.g., checkpointing), callers are free to invoke KVTensor functions concurrently.
For example,
https://www.internalfb.com/code/fbsource/[5b7b1eef7d69]/fbcode/aiplatform/modelstore/checkpointing/pyper/TensorLoaderCallback.h?lines=85-86
The function linked above calls set_range on the same KVTensor multiple times: a huge chunk of data is divided into smaller chunks, which are then written concurrently. This is a bad practice because the SSD I/O path already uses multithreading to write data into the KVTensor.
Currently, we use 32 threads (one thread per shard) to write data. As a result, calling set_range multiple times concurrently leads to thread contention and increased synchronization overhead.
**In this Diff:**
We introduce a mutex lock on the set_range function. Each transaction holds the lock for the duration of its execution, so multiple concurrent calls are processed serially, avoiding the contention above and leading to more efficient use of the internal writer threads.
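
A minimal sketch of the locking pattern, for illustration only: the class name, `set_range` signature, and the chunked caller below are assumptions standing in for the actual FBGEMM code, not its real interface.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative stand-in for the real KVTensor wrapper; names and
// signatures here are assumptions, not the actual FBGEMM interface.
class KVTensorWrapper {
 public:
  // Writes `data` into rows [start, start + data.size()).
  // The lock_guard serializes concurrent external callers, so each
  // set_range transaction runs to completion before the next begins
  // and the internal shard-writer threads are not contended by
  // overlapping external writes.
  void set_range(int64_t start, const std::vector<float>& data) {
    std::lock_guard<std::mutex> guard(mutex_);
    // ... dispatch the write to the underlying sharded SSD backend ...
    (void)start;
    (void)data;
  }

 private:
  std::mutex mutex_;  // guards every set_range transaction
};

// Hypothetical checkpoint-restore caller, mirroring the pattern in the
// linked TensorLoaderCallback: a large buffer is split into chunks and
// each chunk is written from its own thread against the same KVTensor.
// With the mutex above, these calls are processed serially.
int main() {
  KVTensorWrapper kv;
  std::vector<float> buffer(1 << 20, 1.0f);
  const int64_t chunk = 1 << 18;

  std::vector<std::thread> workers;
  for (int64_t start = 0; start < (int64_t)buffer.size(); start += chunk) {
    int64_t len = std::min<int64_t>(chunk, buffer.size() - start);
    workers.emplace_back([&, start, len] {
      std::vector<float> piece(buffer.begin() + start,
                               buffer.begin() + start + len);
      kv.set_range(start, piece);
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return 0;
}
```

In this sketch a `std::lock_guard` is used rather than manual lock/unlock so the mutex is released even if the underlying write throws.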
Differential Revision: D75555658