Summary:
X-link: facebookresearch/FBGEMM#1281
**Context:**
Since we expose KVTensor to external surfaces (e.g., checkpointing), callers are free to invoke KVTensor functions concurrently.
For example,
https://www.internalfb.com/code/fbsource/[5b7b1eef7d69]/fbcode/aiplatform/modelstore/checkpointing/pyper/TensorLoaderCallback.h?lines=85-86
The function linked above calls set_range on the same KVTensor multiple times: a huge chunk of data is divided into smaller chunks, which are then written concurrently. This is a bad practice because the SSD I/O path already uses multithreading to write data into the KVTensor.
Currently, we use 32 threads (one thread per shard) to write data. As a result, calling set_range multiple times concurrently leads to thread contention and increased synchronization overhead.
**In this Diff:**
We introduce a mutex lock on the set_range function. Each transaction holds the lock for the duration of its execution, so multiple concurrent calls are processed serially, avoiding the contention above and leading to more efficient use of the internal writer threads.
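
A minimal sketch of the locking pattern, for illustration only: the class name, `set_range` signature, and the chunked caller below are assumptions standing in for the actual FBGEMM code, not its real interface.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative stand-in for the real KVTensor wrapper; names and
// signatures here are assumptions, not the actual FBGEMM interface.
class KVTensorWrapper {
 public:
  // Writes `data` into rows [start, start + data.size()).
  // The lock_guard serializes concurrent external callers, so each
  // set_range transaction runs to completion before the next begins
  // and the internal shard-writer threads are not contended by
  // overlapping external writes.
  void set_range(int64_t start, const std::vector<float>& data) {
    std::lock_guard<std::mutex> guard(mutex_);
    // ... dispatch the write to the underlying sharded SSD backend ...
    (void)start;
    (void)data;
  }

 private:
  std::mutex mutex_;  // guards every set_range transaction
};

// Hypothetical checkpoint-restore caller, mirroring the pattern in the
// linked TensorLoaderCallback: a large buffer is split into chunks and
// each chunk is written from its own thread against the same KVTensor.
// With the mutex above, these calls are processed serially.
int main() {
  KVTensorWrapper kv;
  std::vector<float> buffer(1 << 20, 1.0f);
  const int64_t chunk = 1 << 18;

  std::vector<std::thread> workers;
  for (int64_t start = 0; start < (int64_t)buffer.size(); start += chunk) {
    int64_t len = std::min<int64_t>(chunk, buffer.size() - start);
    workers.emplace_back([&, start, len] {
      std::vector<float> piece(buffer.begin() + start,
                               buffer.begin() + start + len);
      kv.set_range(start, piece);
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return 0;
}
```

In this sketch a `std::lock_guard` is used rather than manual lock/unlock so the mutex is released even if the underlying write throws.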
Differential Revision: D75555658