Commit 33d39f2

[None][fix] Always sync local ranks after prefetch in HfWeightLoader
`enable_prefetch` depends on `psutil.virtual_memory().available`, a per-rank volatile value, so different local ranks may take different branches. Gating `local_mpi_barrier()` on `enable_prefetch` could deadlock between ranks that prefetched and ranks that skipped. Move the barrier out of the conditional so all local ranks synchronize unconditionally; ranks that didn't prefetch reach the barrier immediately.

Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
1 parent 3a790bd commit 33d39f2

1 file changed

Lines changed: 6 additions & 2 deletions


tensorrt_llm/_torch/models/checkpoints/hf/weight_loader.py

```diff
@@ -85,8 +85,12 @@ def load_weights(self, checkpoint_dir: str,
                 f"Prefetching {prefetch_size / (1024**3):.2f}GB checkpoint files."
             )
             self.prefetch_files(weight_files)
-            # Ensure that all local ranks have finished prefetching before loading weights
-            local_mpi_barrier()
+        # Sync all local ranks unconditionally. `enable_prefetch` depends on
+        # `psutil.virtual_memory().available`, a per-rank volatile value, so
+        # different ranks may take different branches; gating the barrier on
+        # it would deadlock between ranks that prefetched and ranks that
+        # skipped. Ranks that didn't prefetch reach the barrier immediately.
+        local_mpi_barrier()
 
         return self._load_weights_in_parallel(
             weight_files, self._load_safetensors_file,
```
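The deadlock the commit fixes can be sketched without MPI. In this minimal model (an assumption for illustration, not the project's code), threads stand in for local ranks and `threading.Barrier` stands in for `local_mpi_barrier()`; each rank's `enable_prefetch` decision may diverge, just as the real per-rank `psutil.virtual_memory().available` check can:

```python
import threading

def all_ranks_pass(gated: bool, prefetch_flags, timeout: float = 0.5) -> bool:
    """Simulate local ranks around a barrier.

    `prefetch_flags[i]` is rank i's (per-rank, possibly divergent)
    enable_prefetch decision. With gated=True the barrier sits inside
    the prefetch branch (the bug); with gated=False every rank reaches
    it unconditionally (the fix). Returns True iff all ranks finished
    without the barrier timing out.
    """
    barrier = threading.Barrier(len(prefetch_flags))
    finished = []
    lock = threading.Lock()

    def rank(enable_prefetch: bool) -> None:
        try:
            if enable_prefetch:
                # ... prefetch_files(weight_files) would run here ...
                if gated:
                    barrier.wait(timeout=timeout)  # BUG: skipped by non-prefetching ranks
            if not gated:
                barrier.wait(timeout=timeout)      # FIX: reached by every rank
        except threading.BrokenBarrierError:
            return  # this rank "deadlocked" (barrier broke on timeout)
        with lock:
            finished.append(enable_prefetch)

    threads = [threading.Thread(target=rank, args=(f,)) for f in prefetch_flags]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(finished) == len(prefetch_flags)
```

With divergent flags the gated barrier strands the prefetching rank while the skipping rank runs ahead; the unconditional barrier succeeds because ranks that skip prefetching simply reach it immediately, which is exactly why the fix moves `local_mpi_barrier()` out of the conditional.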
