"RuntimeError: cuMemMap is used in env without NVLS support (mscclpp failure: InvalidUsage)" Error

Hello! I'm fine-tuning with TorchTitan and MSCCLPP on Perlmutter A100 GPUs, and I'm hitting the error below (the traceback is shortened).

The commands I'm running on a node:
.../torchtitan $ export MSCCLPP_LIB=$HOME/project/mscclpp/build/lib/libmscclpp_nccl.so
.../torchtitan $ export LD_PRELOAD=$MSCCLPP_LIB
.../torchtitan $ NGPU=4 CONFIG_FILE="../trace_gen/deepseek-workload-card.toml" ./run_train.sh

I tried setting MSCCLPP_FORCE_DISABLE_NVLS to true (since Perlmutter does not have NVLS) before re-running the workload, but I still get the same error.
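
For reference, here is a minimal sketch of one way to set the variable and confirm it reaches child processes. It assumes the variable is exported in the same shell that launches run_train.sh. The check_exported helper is hypothetical, written only for this illustration, and is not part of MSCCLPP or TorchTitan.

```shell
#!/usr/bin/env bash
# Assumption: the variable is exported (not merely assigned) in the launching
# shell, so processes started by run_train.sh inherit it.
export MSCCLPP_FORCE_DISABLE_NVLS=true

# Hypothetical helper: prints NAME=value if the variable is visible to child
# processes, otherwise reports that it is not exported.
check_exported() {
    local name="$1"
    local value
    if value=$(printenv "$name"); then
        echo "${name}=${value}"
    else
        echo "${name} is NOT exported"
    fi
}

check_exported MSCCLPP_FORCE_DISABLE_NVLS
check_exported LD_PRELOAD
```

This only checks the environment of the launching shell. Whether the launcher forwards the variable to every rank, and whether the preloaded library reads it before the failing call, is not something this check can confirm.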

Error:

[rank0]:[titan] 2026-04-16 19:00:24,722 - root - INFO - Profiling active. Traces will be saved at ./outputs/profile_trace
[rank0]:[rank0]: Traceback (most recent call last):
[rank0]:[rank0]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank0]:[rank0]:   File "<frozen runpy>", line 88, in _run_code
[rank0]:[rank0]:   File "/global/u2/j/user/project/torchtitan-opus/torchtitan/train.py", line 682, in <module>
[rank0]:[rank0]:     trainer.train()
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 362, in wrapper
[rank0]:[rank0]:     return f(*args, **kwargs)
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/torchtitan-opus/torchtitan/train.py", line 608, in train
[rank0]:[rank0]:     self.train_step(data_iterator)
[rank0]:[rank0]:   File "/global/u2/j/user/project/torchtitan-opus/torchtitan/train.py", line 508, in train_step
[rank0]:[rank0]:     loss = self.forward_backward_step(input_dict, labels)
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/torchtitan-opus/torchtitan/train.py", line 484, in forward_backward_step
[rank0]:[rank0]:     pred = model_parts[0](inputs)
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
[rank0]:[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1882, in _call_impl
[rank0]:[rank0]:     return inner()
[rank0]:[rank0]:            ^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1830, in inner
[rank0]:[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/torchtitan-opus/torchtitan/models/deepseek_v3/model/model.py", line 386, in forward
[rank0]:[rank0]:     h = self.tok_embeddings(tokens) if self.tok_embeddings is not None else tokens
[rank0]:[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
[rank0]:[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1882, in _call_impl
[rank0]:[rank0]:     return inner()
[rank0]:[rank0]:            ^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1809, in inner
[rank0]:[rank0]:     args_kwargs_result = hook(self, args, kwargs)  # type: ignore[misc]
[rank0]:[rank0]:                          ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/distributed/fsdp/_fully_shard/_fsdp_state.py", line 62, in fsdp_hook_wrapper
[rank0]:[rank0]:     return torch._dynamo.disable(
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 1181, in _fn
[rank0]:[rank0]:     return fn(*args, **kwargs)
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/distributed/fsdp/_fully_shard/_fsdp_state.py", line 253, in _pre_forward
[rank0]:[rank0]:     args, kwargs = self._fsdp_param_group.pre_forward(module, args, kwargs)
[rank0]:[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/distributed/fsdp/_fully_shard/_fsdp_param_group.py", line 448, in pre_forward
[rank0]:[rank0]:     self.unshard(self.unshard_async_op)
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/distributed/fsdp/_fully_shard/_fsdp_param_group.py", line 338, in unshard
[rank0]:[rank0]:     self._all_gather_result = foreach_all_gather(
[rank0]:[rank0]:                               ^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank0]:[rank0]:     return func(*args, **kwargs)
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/distributed/fsdp/_fully_shard/_fsdp_collectives.py", line 275, in foreach_all_gather
[rank0]:[rank0]:     all_gather_work = all_gather_comm(
[rank0]:[rank0]:                       ^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/distributed/fsdp/_fully_shard/_fsdp_collectives.py", line 89, in __call__
[rank0]:[rank0]:     return dist.all_gather_into_tensor(
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 83, in wrapper
[rank0]:[rank0]:     return func(*args, **kwargs)
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/global/u2/j/user/project/venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 4125, in all_gather_into_tensor
[rank0]:[rank0]:     work = group._allgather_base(output_tensor, input_tensor, opts)
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]: RuntimeError: cuMemMap is used in env without NVLS support (mscclpp failure: InvalidUsage)
[rank0]:[rank0]:[W416 19:00:57.904349038 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:terminate called without an active exception
W0416 19:00:57.727000 1704011 torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 1704093 closing signal SIGTERM
W0416 19:00:57.744000 1704011 torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 1704094 closing signal SIGTERM
W0416 19:00:57.749000 1704011 torch/distributed/elastic/multiprocessing/api.py:1010] Sending process 1704095 closing signal SIGTERM



"RuntimeError: cuMemMap is used in env without NVLS support (mscclpp failure: InvalidUsage)" Error #788
