This project is neither sponsored nor supported by NVIDIA.
Use of NVIDIA NVSHMEM is governed by the terms of the NVSHMEM Software License Agreement.
Hardware requirements:
- GPUs inside one node need to be connected by NVLink
- GPUs across different nodes need to be connected by RDMA devices, see GPUDirect RDMA Documentation
- InfiniBand GPUDirect Async (IBGDA) support, see IBGDA Overview
- For more detailed requirements, see NVSHMEM Hardware Specifications
Software requirements:
- NVSHMEM v3.3.9 or later
NVSHMEM 3.3.9 binaries are available in several formats:
- Tarballs for x86_64 and aarch64
- RPM and deb packages: instructions can be found on the NVSHMEM installer page
- Conda packages through conda-forge
- pip wheels through PyPI:
pip install nvidia-nvshmem-cu12

DeepEP is compatible with upstream NVSHMEM 3.3.9 and later.
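As a quick sanity check after a pip install, a small helper (illustrative, not part of DeepEP or NVSHMEM) can locate where a wheel landed in site-packages. The assumption here is that the `nvidia-nvshmem-cu12` wheel installs under the `nvidia` namespace package; adjust the names if your wheel lays out files differently.

```python
# Sketch: locate an installed package's directory so its libraries can be
# found, e.g. for LD_LIBRARY_PATH. The package name "nvidia" below is an
# assumption about how the nvidia-nvshmem-cu12 wheel is laid out.
import importlib.util
import os

def find_package_dir(name: str):
    """Return the directory of an importable package, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]

if __name__ == "__main__":
    path = find_package_dir("nvidia")
    if path is None:
        print("nvidia namespace package not found in this environment")
    else:
        print("wheel files under:", os.path.join(path, "nvshmem"))
```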
NVSHMEM supports two modes with different requirements. Either of the following methods can be used to enable IBGDA support.
The first configuration enables traditional IBGDA support.
Modify /etc/modprobe.d/nvidia.conf:
options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"

Update the kernel configuration:
sudo update-initramfs -u
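After rebooting, the driver should report the new regkeys in `/proc/driver/nvidia/params`. The following sketch (a helper of my own, not an NVIDIA tool) parses that file to confirm both settings took effect; the file path and the key names (`EnableStreamMemOPs`, `RegistryDwords`) are assumptions about how recent drivers expose module parameters, so verify them on your system.

```python
# Sketch: verify the IBGDA driver regkeys after reboot by parsing
# /proc/driver/nvidia/params. The path and key names are assumptions
# about the NVIDIA driver's parameter reporting; check your driver.
from pathlib import Path

def parse_params(text: str) -> dict:
    """Parse 'Key: value' lines from the driver's params file."""
    params = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            params[key.strip()] = value.strip()
    return params

def ibgda_regkeys_ok(params: dict) -> bool:
    """True when both regkeys from the modprobe line above are active."""
    return (params.get("EnableStreamMemOPs") == "1"
            and "PeerMappingOverride=1" in params.get("RegistryDwords", ""))

if __name__ == "__main__":
    path = Path("/proc/driver/nvidia/params")
    if path.exists():
        print("IBGDA regkeys active:", ibgda_regkeys_ok(parse_params(path.read_text())))
    else:
        print("driver params file not found; is the NVIDIA driver loaded?")
```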
sudo reboot

This configuration enables IBGDA through asynchronous post-send operations assisted by the CPU. More information about CPU-assisted IBGDA can be found in this blog. It comes with a small performance penalty, but can be used when modifying the driver regkeys is not an option.
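With the CPU-assisted route, NVSHMEM is typically steered at run time through environment variables. The variable names below reflect my understanding of the relevant knobs and should be checked against the NVSHMEM environment-variable reference for your version.

```shell
# Assumed runtime configuration for CPU-assisted IBGDA; verify the
# variable names against your NVSHMEM version's documentation.
export NVSHMEM_IB_ENABLE_IBGDA=1        # enable the IBGDA transport
export NVSHMEM_IBGDA_NIC_HANDLER=cpu    # have the CPU post sends (CPU-assisted mode)
```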
Download GDRCopy: it is available as prebuilt deb and rpm packages here, or as source code in the GDRCopy GitHub repository.
Install GDRCopy following the instructions in the GDRCopy GitHub repository.
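For a source build, the flow is roughly the following sketch; the repository URL matches upstream GDRCopy, but the install prefix, make variables, and the `insmod.sh` helper are taken from my reading of its README, so confirm them against the release you download.

```shell
# Sketch of a GDRCopy source build; targets and scripts may differ
# between GDRCopy releases, so follow the bundled README if they do.
git clone https://github.com/NVIDIA/gdrcopy.git
cd gdrcopy
make prefix=/opt/gdrcopy CUDA=/usr/local/cuda all install
sudo ./insmod.sh            # load the gdrdrv kernel module
```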
When not installing NVSHMEM from RPM or deb packages, set the following environment variables in your shell configuration:
export NVSHMEM_DIR=/path/to/your/dir/to/install # Use for DeepEP installation
export LD_LIBRARY_PATH="${NVSHMEM_DIR}/lib:$LD_LIBRARY_PATH"
export PATH="${NVSHMEM_DIR}/bin:$PATH"

nvshmem-info -a # Should display details of NVSHMEM
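Before running the check above, a small helper (mine, purely illustrative) can confirm that `nvshmem-info` is actually resolvable on `PATH`, which catches a mis-set `NVSHMEM_DIR` early:

```python
# Sketch: confirm nvshmem-info is reachable on PATH after updating the
# environment. Only the tool name comes from the text above; the helper
# itself is illustrative.
import shutil
import subprocess

def tool_on_path(name: str):
    """Return the resolved path of an executable, or None if not on PATH."""
    return shutil.which(name)

if __name__ == "__main__":
    exe = tool_on_path("nvshmem-info")
    if exe is None:
        print("nvshmem-info not found; check NVSHMEM_DIR and PATH")
    else:
        # -a asks nvshmem-info for its full configuration details
        subprocess.run([exe, "-a"], check=True)
```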