Description
Describe the bug
Runs stopped with the following error when calling cudaHostRegister:
cudaAssert: cudaErrorMemoryAllocation out of memory, file /home/yeluo/opt/qmcpack/src/Platforms/CUDA/MemManageCUDA.hpp, line 74
However, host memory usage is well below the available DDR capacity.
To Reproduce
Steps to reproduce the behavior:
- all code releases with DiracDeterminantBatched
- NiO performance benchmark a64 with 2048 walkers per rank.
- Running 2-4 ranks per node fails; running 1 MPI rank per node works.
- Each MPI rank sees all 4 GPUs.
Expected behavior
The simulation should run with 1-4 ranks.
System:
ALCF Polaris
Additional context
I injected counters to track the peak number of registered host memory (pinned memory) segments:
- 1 MPI rank: run completed, peak at ~34k segments
- 2 MPI ranks: hit the error at peak ~32k per rank
- 3 MPI ranks: hit the error at peak ~21k per rank
- 4 MPI ranks: hit the error at peak ~16k per rank
a. There is a cap around the magic number 65536; my guess is vm.max_map_count=65530. The per-rank peaks are consistent with a per-node cap: 2 x ~32k, 3 x ~21k, and 4 x ~16k each total roughly 64-65k.
b. It seems MPI (Cray MPICH) related, likely due to the notorious XPMEM.
c. Workaround: exposing only one GPU per rank made all cases run.
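If the vm.max_map_count hypothesis is right, the cap can be checked from the shell, and the workaround in (c) can be scripted by binding one GPU per rank. A sketch only; the local-rank environment variable is launcher-specific (PMI_LOCAL_RANK is an assumption here, not confirmed for this setup):

```shell
# Check the per-process limit on memory mappings (commonly 65530 by default);
# each pinned-memory registration consumes map entries.
cat /proc/sys/vm/max_map_count

# Workaround (c): expose a single GPU per rank. The local-rank variable
# depends on the launcher; PMI_LOCAL_RANK is used here as a placeholder.
LOCAL_RANK=${PMI_LOCAL_RANK:-0}
export CUDA_VISIBLE_DEVICES=$((LOCAL_RANK % 4))
echo "rank $LOCAL_RANK -> GPU $CUDA_VISIBLE_DEVICES"
```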
Long-term solution on our side: we need to do bulk allocation/registration and use views instead of doing it per walker.