This document describes how to integrate Slurm with MIG-enabled NVIDIA GPUs. Be sure to read the MIG getting started guide if you haven't already.
This guide assumes that administrators have read and understood that material and have already partitioned their NVIDIA GPUs to meet the needs of their users and applications (use of NVIDIA's mig-parted is highly recommended). Slurm treats each MIG device as a separate, distinct GPU, which lets multiple jobs and users share a single physical GPU without contention.
The following steps show how to use the MIG detection program, using a single A100 system as an example.
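Build the MIG detection program. CUDA and gcc must be installed for the build to succeed. The first command creates the unversioned libnvidia-ml.so symlink that the linker looks for when given -lnvidia-ml, and the second compiles the program: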
sudo ln -s /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so
gcc -g -o mig -I/usr/lib/x86_64-linux-gnu -I/usr/include mig.c -lnvidia-ml
If nvml.h and libnvidia-ml.so are not in standard locations, adjust the commands above accordingly.
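When the NVML header or library lives elsewhere, the compile command needs an -I flag for the directory containing nvml.h and a -L flag for the directory containing libnvidia-ml.so. The following Python sketch only assembles such a command. The helper name build_mig_command and the /opt/nvidia paths are hypothetical placeholders and are not part of this repository.

```python
import shlex


def build_mig_command(nvml_include_dir, nvml_lib_dir, source="mig.c", output="mig"):
    """Return the gcc argument list for building the MIG detection program."""
    return [
        "gcc", "-g",
        "-o", output,
        f"-I{nvml_include_dir}",  # directory that contains nvml.h
        source,
        f"-L{nvml_lib_dir}",      # directory that contains libnvidia-ml.so
        "-lnvidia-ml",
    ]


# Hypothetical install locations; substitute the paths used on your system.
cmd = build_mig_command("/opt/nvidia/include", "/opt/nvidia/lib64")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run(cmd, check=True) from the directory that contains mig.c.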
The program detects all MIG devices and other NVIDIA GPUs and writes a corresponding gres.yml file to the working directory. gres.yml can then be used with a Slurm Ansible role to generate a gres.conf file.
$ nvidia-smi
Thu Mar 18 08:05:25 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Graphics Device On | 00000000:65:00.0 Off | On |
| 35% 56C P0 43W / 200W | 13MiB / 48675MiB | N/A Default |
| | | Enabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG|
| | | ECC| |
|==================+======================+===========+=======================|
| 0 2 0 0 | 7MiB / 24192MiB | 56 0 | 4 0 2 0 0 |
| | 0MiB / 127MiB | | |
+------------------+----------------------+-----------+-----------------------+
| 0 7 0 1 | 1MiB / 5888MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 31MiB | | |
+------------------+----------------------+-----------+-----------------------+
| 0 8 0 2 | 1MiB / 5888MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 31MiB | | |
+------------------+----------------------+-----------+-----------------------+
| 0 9 0 3 | 1MiB / 5888MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 31MiB | | |
+------------------+----------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ ./mig
GPU count 1
Success
$ ls
gres.yml LICENSE mig mig.c README.md
The contents of the generated files can be viewed and edited. For example, sites can change the Type attribute of the MIG devices to something more consistent with the system, or change the default list of devices allowed by cgroups:
$ head -n 18 gres.yml
# GPU 0 MIG 0 /proc/driver/nvidia/capabilities/gpu0/mig/gi3/access
- File: /dev/nvidia-caps/nvidia-cap30
  Name: gpu
  NodeName: semc-gpu34
  Type: 1g.24gb

# GPU 0 MIG 1 /proc/driver/nvidia/capabilities/gpu0/mig/gi4/access
- File: /dev/nvidia-caps/nvidia-cap39
  Name: gpu
  NodeName: semc-gpu34
  Type: 1g.24gb

# GPU 0 MIG 2 /proc/driver/nvidia/capabilities/gpu0/mig/gi5/access
- File: /dev/nvidia-caps/nvidia-cap48
  Name: gpu
  NodeName: semc-gpu34
  Type: 1g.24gb
Add the contents of gres.yml to your Ansible configuration.
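To make the relationship between gres.yml and gres.conf concrete, the sketch below renders entries like the ones shown above as gres.conf lines. It is only an illustration: the template used by your Slurm Ansible role is not shown in this document and may format lines differently, and the function name gres_conf_line is hypothetical. The entries are written out as Python dictionaries so that no YAML library is needed.

```python
def gres_conf_line(entry):
    """Render one gres.yml-style entry as a single gres.conf line."""
    # NodeName is placed first, as in typical gres.conf examples.
    keys = ("NodeName", "Name", "Type", "File")
    return " ".join(f"{key}={entry[key]}" for key in keys if key in entry)


# Entries copied from the gres.yml excerpt above.
entries = [
    {"File": "/dev/nvidia-caps/nvidia-cap30", "Name": "gpu",
     "NodeName": "semc-gpu34", "Type": "1g.24gb"},
    {"File": "/dev/nvidia-caps/nvidia-cap39", "Name": "gpu",
     "NodeName": "semc-gpu34", "Type": "1g.24gb"},
    {"File": "/dev/nvidia-caps/nvidia-cap48", "Name": "gpu",
     "NodeName": "semc-gpu34", "Type": "1g.24gb"},
]

for entry in entries:
    print(gres_conf_line(entry))
# First line printed:
# NodeName=semc-gpu34 Name=gpu Type=1g.24gb File=/dev/nvidia-caps/nvidia-cap30
```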
Slurm must be configured to use cgroups in order to enforce MIG device isolation across users and jobs. Ensure the following parameters are present in slurm.conf:
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
In addition, ensure that Slurm constrains devices with the following entry in cgroup.conf:
ConstrainDevices=yes
See Slurm's cgroup.conf and cgroup documentation for more information.
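The settings above can be checked before restarting the daemons. The following is a minimal sketch, assuming the configuration files contain simple Key=Value lines; it does not follow Include directives or handle every Slurm syntax feature. The names parse_conf, missing_settings and REQUIRED are hypothetical helpers introduced for illustration.

```python
def parse_conf(text):
    """Parse 'Key=Value' lines into a dict keyed by lowercased parameter name."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


# Parameters named in this guide, grouped by the file they belong to.
REQUIRED = {
    "slurm.conf": {"ProctrackType": "proctrack/cgroup", "TaskPlugin": "task/cgroup"},
    "cgroup.conf": {"ConstrainDevices": "yes"},
}


def missing_settings(text, required):
    """Return 'Key=Value' strings for required settings absent from text."""
    found = parse_conf(text)
    missing = []
    for key, wanted in required.items():
        # Some parameters, such as TaskPlugin, accept a comma-separated list.
        present = [p.strip().lower() for p in found.get(key.lower(), "").split(",")]
        if wanted.lower() not in present:
            missing.append(f"{key}={wanted}")
    return missing


sample = "ProctrackType=proctrack/cgroup\nTaskPlugin=task/affinity,task/cgroup\n"
print(missing_settings(sample, REQUIRED["slurm.conf"]))       # prints []
print(missing_settings("ConstrainDevices=no\n", REQUIRED["cgroup.conf"]))
```

In practice the text would come from reading the slurm.conf and cgroup.conf files on the node, wherever your installation keeps them.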
Be sure to start or restart slurmctld on the head node and slurmd on all the MIG nodes after configuring the devices. In addition, any time the MIG or GPU configuration changes, repeat steps 2, 3 and 5.
With Slurm configured and started, you can now verify correct operation. Check that the GPUs and MIG devices are present via "scontrol show nodes", then run some GPU jobs requesting the new MIG devices.
$ scontrol show nodes
NodeName=p1-019 Arch=x86_64 CoresPerSocket=8
CPUAlloc=0 CPUTot=16 CPULoad=0.01
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=gpu:1g.6gb:3,gpu:4g.24gb:1
NodeAddr=p1-019 NodeHostName=p1-019 Version=20.11.4
OS=Linux 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
RealMemory=1 AllocMem=0 FreeMem=48895 Sockets=1 Boards=1
State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=debug
BootTime=2021-03-15T06:14:57 SlurmdStartTime=2021-03-18T09:01:00
CfgTRES=cpu=16,mem=1M,billing=16
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Comment=(null)
$ srun --gres=gpu:4g.24gb nvidia-smi
Thu Mar 18 08:05:12 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Graphics Device On | 00000000:65:00.0 Off | On |
| 35% 56C P0 43W / 200W | 13MiB / 48675MiB | N/A Default |
| | | Enabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG|
| | | ECC| |
|==================+======================+===========+=======================|
| 0 2 0 0 | 7MiB / 24192MiB | 56 0 | 4 0 2 0 0 |
| | 0MiB / 127MiB | | |
+------------------+----------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Note that only the requested MIG device is visible to the job.
$ srun --gres=gpu:1g.6gb:2 nvidia-smi
Thu Mar 18 08:07:55 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Graphics Device On | 00000000:65:00.0 Off | On |
| 35% 56C P0 43W / 200W | 13MiB / 48675MiB | N/A Default |
| | | Enabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG|
| | | ECC| |
|==================+======================+===========+=======================|
| 0 7 0 0 | 1MiB / 5888MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 31MiB | | |
+------------------+----------------------+-----------+-----------------------+
| 0 8 0 1 | 1MiB / 5888MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 31MiB | | |
+------------------+----------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
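The check that a job sees only the MIG devices it requested can be automated by counting rows in the "MIG devices" table of the nvidia-smi output. The sketch below assumes the table layout shown in the examples above, which may differ between driver versions; the function name visible_mig_devices is hypothetical.

```python
import re

# A device row starts with four integers: GPU, GI ID, CI ID, MIG Dev.
MIG_ROW = re.compile(r"^\|\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+\|")


def visible_mig_devices(smi_output):
    """Return (gpu, gi_id, ci_id, mig_dev) tuples from the 'MIG devices' table."""
    devices = []
    in_mig_table = False
    for line in smi_output.splitlines():
        if "MIG devices:" in line:
            in_mig_table = True
        elif "Processes:" in line:
            in_mig_table = False
        elif in_mig_table:
            match = MIG_ROW.match(line.strip())
            if match:
                devices.append(tuple(int(g) for g in match.groups()))
    return devices


# Excerpt of the output from the two-device job above.
sample = """\
| MIG devices: |
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| 0 7 0 0 | 1MiB / 5888MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 31MiB | | |
| 0 8 0 1 | 1MiB / 5888MiB | 14 0 | 1 0 0 0 0 |
| Processes: |
| No running processes found |
"""
print(visible_mig_devices(sample))  # prints [(0, 7, 0, 0), (0, 8, 0, 1)]
```

For a job submitted with --gres=gpu:1g.6gb:2, the returned list should contain exactly two entries.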