I used `srun` to hop into a bash shell on a GPU machine. Then I wanted to use `srun` to launch 4 processes on that same machine, but it just hangs. It looks like the first `srun` reserves the whole node, so further calls to `srun` get stuck waiting for resources. So this is really a use case for `salloc`, and that's how I do things on Frontier/Perlmutter. On those systems, however, `salloc` also drops you into a shell on the allocated machine. That's convenient.
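A minimal sketch of the `salloc`-then-`srun` workflow described above (the partition name `gpu` and the resource counts are assumptions, chosen to mirror the quoted docs):

```shell
# Reserve one GPU node via salloc. On most systems this starts a new
# shell *inside* the allocation (on the login node, unless the site
# configures salloc to land you on the compute node).
salloc -N 1 -n 4 -p gpu --exclusive

# From inside the allocation, srun launches job steps against the
# already-reserved node instead of requesting a second allocation --
# so it does not hang the way srun nested inside srun does.
srun -n 4 hostname

# Exit the salloc shell to release the node back to the queue.
exit
```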
Lines 63 to 78 in 4a2dc09
2. Allocate compute nodes through SLURM. Do NOT directly SSH to a
   compute node:
   * Do this: `srun -N 1 -n 1 -c 40 -p gpu --pty bash --login`
   * Don't do this: `ssh g0001`

   If for some reason you need SSH, then allocate the node through
   `salloc` before you SSH to it:
   ```
   salloc -n 1 -N 1 -c 40 -p gpu --exclusive
   ssh $SLURM_NODELIST
   ```
   Be sure to close out your session when you are done with it so
   that the nodes are returned to the queue.