
Use salloc if you want to srun while on a node #1

@suranap

Description


I used srun to hop into a bash shell on a GPU machine. Then I wanted to use srun to launch 4 processes on that same machine, but it just hangs. It looks like the first srun reserves the whole node, so any further calls to srun get stuck waiting. This is a use case for salloc, which is how I do things on Frontier/Perlmutter. On those systems, salloc also drops you into a shell on the allocated machine, which is convenient.
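For reference, a minimal sketch of the behavior described above, reusing the README's flags and the `gpu` partition; the 4-task command is illustrative, not from the repo:

```bash
# Interactive shell started as a job step, per the current README:
srun -N 1 -n 1 -c 40 -p gpu --pty bash --login

# From inside that shell, a further srun tries to create a second job step,
# but the pty shell step already holds the job's resources, so it hangs:
srun -n 4 hostname   # pends instead of running 4 tasks
```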

sapling-guide/README.md, lines 63 to 78 in 4a2dc09:

2. Allocate compute nodes through SLURM. Do NOT directly SSH to a
   compute node:
   * Do this: `srun -N 1 -n 1 -c 40 -p gpu --pty bash --login`
   * Don't do this: `ssh g0001`

   If for some reason you need SSH, then allocate the node through
   `salloc` before you SSH to it:
   ```
   salloc -n 1 -N 1 -c 40 -p gpu --exclusive
   ssh $SLURM_NODELIST
   ```
   Be sure to close out your session when you are done with it so
   that the nodes are returned to the queue.
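By contrast, a sketch of the salloc-based pattern the issue title suggests; the task count, cpus-per-task, and program name are placeholders, not values from the repo:

```bash
# Request the node without starting a job step that consumes it:
salloc -N 1 -n 4 -c 10 -p gpu

# In the default configuration, the shell salloc gives you is not itself a
# job step, so srun can still create steps inside the allocation.
# This launches 4 processes on the allocated node without hanging:
srun -n 4 ./my_program   # placeholder binary

# Release the allocation when done:
exit
```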
