Description
Hello from a new user! I'm putting this here rather than opening a new issue, but let me know if I should do the latter instead.
Following the documentation, I am trying to run my very first "hello dask" script, which looks like the following:

```python
from dask.distributed import Client
from dask_jobqueue.slurm import SLURMRunner

with SLURMRunner() as runner:
    with Client(runner) as client:
        client.wait_for_workers(runner.n_workers)
        print(f"Number of workers = {runner.n_workers}")
```

When I submit the job using Slurm, I get the following network-related warning:
```
2025-02-12 16:22:11,565 - distributed.scheduler - INFO - State start
/home/sm69/.conda/envs/pyathena/lib/python3.13/site-packages/distributed/utils.py:189: RuntimeWarning: Couldn't detect a suitable IP address for reaching '8.8.8.8', defaulting to hostname: [Errno 101] Network is unreachable
  warnings.warn(
2025-02-12 16:22:11,569 - distributed.scheduler - INFO - Scheduler at: tcp://10.33.81.152:35737
2025-02-12 16:22:11,569 - distributed.scheduler - INFO - dashboard at: http://10.33.81.152:8787/status
2025-02-12 16:22:11,569 - distributed.scheduler - INFO - Registering Worker plugin shuffle
2025-02-12 16:22:11,647 - distributed.scheduler - INFO - Receive client connection: Client-6c2bbb5b-e987-11ef-b579-78ac4413ab38
2025-02-12 16:22:11,647 - distributed.core - INFO - Starting established connection to tcp://10.33.81.152:58686
2025-02-12 16:22:11,658 - distributed.worker - INFO - Start worker at: tcp://10.33.81.152:42115
2025-02-12 16:22:11,658 - distributed.worker - INFO - Listening to: tcp://10.33.81.152:42115
2025-02-12 16:22:11,658 - distributed.worker - INFO - Start worker at: tcp://10.33.81.152:38967
2025-02-12 16:22:11,658 - distributed.worker - INFO - Start worker at: tcp://10.33.81.152:44313
2025-02-12 16:22:11,658 - distributed.worker - INFO - Start worker at: tcp://10.33.81.152:42309
2025-02-12 16:22:11,658 - distributed.worker - INFO - Worker name: 9
2025-02-12 16:22:11,659 - distributed.worker - INFO - dashboard at: 10.33.81.152:46699
2025-02-12 16:22:11,659 - distributed.worker - INFO - Waiting to connect to: tcp://10.33.81.152:35737
2025-02-12 16:22:11,659 - distributed.worker - INFO - Start worker at: tcp://10.33.81.152:34517
...
```
This is followed by `StreamClosedError` and `CommClosedError` exceptions.
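For context on that warning: if I understand `distributed/utils.py` correctly, the IP detection works roughly like the sketch below (the function name `detect_ip` is mine, not the library's). On a compute node with no route to `8.8.8.8`, the `connect` fails and dask falls back to the hostname:

```python
import socket

def detect_ip(host="8.8.8.8", port=80):
    """Roughly how distributed guesses the node's outward-facing IP.

    Connecting a UDP socket does not send any packets; it only asks the
    OS which local address it would use to reach `host`. If there is no
    route at all ("Network is unreachable"), fall back to the hostname,
    which is what the RuntimeWarning above reports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.connect((host, port))
            return s.getsockname()[0]
        except OSError:
            return socket.gethostname()

print(detect_ip())
```

So on our nodes the scheduler/worker addresses end up chosen by this fallback rather than by the InfiniBand interface, which I suspect is related to the connection errors.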
Before getting into the Runner, I had already tried using the Cluster, e.g.:

```python
ncores = 96
SLURMCluster(cores=ncores, memory='720 GiB', processes=ncores, interface="ib0")
```

As you can see, I had to set `interface="ib0"` (the cluster uses InfiniBand for inter-node communication); otherwise I got a similar error. This made me think that I have to do something similar to `interface="ib0"` when using `SLURMRunner` as well, but I couldn't find such an option in the documentation. Could you guide me on what to do?

Somewhat related feedback from a new user's perspective: it was a surprise to me when I first realized that `SLURMCluster` does not support multi-node jobs. This is not mentioned explicitly in the documentation, and I had to surf through several issues before realizing it is the case. I think one of the main motivations for using dask is to overcome the single-node memory bound when analyzing large simulation data, so I naively assumed that `dask-jobqueue` would support multi-node jobs. It would be very helpful if the documentation explicitly stated that `SLURMCluster` cannot submit multi-node jobs.
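P.S. In case it helps anyone hitting the same interface question: a quick stdlib way to see which network interfaces a node actually exposes (and hence what one might pass to `interface=`) is:

```python
import socket

# List the network interfaces visible on this node (Linux only).
# On a cluster with InfiniBand, the device typically appears as "ib0".
for index, name in socket.if_nameindex():
    print(index, name)
```

Running this inside a Slurm job on a compute node is how I confirmed the interface name to use with `SLURMCluster`.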
Originally posted by @sanghyukmoon in #638