Unable to load scheduler dashboard with SLURMRunner, but can with SLURMCluster #682

Open
@gilmorethomas

Describe the issue:
Thanks in advance for your time. I have created simple "hello world" examples of both a SLURMRunner and a SLURMCluster in my environment. I prefer the SLURMRunner interface, because with the SLURMCluster I effectively need to write wrappers around my jobs.

I submit my SLURMCluster script with sbatch to a worker node (node-01), since my login node cannot run the scheduler; the cluster then submits additional jobs to my worker nodes (node[01-06]). With this setup I can open the scheduler dashboard, although I am seeing some odd behavior in job allocation (not the point of this post; I need to look into it more).

When I create my SLURMRunner (same as the example at https://jobqueue.dask.org/en/stable/runners-overview.html), my jobs are allocated and run, but I cannot load the scheduler dashboard. I get a 404 Page Not Found when I visit the link reported by client.dashboard_link, and also when I use the address from the scheduler.json file. This is different from what happens after the runner shuts down, when I get Connection Refused instead. Is this expected?
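
The two symptoms point at different things. A 404 means some HTTP server is answering on that host and port but has no page at the requested path, while Connection Refused means nothing is listening there at all. A small probe like the one below, run from the node where the client lives, can make that distinction explicit. It is only a sketch using the Python standard library: `probe_dashboard` is a hypothetical helper, not part of Dask or dask-jobqueue, and the URL is assumed to be the one printed by client.dashboard_link.

```python
import urllib.error
import urllib.request


def probe_dashboard(url: str, timeout: float = 5.0) -> str:
    """Hypothetical helper: classify the response from a dashboard URL.

    Returns "ok" for a 2xx response, "http <code>" when a server answers
    with an error status (e.g. 404: listening, but no page at that path),
    or "connection refused" when nothing is listening on that host/port.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "ok" if 200 <= resp.status < 300 else f"http {resp.status}"
    except urllib.error.HTTPError as err:
        # HTTPError subclasses URLError, so it must be caught first:
        # the server is up and replied, just with an error status.
        return f"http {err.code}"
    except urllib.error.URLError as err:
        # No HTTP response at all; urllib wraps the socket error in .reason.
        if isinstance(err.reason, ConnectionRefusedError):
            return "connection refused"
        return f"unreachable: {err.reason}"
```

For example, `probe_dashboard(client.dashboard_link)` while the runner is up, and again after it exits, would show whether the 404 comes from a live server on the dashboard port.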

Minimal Complete Verifiable Example:
Using the SLURMRunner in my multi-node environment, with the same script as the documentation example linked above.

Environment:

  • Dask version: 2023.6.0
  • Python version: 3.11.4
  • Operating System: CENTOS-7
  • Install method: pip

Metadata

Labels: needs info