Description
Describe the issue:
Thanks in advance for your time. I have created a simple "hello world" example of both a SLURMRunner and a SLURMCluster in my environment. I prefer the SLURMRunner interface, because with the SLURMCluster construct I effectively need to create wrappers around my jobs.
I dispatch my SLURMCluster job via sbatch (since my login node cannot run my scheduler) to a worker node (node-01), which then dispatches additional jobs on my worker nodes (node[01-06]). When I do this, I can visit the scheduler dashboard, although I see slightly odd behavior in job allocation (not the point of this post; I need to look into it more).
When I create my SLURMRunner (same as the example at https://jobqueue.dask.org/en/stable/runners-overview.html), my jobs are allocated and run, but I am unable to load the scheduler dashboard. I get a 404 Page Not Found when I visit the scheduler link given by client.dashboard_link, and the same happens with the address in the scheduler.json file. This differs from what happens after the runner spins down, in which case I get Connection Refused. Is this expected?
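To make the difference between the two failures concrete, here is a minimal sketch of a probe that could be run from a node that can reach the scheduler. It uses only the Python standard library. `probe_dashboard` is a hypothetical helper name of my own, not part of Dask or dask-jobqueue, and the result strings it returns are arbitrary labels. A 404 means something is listening on the port and answering HTTP but has no page at that path, whereas Connection Refused means nothing is listening at all.

```python
import urllib.error
import urllib.request


def probe_dashboard(url: str, timeout: float = 5.0) -> str:
    """Classify how an HTTP endpoint responds (hypothetical helper).

    Returns one of:
      "ok:<status>"          a success response was received
      "http_error:<status>"  a server answered with an error status (e.g. 404)
      "connection_refused"   nothing is listening on that host/port
      "unreachable:<reason>" any other network failure (DNS, timeout, ...)
    """
    # Ignore HTTP proxy environment variables so the direct connection is tested.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"ok:{resp.status}"
    except urllib.error.HTTPError as exc:
        # HTTPError must be caught before URLError because it is a subclass.
        return f"http_error:{exc.code}"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, ConnectionRefusedError):
            return "connection_refused"
        return f"unreachable:{exc.reason}"
    except OSError as exc:
        # For example, a timeout raised while reading the response.
        return f"unreachable:{exc}"
```

With a connected client, calling `probe_dashboard(client.dashboard_link)` while the runner is up and again after it has shut down should show which of the two cases applies at each point.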
Minimal Complete Verifiable Example:
Using the SLURMRunner in my multi-node environment
# Put your MCVE code here
Anything else we need to know?:
Environment:
- Dask version: 2023.6.0
- Python version: 3.11.4
- Operating System: CentOS 7
- Install method: pip