Description
What happened:
Running dask-scheduler on a dedicated host with the dashboard enabled and in use causes CPU utilization to climb until the scheduler slows to a halt.
What you expected to happen:
Using the dashboard should not drive CPU utilization so high that the scheduler becomes unresponsive.
If the dashboard is not intended to be used in a production environment, a warning must be added to the documentation (or even the CLI) to inform users.
Minimal Complete Verifiable Example:
- Use any recent release (anything from at least the past year, including the main dev branch).
- Install in a venv to make sure all requirements are installed at their most recent available versions
- Start dask-scheduler with no extra CLI parameters. There is no need to start a worker; the scheduler by itself is enough to reproduce the problem
- Visit the dashboard, browse to multiple pages
- Observe that with each page click, memory and CPU utilization increase
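To put numbers on the last step, something like the following could be left running while clicking through the dashboard. This is only a rough sketch, not part of Dask: the helper names (`parse_ps_output`, `sample`, `watch`) are made up for illustration, and it assumes a `ps` binary that accepts `-o %cpu= -o rss= -p PID`, which should be the case on both Ubuntu and macOS.

```python
import subprocess
import time


def parse_ps_output(text):
    """Parse the output of `ps -o %cpu= -o rss=` into (cpu_percent, rss_kib)."""
    cpu, rss = text.split()
    return float(cpu), int(rss)


def sample(pid):
    """Return (cpu_percent, rss_kib) for a process, as reported by `ps`.

    Assumption: `ps` is on PATH. Each column gets its own `-o` flag so that
    the empty header (`=`) applies to that column only.
    """
    out = subprocess.run(
        ["ps", "-o", "%cpu=", "-o", "rss=", "-p", str(pid)],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_ps_output(out)


def watch(pid, interval=5.0, samples=12):
    """Print CPU and memory of `pid` every `interval` seconds.

    Note: `ps` reports %cpu as an average rather than an instantaneous
    value, so this shows the trend, not exact spikes.
    """
    for _ in range(samples):
        cpu, rss = sample(pid)
        print(f"cpu={cpu:5.1f}%  rss={rss / 1024:8.1f} MiB")
        time.sleep(interval)
```

For example, calling `watch(<pid of dask-scheduler>)` in a separate Python session while browsing the dashboard pages should show whether the values keep growing with each click.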
Anything else we need to know?:
In a production environment, we have to constantly monitor the scheduler machine and restart dask-scheduler because of this issue.
Not sure if this helps, but when CPU usage goes above 50%, the scheduler stops doing some optimizations: https://github.com/dask/distributed/blob/main/distributed/scheduler.py#L8048
Environment:
- Dask version: Many, including the latest (2022.4.0)
- Python version: Tested with 3.8.8 and 3.9.7
- Operating System: Tested with Ubuntu 18.04 and macOS 12.3.1
- Install method (conda, pip, source): tested with both pip and source