Description
When the adaptive core receives a target value that recommends a scale-down, it always appears to take the first worker. This happens because `requested` and `observed` contain completely different data: `requested` takes the name of the worker, which the cluster assigns as an incremental integer, as defined here:
distributed/distributed/deploy/spec.py, line 550 at commit 716d526
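For illustration, the naming behaviour described above amounts to something like the following (a simplified sketch of incremental naming, not the exact source of `SpecCluster`):

```python
# Sketch of incremental worker naming as described above (simplified;
# the real logic lives in SpecCluster in distributed/deploy/spec.py).
import itertools

_worker_numbers = itertools.count()

def new_worker_name() -> int:
    # Each requested worker is keyed by the next integer, so the
    # cluster-side `requested` set ends up looking like {0, 1, 2, ...}.
    return next(_worker_numbers)
```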
However, the `observed` names, i.e. the names the scheduler gets from the workers, appear to be the workers' addresses.

As a result, when the two sets are compared there is no overlap, so the adaptive core assumes it is still awaiting some workers and kills those "not-yet-arrived" workers first. This is counterproductive: the adaptive algorithm ends up killing workers based on ordering rather than idle behaviour.
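A minimal sketch of the mismatch (the integer names and addresses below are illustrative values standing in for what `requested` and `observed` contain; this is not Dask code):

```python
# `requested` holds the incremental integer names assigned by the cluster,
# while `observed` holds what the scheduler reports back: worker addresses.
requested = {0, 1, 2}
observed = {
    "tcp://10.0.0.5:40331",
    "tcp://10.0.0.6:38122",
    "tcp://10.0.0.7:45017",
}

# Workers in `requested` that are absent from `observed` are treated as
# "not yet arrived" and are closed first on scale-down.
not_yet_arrived = requested - observed

# Because the two sets never intersect, every worker looks "not yet
# arrived", so workers get killed by ordering regardless of idleness.
print(not_yet_arrived)  # {0, 1, 2} -- the whole cluster qualifies
```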
Environment:
- Dask version: 2023.3.0
- Python version: 3.11
- Operating System: Ubuntu 22.04 (docker)
- Install method (conda, pip, source): pip