
Autoscaler scales down to 1 worker. #930

Open
@me-her

Description


The autoscaler scales down to 1 worker despite the minimum being configured as 2 workers:

Testing Scenario:

# Ran a big computation

import distributed
client = await distributed.Client("<hosted-url>:8786", asynchronous=True, direct_to_workers=True)

import dask.array as da
array = da.random.random(size=(40960, 4096, 4096), chunks="256M").astype("float32")
mean = await client.compute(array.mean())

1st run: minimum 8, maximum 16.

2nd run: minimum 2, maximum 12 (a configuration sketch follows below).
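For reference, the bounds for both runs are assumed to have been set roughly as in the sketch below; the actual cluster/autoscaler configuration is not included in this report, and the cluster name here is hypothetical.

from dask_kubernetes.operator import KubeCluster

# Hypothetical cluster; the real DaskCluster resource is not shown in this report.
cluster = KubeCluster(name="example-cluster")

# 1st run bounds
cluster.adapt(minimum=8, maximum=16)

# 2nd run bounds (used for the second test instead)
# cluster.adapt(minimum=2, maximum=12)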

Scale-up is perfect: it scales up as expected to 16 workers on the 1st run and 12 workers on the 2nd run.
While scaling down, the operator logs shown below indicate that it scales down to 2 (the configured minimum), but then I see only one worker remaining. This happened on both runs.
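One way to confirm the remaining worker count after scale-down (the exact check used is not shown in this report) is sketched below with a synchronous client, with "<hosted-url>" as above.

import distributed

# Connect a synchronous client and count the workers the scheduler still knows about.
client = distributed.Client("<hosted-url>:8786")
print(len(client.scheduler_info()["workers"]))  # expected: the configured minimum; observed here: 1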

Operator Logs

[screenshot]

Scheduler Logs

[screenshot]

Anything else we need to know?:

I see the workers scale down to 1 even when the configured minimum was 8. Earlier versions of the dask-operator had the opposite problem of staying at the maximum number of workers: once they scaled up to the max, they never scaled back down. In this version the scale-up works perfectly as expected, but the scale-down goes all the way to 1.

Environment:

  • Dask version: 2024.12.1
  • Distributed: 2024.12.1
  • Dask-Operator: 2025.1.0
  • Python version: 3.10
  • Operating System: Linux
  • Install method (conda, pip, source): pip
  • Running this on GKE
