Skip to content

Manually scale up adaptive clusters #3610

Open
@jsignell

Description

@jsignell

There was some discussion on gitter about adaptive clusters and whether the user should be able to request a target number of workers.

Context

The rationale for wanting this kind of behavior is that you could be in a scenario where you know you'll need a lot at the beginning of a task, so you want to manually scale up, but then let the workers kill off as they finish.

From @d-v-b: This describes most reductions (e.g., min of a big array) and I found the adaptive cluster performs terribly for these workloads.

From @lesteve:

As I mentioned during my Dask-Jobqueue talk at the Dask workshop, my feeling is that a lot of use cases for Adaptive in a HPC context are embarrassingly parallel and have essentially two phases:

  1. scale up as fast as your HPC cluster lets you
  2. once computation nears towards the end, Dask workers that don't have tasks left should be killed quickly to free up resources.

I was planning to look at trying to do that manually at one point but never got around to it. If someone makes some progress in this direction, let me know!

Also from user feed-back (not first-hand experience) it feels like Adaptive has some problems and @d-v-b confirmed my impression at the workshop.

Proposal

I was originally thinking about this from the scheduler perspective (interacted with via the client object). I was imagining that you could set target on the scheduler and that would be used to convey the intention to the cluster (similar to the adaptive_target pattern). But @martindurant pointed out that you'd need some concept of how long the workers should hang around for before adaptive starts trying to tear them down. Perhaps the right approach would be you don't start spinning up the workers until some work is submitted (similar to Adaptive), but you wait to start work until target workers (or more likely min(maximum, target)) are available.

ref #3526

Metadata

Metadata

Assignees

No one assigned

    Labels

    adaptiveAll things relating to adaptive scaling

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions