Scheduler should not be considered idle while a client submits new work #8876

Open
@hendrikmakait

Describe the issue:

I have seen several instances where a cluster with an idle timeout shut down because the client took an excessive amount of time to submit new work. In these cases, the scheduler should not have shut down but rather anticipated that new work would arrive shortly.

As far as I can tell, we can address this in two steps:

  1. We should not consider the scheduler idle while Scheduler.update_graph executes. This method is the main entry point for submitting new work to the cluster, and it can take a while when it encounters large or complex task graphs. As a result, a cluster can shut down while the scheduler is already preparing future work.
  2. We should not consider the scheduler idle while a client submits new work. This is more complex. One possible solution would be for the client to announce to the scheduler that it is starting to submit work. The scheduler would then have to ensure that such an announcement doesn't keep it from being considered idle for longer than necessary, e.g., by handling failed submission attempts and client timeouts.
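
To make the two steps concrete, here is a rough, self-contained sketch of the bookkeeping they imply. It is a toy model, not the actual Scheduler code: `IdleTracker`, `processing_graph`, `announce_submission`, `submission_done`, `should_shut_down`, and `lease_timeout` are all hypothetical names made up for illustration, and the sketch ignores the other conditions a real idle check would look at (running tasks, busy workers). The idea is that a running graph update (step 1) or an unexpired client announcement (step 2) keeps the idle clock from advancing, while a lease timeout stops a crashed client from keeping the cluster alive forever.

```python
import time
from contextlib import contextmanager


class IdleTracker:
    """Toy idle bookkeeping (hypothetical; not the distributed.Scheduler API)."""

    def __init__(self, idle_timeout, lease_timeout, clock=time.monotonic):
        self.idle_timeout = idle_timeout    # seconds of idleness before shutdown
        self.lease_timeout = lease_timeout  # how long one announcement stays valid
        self.clock = clock
        self.active_updates = 0             # step 1: graph updates in progress
        self.leases = {}                    # step 2: client id -> lease deadline
        self.idle_since = clock()

    @contextmanager
    def processing_graph(self):
        """Wrap an update_graph-like call so it counts as activity."""
        self.active_updates += 1
        try:
            yield
        finally:
            self.active_updates -= 1
            self.idle_since = self.clock()  # idleness starts over afterwards

    def announce_submission(self, client):
        """Client says it is about to submit work; grants a time-limited lease."""
        self.leases[client] = self.clock() + self.lease_timeout

    def submission_done(self, client):
        """Client finished (or gave up on) submitting; release its lease."""
        self.leases.pop(client, None)
        self.idle_since = self.clock()

    def should_shut_down(self):
        now = self.clock()
        # Drop leases of clients that never followed up (crashed, timed out).
        for client in [c for c, deadline in self.leases.items() if deadline <= now]:
            del self.leases[client]
        if self.active_updates or self.leases:
            self.idle_since = now           # still busy: restart the idle clock
            return False
        return now - self.idle_since >= self.idle_timeout
```

In this sketch, a periodic idle check would call `should_shut_down()`, the graph-submission handler would run inside `with tracker.processing_graph():`, and a client would call `announce_submission` before it starts building and sending a graph. Whether a lease should be renewable, and what a sensible `lease_timeout` would be, are open design questions.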

Metadata

    Labels

    enhancement: Improve existing functionality or make things work better
    good expert issue: Clearly described but requires someone extremely familiar with the project to implement successfully
    scheduler
