Early requeue of jobs #57

@aeantipov

(copy-paste from a chat)
Consider a case of 100 jobs, each taking 30 minutes.
When adaptive-scheduler submits these jobs, it creates a separate Slurm job for each one.
That requires allocating a node per job.
But when the jobs are not very long, by the time the 50th node becomes available the earlier nodes may already be idle.
So you end up with 98 allocated nodes and 98% of the calculation finished, only to wait for the last node to boot up.

I was wondering whether checking more frequently and requeuing jobs would be useful for adaptive-scheduler, or is it too much of a hassle?
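
To make the scenario above concrete, here is a rough back-of-the-envelope sketch in Python. It is not adaptive-scheduler code and uses none of its API. The function names are made up for illustration, and the 5-minute gap between nodes becoming available is purely an assumption. It compares the total wall time when every job waits for its own node against the case where any node that is ready (or has just finished a job) picks up the next pending job.

```python
import heapq


def one_job_per_node(n_jobs, job_minutes, node_delay):
    """Wall time when each job waits for its own node.

    Assumption: node i becomes available at i * node_delay minutes.
    """
    return (n_jobs - 1) * node_delay + job_minutes


def shared_queue(n_jobs, job_minutes, node_delay):
    """Wall time when the next pending job goes to whichever node is free first."""
    # Times at which each node can take a job; initially its allocation time.
    free_at = [i * node_delay for i in range(n_jobs)]
    heapq.heapify(free_at)
    makespan = 0
    for _ in range(n_jobs):
        start = heapq.heappop(free_at)    # earliest node that can take a job
        end = start + job_minutes
        heapq.heappush(free_at, end)      # the node is reusable once the job ends
        makespan = max(makespan, end)
    return makespan


if __name__ == "__main__":
    # 100 jobs of 30 minutes; a new node every 5 minutes (assumed).
    print("one job per node:", one_job_per_node(100, 30, 5), "min")
    print("shared queue:    ", shared_queue(100, 30, 5), "min")
```

Under these assumed numbers the shared-queue variant finishes much earlier, because nodes that are already allocated absorb the jobs that would otherwise wait for later nodes. The real gain depends on how quickly the cluster actually hands out nodes.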
