Description
Proposal
I asked on the forum and searched for a way to achieve this, but found none. It would be useful to be able to suspend allocation rescheduling for a job, and then mark it for rescheduling again later.
Use-cases
My main use case is this: suppose you have a service with 3 (or more) nodes for high availability (common examples would be a Postgres replica, an ES cluster, a MongoDB replica set, etc.), each allocation with its own CSI volume (per_alloc=true). Now I need to do some maintenance task on the storage (for example, migrate the data to a new volume). I would like to do it one allocation at a time, so the service stays up the whole time. For the last alloc it's easy: I can scale the group down, do my maintenance, then scale it up again. But for the first two allocs there is no way to do this. If I stop alloc 0, it is immediately rescheduled. So you have to take the service down.
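For illustration, here is a minimal sketch of the kind of job I mean. The job, group, and volume names (`postgres`, `db`, `pgdata`), the image, and the mount path are hypothetical placeholders, not my actual setup. With `per_alloc = true`, Nomad appends the allocation index to the volume source, so the three allocations would claim `pgdata[0]`, `pgdata[1]`, and `pgdata[2]`.

```hcl
# Hypothetical example job: all names, the image, and paths are placeholders.
job "postgres" {
  datacenters = ["dc1"]

  group "db" {
    # Three allocations for high availability.
    count = 3

    volume "data" {
      type            = "csi"
      source          = "pgdata" # resolved to pgdata[0], pgdata[1], pgdata[2]
      per_alloc       = true     # one volume per allocation index
      access_mode     = "single-node-writer"
      attachment_mode = "file-system"
    }

    task "postgres" {
      driver = "docker"

      config {
        image = "postgres:16"
      }

      # Mount the per-allocation volume into the container.
      volume_mount {
        volume      = "data"
        destination = "/var/lib/postgresql/data"
      }
    }
  }
}
```

Scaling the group from 3 to 2 only frees the volume with the highest index (`pgdata[2]` here), which is why the other two volumes cannot be worked on while the job is running.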
Attempted Solutions
I tried to run a maintenance job with access to the volume of alloc-0 of such a job. As expected, the maintenance job waits for a claim on the volume. I then stopped alloc-0 of the real job, hoping my maintenance job would acquire the claim and block the real alloc-0 from being started again. But this does not work: the "real" alloc-0 always wins the race and gets the claim on the volume.
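The maintenance job I tried looked roughly like the sketch below. Again, every name here is a hypothetical placeholder (`pgdata[0]` stands for the volume used by alloc-0 of the real job, and the image and command are only stand-ins for the actual maintenance work).

```hcl
# Hypothetical maintenance job: names, image, and command are placeholders.
job "volume-maintenance" {
  datacenters = ["dc1"]
  type        = "batch"

  group "maintenance" {
    volume "data" {
      type            = "csi"
      source          = "pgdata[0]" # the volume normally claimed by alloc-0
      access_mode     = "single-node-writer"
      attachment_mode = "file-system"
    }

    task "migrate" {
      driver = "docker"

      config {
        image   = "alpine:3"
        command = "sleep"
        args    = ["3600"] # stand-in for the real maintenance work
      }

      volume_mount {
        volume      = "data"
        destination = "/data"
      }
    }
  }
}
```

Because the volume is single-writer, this job stays pending while alloc-0 of the real job holds the claim, and it keeps losing the claim to the replacement alloc-0 as described above.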