Open
Description
#7348 fixed a deadlock where a task gets released by the client, then re-submitted by a client, but before the worker hears about the task being cancelled, it completes it and tells the scheduler.
This fix was specific to the queued
state, but I'd think that the problem (and possible deadlocks) are broader than that. There's more fundamentally a consistency issue whenever:
- client cancels a task
- scheduler forgets about the task immediately before confirming that workers have also forgotten about the task
- the same task key is submitted again
Some theoretical possibilities (I haven't come up with tests to create them yet, but I imagine they're slight variations of the test added in #7348):
- The task has resource restrictions, so it's in
no-worker
instead ofqueued
. Maybe this causes a deadlock? - The task is
processing
instead ofqueued
. When re-submitted, the scheduler picks a different worker for it. Then, the message from the original worker that the task is finished arrives. This would trigger aRuntimeError
on the scheduler from the task completing on an unexpected worker. - The task and its dependencies were cancelled, then resubmitted, so the task is now
waiting
. Then the task completed message arrives, triggering thewaiting->memory
transition @crusaderky just added in Edge and impossible transitions to memory #7205 (the docstring oftransition_waiting_memory
does not mention this case FWIW, so I don't think this is intended, though the behavior would be okay) - After being cancelled and resubmitted, the task ran on a new worker and erred. Then, the task finished message from the original worker arrives. This would trigger an impossible
erred->memory
transition.
A couple ways to address this:
- Just don't let the scheduler eagerly forget about tasks when a client asks it to; instead, wait for confirmation from workers.
client_releases_keys
would send afree-keys
message to workers, and only forget the task once the workers confirm they've forgotten it. (We can transition the task toreleased
immediately, just notforgotten
.) This ensures the scheduler maintains a consistent view of the workers' state. - On each TaskState, store the value of
transition_counter
when it was created, to use for disambiguating re-submitted tasks. Send this to workers as part of the task spec. Whenever workers send messages regarding a key, they also include thistransition_counter
value. If the value doesn't match what the scheduler has on record for that task (i.e. it's been forgotten and re-submitted in the interim), we know the message is stale, and we can ignore it and just tell the worker to release that key. (We'd have to be careful for consistency issues though—we don't want to accidentally tell the worker to release a newer version of the key, if it has that too.)