
Issues with tasks completing on workers after being released and re-submitted #7356


Description

@gjoseph92

#7348 fixed a deadlock where a task gets released by the client and then re-submitted, but the worker completes the task and reports it to the scheduler before it hears that the task was cancelled.

That fix was specific to the queued state, but I'd think the problem (and possible deadlocks) is broader than that. More fundamentally, there's a consistency issue whenever (see the sketch after this list):

  1. client cancels a task
  2. the scheduler forgets the task immediately, without waiting for confirmation that workers have also forgotten it
  3. the same task key is submitted again
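
For illustration, a minimal sketch of that sequence from the client's perspective (the functions and key here are made up; this shows the race pattern, not a guaranteed reproducer):

```python
from distributed import Client

client = Client()

# 1. Client submits "x"; a worker starts working on it
fut = client.submit(sum, [1, 2], key="x")

# 2. Client releases "x"; the scheduler forgets it right away, before the
#    worker has acknowledged the free-keys message
fut.release()

# 3. The same key is submitted again (possibly with a different function)
fut2 = client.submit(sum, [3, 4], key="x")

# Meanwhile, the original worker may finish the first "x" and report
# "task-finished" to the scheduler, where "x" now refers to a different task.
```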

Some theoretical possibilities (I haven't come up with tests to create them yet, but I imagine they're slight variations of the test added in #7348):

  1. The task has resource restrictions, so it's in no-worker instead of queued. Maybe this causes a deadlock?
  2. The task is processing instead of queued. When re-submitted, the scheduler picks a different worker for it. Then the message from the original worker that the task is finished arrives. This would trigger a RuntimeError on the scheduler from the task completing on an unexpected worker (a rough test sketch for this case follows the list).
  3. The task and its dependencies were cancelled, then resubmitted, so the task is now waiting. Then the task-finished message arrives, triggering the waiting->memory transition @crusaderky just added in #7205 (Edge and impossible transitions to memory). (The docstring of transition_waiting_memory does not mention this case FWIW, so I don't think this is intended, though the behavior would be okay.)
  4. After being cancelled and resubmitted, the task runs on a new worker and errs. Then the task-finished message from the original worker arrives. This would trigger an impossible erred->memory transition.
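
Possibility 2 is probably the easiest to turn into a test. A rough sketch of what such a test might look like (hypothetical; attribute access like a.state.tasks depends on the distributed version, and this isn't verified to actually trigger the error):

```python
import asyncio

from distributed import Event
from distributed.utils_test import gen_cluster, inc


@gen_cluster(client=True, nthreads=[("127.0.0.1", 1)] * 2)
async def test_resubmit_while_processing(c, s, a, b):
    ev = Event()

    def blocked(ev):
        ev.wait()
        return 1

    # 1. "x" starts processing on worker a and blocks there
    fut = c.submit(blocked, ev, key="x", workers=[a.address])
    while "x" not in a.state.tasks:
        await asyncio.sleep(0.01)

    # 2. Client releases "x"; the scheduler forgets it before worker a confirms
    fut.release()

    # 3. The same key is resubmitted and assigned to worker b
    fut2 = c.submit(inc, 1, key="x", workers=[b.address])

    # 4. Worker a finally finishes the old "x" and reports it to the scheduler,
    #    which now thinks "x" is processing on b -> unexpected-worker RuntimeError
    await ev.set()
    assert await fut2 == 2
```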

A couple of ways to address this:

  1. Just don't let the scheduler eagerly forget about tasks when a client asks it to; instead, wait for confirmation from workers. client_releases_keys would send a free-keys message to workers, and only forget the task once the workers confirm they've forgotten it. (We can transition the task to released immediately, just not forgotten.) This ensures the scheduler maintains a consistent view of the workers' state.
  2. On each TaskState, store the value of transition_counter when it was created, to use for disambiguating re-submitted tasks. Send this to workers as part of the task spec. Whenever workers send messages regarding a key, they also include this transition_counter value. If the value doesn't match what the scheduler has on record for that task (i.e. it's been forgotten and re-submitted in the interim), we know the message is stale, and we can ignore it and just tell the worker to release that key. (We'd have to be careful about consistency issues, though: we don't want to accidentally tell the worker to release a newer version of the key, if it has that too.) A toy sketch of this check follows the list.
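
For option 2, a toy model of the scheduler-side staleness check might look like the following (illustrative Python only, not distributed's actual Scheduler code; names like run_id and free_stale_key are made up):

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    key: str
    run_id: int             # transition_counter captured when this TaskState was created
    state: str = "released"


class MiniScheduler:
    """Toy model of the staleness check; not distributed's Scheduler."""

    def __init__(self):
        self.transition_counter = 0
        self.tasks: dict[str, TaskState] = {}

    def new_task(self, key: str) -> TaskState:
        self.transition_counter += 1
        ts = TaskState(key, run_id=self.transition_counter)
        self.tasks[key] = ts
        return ts

    def handle_task_finished(self, key: str, run_id: int, worker: str) -> None:
        ts = self.tasks.get(key)
        if ts is None or ts.run_id != run_id:
            # Stale message: the key was forgotten and re-submitted since the
            # worker started this run. Ignore the result and ask only that
            # worker to drop that specific run of the key.
            self.free_stale_key(worker, key, run_id)
            return
        ts.state = "memory"

    def free_stale_key(self, worker: str, key: str, run_id: int) -> None:
        ...  # would send {"op": "free-keys", "keys": [key], "run_id": run_id} to `worker`
```

Including the run_id in the free-keys message is what would keep the worker from accidentally dropping a newer incarnation of the same key, per the caveat above.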

cc @fjetter @crusaderky @hendrikmakait

Labels: bug, deadlock, scheduler
