P2P tasks log compute failures even if they are later restarted

# Problem

There is a race condition in P2P that causes tasks to log compute failures on the worker even though those tasks will get restarted later on and then succeed. This happens when:

1. A worker involved in the P2P operation is removed
2. We restart the P2P operation on the scheduler and schedule the messages to be sent to the workers
3. A task on worker A is not cancelled yet, but its RPC calls fail because the remote worker B has already closed the shuffle run, throwing a `P2PConsistencyError`
4. The task raises the `P2PConsistencyError` and fails while still seen as `executing` by worker A, which causes the error to get logged.

# Solution

Instead of failing directly on a `P2PConsistencyError`, the task could double-check with the scheduler whether its shuffle run is still supposed to be active. If not, it could instead silently succeed as the result will get rejected by the scheduler as outdated.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

P2P tasks log compute failures even if they are later restarted #8679

Problem

Solution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

P2P tasks log compute failures even if they are later restarted #8679

Description

Problem

Solution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions