
Workers do not acknowledge cancel request #465

@gxuu

Description


When a shutdown is requested through the signal handler, the worker sends a DisconnectRequest. It should then wait for the corresponding DisconnectResponse, but this was never implemented: the current version (main branch) exits immediately after sending the DisconnectRequest.

To show this, here is the scheduler-side handler, which always replies with a DisconnectResponse:

    async def on_disconnect(self, worker_id: WorkerID, request: DisconnectRequest):
        await self.__disconnect_worker(request.worker)
        await self._binder.send(worker_id, DisconnectResponse.new_msg(request.worker)) # <-- HERE

The worker, however, does not currently handle the DisconnectResponse message type. Because of a race condition, the worker almost always exits before it reads this last message from the scheduler, so no exception is raised and the missing handling goes unnoticed.
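A minimal sketch of the intended worker-side handshake, assuming the worker and scheduler can be modeled as two asyncio queues instead of the real binder/connector. `DisconnectRequest` and `DisconnectResponse` below are hypothetical dataclass stand-ins (only the `worker` field from the scheduler snippet is modeled), and `worker_shutdown`, `scheduler`, and the `timeout` parameter are illustrative names, not the project's API. The point is the ordering: send the request, then block until the matching response arrives (or a timeout fires) before exiting.

```python
import asyncio
from dataclasses import dataclass


# Hypothetical stand-ins for the real message types; only the `worker`
# field used by the scheduler snippet is modeled.
@dataclass
class DisconnectRequest:
    worker: bytes


@dataclass
class DisconnectResponse:
    worker: bytes


async def scheduler(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Mirror of on_disconnect: handle the request, then reply."""
    request = await inbox.get()
    assert isinstance(request, DisconnectRequest)
    await asyncio.sleep(0.01)  # stand-in for __disconnect_worker(...)
    await outbox.put(DisconnectResponse(request.worker))


async def worker_shutdown(worker_id: bytes, to_scheduler: asyncio.Queue,
                          from_scheduler: asyncio.Queue,
                          timeout: float = 1.0) -> DisconnectResponse:
    """Send DisconnectRequest, then wait for the matching DisconnectResponse."""
    await to_scheduler.put(DisconnectRequest(worker_id))
    # The behavior described in this issue returns at this point,
    # so the scheduler's reply is never read.
    message = await asyncio.wait_for(from_scheduler.get(), timeout)
    if not isinstance(message, DisconnectResponse):
        raise TypeError(f"unexpected message during shutdown: {message!r}")
    if message.worker != worker_id:
        raise ValueError("DisconnectResponse is for a different worker")
    return message


async def main() -> DisconnectResponse:
    to_scheduler: asyncio.Queue = asyncio.Queue()
    from_scheduler: asyncio.Queue = asyncio.Queue()
    scheduler_task = asyncio.create_task(scheduler(to_scheduler, from_scheduler))
    response = await worker_shutdown(b"worker-1", to_scheduler, from_scheduler)
    await scheduler_task
    return response


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the worker now actually reads the scheduler's last message, an unexpected message type surfaces as an exception instead of being silently dropped when the process exits, and the timeout keeps shutdown from hanging if the scheduler never answers.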

There is a more detailed explanation in #430, and that PR also fixes this issue. Please review it accordingly.

Metadata


Labels: python (Pull requests that update python code)

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
