BlockingMode._fd_reader_callback asyncio task not end #1072

Closed
@luweizheng

Hi there,

I am the maintainer of xoscar and xorbits. xoscar is a lightweight actor programming framework that enables inter-process and inter-node communication. We use ucx-py to accelerate communication. This worked without problems before, but recently ucx-py has been consistently reporting the following error.

It seems that some asyncio task has not finished?

Exception in callback <bound method BlockingMode._fd_reader_callback of <ucp.continuous_ucx_progress.BlockingMode object at 0x71df9c35c910>>
handle: <Handle BlockingMode._fd_reader_callback>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/home/xor/.conda/envs/xor/lib/python3.11/site-packages/ucp/continuous_ucx_progress.py", line 85, in _fd_reader_callback
    assert self.asyncio_task is None or self.asyncio_task.done()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
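
For context, the failing assertion states that the callback must not run while the previously scheduled task is still pending. A minimal sketch of that invariant in plain asyncio follows; `ReaderStandIn`, `on_readable`, and `_work` are hypothetical names used only for illustration and are not ucx-py's code.

```python
import asyncio
from typing import Optional


class ReaderStandIn:
    """Hypothetical stand-in; NOT ucx-py's BlockingMode."""

    def __init__(self) -> None:
        self.task: Optional[asyncio.Task] = None

    def on_readable(self) -> None:
        # Same shape as the assertion in the traceback: a new task may be
        # scheduled only if no earlier task is still pending.
        assert self.task is None or self.task.done()
        self.task = asyncio.ensure_future(self._work())

    async def _work(self) -> None:
        # Placeholder for work that takes longer than one loop iteration.
        await asyncio.sleep(0.01)


async def main() -> bool:
    reader = ReaderStandIn()
    reader.on_readable()  # first event: schedules the task
    try:
        reader.on_readable()  # second event arrives while the task is pending
    except AssertionError:
        tripped = True
    else:
        tripped = False
    await reader.task  # let the scheduled task finish before returning
    return tripped


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this sketch the second call trips the assertion because the first task has not completed yet, which matches the shape of the error above. It does not explain why that happens inside ucx-py.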

As this is only an assert statement, I commented the line out. With the assert commented out, the whole program runs, but it reports another error:

Task was destroyed but it is pending!
task: <Task pending name='Task-102' coro=<BlockingMode._arm_worker() running at /fs/fast/share/pingtai_cc/envs/cudf/lib/python3.11/site-packages/ucp/continuous_ucx_progress.py:110> wait_for=<_SyncSocketReaderFuture pending cb=[Task.task_wakeup()]>>
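
For context, plain asyncio emits this warning when a task that is still pending is garbage-collected, for example when the event loop is torn down while the task waits on a future. A minimal sketch of cancelling and awaiting such a task before teardown follows; `arm_worker_stand_in` and `cancel_and_wait` are hypothetical names, and the sketch makes no assumption about how ucx-py manages its own task.

```python
import asyncio
import contextlib


async def arm_worker_stand_in() -> None:
    # Hypothetical stand-in for a coroutine blocked on a future
    # that nothing ever resolves.
    await asyncio.get_running_loop().create_future()


async def cancel_and_wait(task: asyncio.Task) -> None:
    # Request cancellation, then await the task so it actually reaches
    # a finished state instead of being dropped while pending.
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task


async def main() -> bool:
    task = asyncio.create_task(arm_worker_stand_in())
    await asyncio.sleep(0)  # give the task a chance to start and block
    assert not task.done()
    await cancel_and_wait(task)
    return task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Note that `asyncio.run` already cancels leftover tasks on exit; the explicit cleanup matters mostly when the loop is created and closed by hand.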

In terms of communication and computation performance across compute nodes, ucx-py is now slightly slower than unixsocket. Previously, before this error appeared, ucx-py was faster than unixsocket.

This part feels difficult to debug. Are there any clues to help with debugging?
