Releasing GIL during CUDA stream synchronization

Dask tasks can run for a long time and may hold the Python GIL for their whole duration, which starves background Dask operations such as the worker heartbeat that reports liveness to the scheduler. Calling native code from a `nogil` block (for example in Cython) avoids holding the GIL, but many cuML estimators still require a CUDA stream synchronization after the main call to ensure that kernels and memory copies have completed. Currently that synchronization is performed with a `handle.sync()` Python call, which cannot be executed inside a `nogil` block. As a result, the synchronization can hold the GIL long enough to exceed Dask’s heartbeat timeout and make the worker appear unresponsive.
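The effect can be reproduced without CUDA or Dask. The following is a minimal sketch, assuming CPython with the GIL enabled on a POSIX system: libc's `usleep` stands in for the blocking stream synchronization, `ctypes.PyDLL` keeps the GIL held during the call, and `ctypes.CDLL` releases it, much like a `nogil` block would. The names `count_heartbeats`, `held_call`, and `released_call` are hypothetical and exist only for this illustration.

```python
import ctypes
import threading
import time

# Assumption: POSIX system where the running process exports libc symbols.
libc_gil_held = ctypes.PyDLL(None)      # keeps the GIL during foreign calls
libc_gil_released = ctypes.CDLL(None)   # releases the GIL during foreign calls


def held_call():
    # Stand-in for a blocking sync that runs while the GIL is held.
    libc_gil_held.usleep(500_000)


def released_call():
    # Stand-in for the same blocking sync with the GIL released.
    libc_gil_released.usleep(500_000)


def count_heartbeats(blocking_call, interval=0.01):
    """Return how many times a background thread ticks during blocking_call."""
    ticks = 0
    started = threading.Event()
    stop = threading.Event()

    def heartbeat():
        nonlocal ticks
        started.set()
        while not stop.is_set():
            ticks += 1            # needs the GIL, like any Python code
            time.sleep(interval)  # releases the GIL while sleeping

    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    started.wait()

    before = ticks
    blocking_call()
    after = ticks

    stop.set()
    thread.join()
    return after - before


if __name__ == "__main__":
    print("ticks while GIL is held:    ", count_heartbeats(held_call))
    print("ticks while GIL is released:", count_heartbeats(released_call))
```

With the GIL held, the background thread should make almost no progress during the half-second call, whereas with the GIL released it keeps ticking at roughly its normal rate. The exact counts depend on the machine and its load.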

Possible solutions:
- Update [the sync method](https://github.com/rapidsai/raft/blob/ed74377cec32dd302ce9bac7cf5418036c0c3c5a/python/pylibraft/pylibraft/common/handle.pyx#L120) so that it synchronizes inside a `nogil` block.
- Make CUDA stream synchronization accessible from Cython code, so that in cuML it can directly follow the native function call inside the same `nogil` block.
- Keep the discipline of always synchronizing the CUDA stream inside cuML native functions before they return.


Releasing GIL during CUDA stream synchronization #2841
