Releasing GIL during CUDA stream synchronization #2841

@viclafargue

Dask tasks can run for a long time, and a task that holds the Python GIL blocks background Dask operations such as the scheduler heartbeat that reports worker liveness. Calling native code from a nogil block (for example in Cython) avoids holding the GIL, but many cuML estimators still need to synchronize the CUDA stream after the main call to ensure that kernels and memory copies have completed. That synchronization is currently done through the Python-level handle.sync() call, which cannot be executed inside a nogil block. As a result, the synchronization may hold the GIL long enough to exceed Dask’s heartbeat timeout and make the worker appear unresponsive.
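The effect can be reproduced without a GPU. The sketch below is only an analogy and is not cuML or Dask code: libc's usleep stands in for a blocking stream synchronization, and a background thread stands in for the heartbeat. It relies on documented ctypes behavior, where functions loaded through ctypes.CDLL release the GIL for the duration of the call and functions loaded through ctypes.PyDLL keep it. The helper count_heartbeats and the constant STALL_US are hypothetical names, and the code assumes a POSIX system with a standard (GIL-enabled) CPython build.

```python
import ctypes
import threading
import time

# Assumption: POSIX, where passing None loads the symbols of the running
# process, including libc's usleep.
libc_releases_gil = ctypes.CDLL(None)   # ctypes drops the GIL around each call
libc_holds_gil = ctypes.PyDLL(None)     # ctypes keeps the GIL during each call

STALL_US = 300_000  # stand-in for a long stream synchronization (0.3 s)


def count_heartbeats(blocking_call, interval=0.01):
    """Run blocking_call() in the calling thread and return how many
    heartbeats a background thread emitted in the meantime."""
    beats = 0
    started = threading.Event()
    stop = threading.Event()

    def heartbeat():
        nonlocal beats
        started.set()
        while not stop.is_set():
            time.sleep(interval)  # releases the GIL while sleeping
            beats += 1            # needs the GIL again to make progress

    worker = threading.Thread(target=heartbeat)
    worker.start()
    started.wait()
    blocking_call()
    stop.set()
    worker.join()
    return beats


# Same native call, loaded two ways: only the GIL handling differs.
held = count_heartbeats(lambda: libc_holds_gil.usleep(STALL_US))
released = count_heartbeats(lambda: libc_releases_gil.usleep(STALL_US))
print(f"heartbeats while GIL held: {held}, while GIL released: {released}")
```

When the GIL is held, the heartbeat thread stalls for almost the whole call. When it is released, the heartbeat keeps ticking at roughly its normal rate. The same reasoning applies to a stream synchronization that runs outside a nogil block.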

Possible solutions:

  • Update the sync method so that it synchronizes inside a nogil code block
  • Make CUDA stream synchronization accessible from Cython code, so that in cuML it can directly follow the native function call inside the nogil code block
  • Keep the discipline of always synchronizing the CUDA stream inside cuML native functions before they return
