Open
Labels
feature request
Description
Dask tasks can run for a long time and may hold the Python GIL, which interferes with background Dask operations such as the heartbeat that reports worker liveness to the scheduler. Calling native code from a nogil block (for example in Cython) avoids holding the GIL, but many cuML estimators still require a CUDA stream synchronization after the main call to ensure that kernels and memory copies have completed. Currently that synchronization is performed with a handle.sync() Python call, which cannot be executed inside a nogil block. As a result, the calling thread holds the GIL while it waits for the stream, and that wait can last long enough to exceed Dask’s heartbeat timeout and make the worker appear unresponsive.
Possible solutions:
- Update the sync method so that it synchronizes the stream inside a nogil block.
- Expose CUDA stream synchronization to Cython code, so that in cuML it can directly follow the native function call inside the same nogil block.
- Maintain the discipline of always synchronizing the CUDA stream inside cuML native functions before they return.
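The effect described above can be reproduced without a GPU. The following is a minimal sketch, not cuML or Dask code. It assumes a POSIX system and a standard GIL-enabled CPython, and uses libc's usleep as a stand-in for a blocking CUDA stream synchronization. The helper count_heartbeats is hypothetical and only mimics a background heartbeat thread. ctypes.PyDLL keeps the GIL held during the foreign call, much like a blocking Python-level sync would, while ctypes.CDLL releases it, much like a call placed inside a nogil block.

```python
import ctypes
import threading
import time

# Assumption: POSIX system where dlopen(NULL) exposes libc's usleep.
# CDLL releases the GIL around each foreign call; PyDLL keeps it held.
libc_nogil = ctypes.CDLL(None)
libc_gil = ctypes.PyDLL(None)


def count_heartbeats(blocking_call, interval=0.01):
    """Run blocking_call() in this thread and return how many times a
    background "heartbeat" thread managed to tick while it was running."""
    beats = 0
    stop = threading.Event()

    def heartbeat():
        nonlocal beats
        while not stop.is_set():
            beats += 1            # needs the GIL to run
            time.sleep(interval)  # releases the GIL while sleeping

    worker = threading.Thread(target=heartbeat)
    worker.start()
    time.sleep(0.05)              # give the heartbeat thread time to start
    before = beats
    blocking_call()               # stand-in for the blocking stream sync
    after = beats
    stop.set()
    worker.join()
    return after - before


if __name__ == "__main__":
    # Blocking for 0.3 s with the GIL held starves the heartbeat thread.
    held = count_heartbeats(lambda: libc_gil.usleep(300_000))
    # The same wait with the GIL released lets the heartbeat keep ticking.
    released = count_heartbeats(lambda: libc_nogil.usleep(300_000))
    print(f"heartbeats with GIL held: {held}, with GIL released: {released}")
```

In this sketch the heartbeat count stays near zero while the GIL is held and grows with the length of the wait when it is released. The first two proposals above aim for the second behavior for the stream synchronization; the third avoids the Python-level sync by completing the synchronization inside the native call.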
Projects
Status
Todo