Open
Labels
enhancement: New feature or request
Description
The current grCUDA prototype adds a synchronization barrier (cudaDeviceSynchronize()) after every kernel execution. In CUDA, kernels execute asynchronously with respect to host code and to kernels or memory operations in other streams, so these barriers serialize work that could otherwise overlap.
- Implement asynchronous but non-deferred execution.
- Track read and write dependencies in DeviceArray and automatically insert synchronization points.
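
To make the dependency-tracking idea concrete, here is a minimal sketch in Python of the bookkeeping it implies. It is not grCUDA code: `DependencyTracker`, `launch`, and `host_access` are hypothetical names, arrays are identified by plain strings, and kernels by integer ids. The sketch only computes which earlier kernels a new kernel or a host access would have to wait for; it does not call CUDA.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass
class ArrayState:
    """Per-array bookkeeping (hypothetical, for illustration only)."""
    last_writer: Optional[int] = None               # id of the last kernel that wrote the array
    readers: Set[int] = field(default_factory=set)  # kernels that read it since that write


class DependencyTracker:
    """Tracks read/write dependencies between kernel launches on named arrays."""

    def __init__(self) -> None:
        self._arrays: Dict[str, ArrayState] = {}
        self._next_id = 0

    def launch(self, reads: Iterable[str] = (), writes: Iterable[str] = ()) -> Tuple[int, Set[int]]:
        """Register a kernel launch; return (kernel id, ids of kernels it must wait for)."""
        reads, writes = list(reads), list(writes)
        kernel_id = self._next_id
        self._next_id += 1
        deps: Set[int] = set()
        for name in reads:
            state = self._arrays.setdefault(name, ArrayState())
            if state.last_writer is not None:
                deps.add(state.last_writer)      # read-after-write
        for name in writes:
            state = self._arrays.setdefault(name, ArrayState())
            if state.last_writer is not None:
                deps.add(state.last_writer)      # write-after-write
            deps |= state.readers                # write-after-read
        # Record this kernel's own accesses only after its dependencies are known.
        for name in reads:
            self._arrays[name].readers.add(kernel_id)
        for name in writes:
            state = self._arrays[name]
            state.last_writer = kernel_id
            state.readers = set()
        return kernel_id, deps

    def host_access(self, name: str, write: bool = False) -> Set[int]:
        """Return the kernels the host must synchronize with before touching the array."""
        state = self._arrays.get(name, ArrayState())
        deps = set() if state.last_writer is None else {state.last_writer}
        if write:
            deps |= state.readers                # host write must also wait for pending readers
        return deps


tracker = DependencyTracker()
k0, d0 = tracker.launch(writes=["x"])                    # no dependencies
k1, d1 = tracker.launch(writes=["y"])                    # independent of k0, may overlap
k2, d2 = tracker.launch(reads=["x", "y"], writes=["z"])  # must wait for k0 and k1
k3, d3 = tracker.launch(writes=["x"])                    # must wait for k0 and for reader k2
host_deps = tracker.host_access("z")                     # host read of z waits only for k2
```

In a real implementation, each returned dependency would presumably become a targeted wait, for example an event recorded after the producing kernel and consumed with cudaStreamWaitEvent or cudaEventSynchronize, in place of a device-wide cudaDeviceSynchronize(). Kernels with an empty dependency set, such as k0 and k1 above, could then run concurrently in separate streams.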