@Baxi-codes

As Himanshu pointed out:

There seems to be a problem with variant 4. The main file performs the memory allocation and copying, and only the device pointers are passed to the kernel launchers where we have to put our code. Because the data is copied synchronously and the launcher has no control over how it is copied, it is impossible to use streams to overlap copying with computation.

To make it possible to overlap copying and computation with streams, variant 4 should pass the host pointers to the kernel launcher instead of the device pointers.
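
For illustration, here is a rough sketch of what a launcher could do if it received host pointers. The names `launch_variant4` and `scale_kernel`, the stream count, and the block size are all placeholders of mine, not taken from the assignment code. It also assumes the host buffers are pinned (for example allocated with `cudaMallocHost`), since `cudaMemcpyAsync` from pageable memory generally does not overlap with kernel execution.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>

// Placeholder kernel (hypothetical): doubles each element of its chunk.
__global__ void scale_kernel(const float* in, float* out, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// Hypothetical launcher that takes HOST pointers, so it controls the copies.
// Assumption: h_in and h_out are pinned host buffers of n floats each.
cudaError_t launch_variant4(const float* h_in, float* h_out, size_t n) {
    constexpr int kStreams = 4;          // arbitrary choice for the sketch
    constexpr unsigned kThreads = 256;   // arbitrary block size
    if (n == 0) return cudaSuccess;

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_in, n * sizeof(float));
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&d_out, n * sizeof(float));
    if (err != cudaSuccess) { cudaFree(d_in); return err; }

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Split the input into one chunk per stream. Within a stream the copy-in,
    // kernel, and copy-out run in order; different streams may overlap.
    const size_t chunk = (n + kStreams - 1) / kStreams;
    for (int s = 0; s < kStreams; ++s) {
        const size_t offset = static_cast<size_t>(s) * chunk;
        if (offset >= n) break;
        const size_t count = std::min(chunk, n - offset);
        const size_t bytes = count * sizeof(float);

        cudaMemcpyAsync(d_in + offset, h_in + offset, bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        const unsigned blocks =
            static_cast<unsigned>((count + kThreads - 1) / kThreads);
        scale_kernel<<<blocks, kThreads, 0, streams[s]>>>(
            d_in + offset, d_out + offset, count);
        cudaMemcpyAsync(h_out + offset, d_out + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for all streams, then release resources.
    err = cudaDeviceSynchronize();
    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_in);
    cudaFree(d_out);
    return err;
}
```

With only device pointers, the copy in the first `cudaMemcpyAsync` call has already happened by the time the launcher runs, so none of this chunking is possible from inside the launcher.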
