-
Notifications
You must be signed in to change notification settings - Fork 108
start merging early thread exit handling #1584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
@maddyscientist As for simplifying the calls to |
|
@jcosborn for those QUDA developers not on the portability calls (who are reviewing this PR) can you describe why these changes are needed? |
|
The early thread exit handling changes are to support programming models (like SYCL) which only support block collectives when all threads in a block are active (non-exited). The changes allow targets to have all threads enter a kernel functor to participate in the block collectives when block collectives are used in a kernel. For these kernels, instead of exiting when a thread is determined to be out-of-bounds for the kernel, all threads can enter the kernel, and the out-of-bounds ones will be marked inactive with an extra argument. Only kernels that need this handling need any changes. The kernel functors that need this handling have an extra template parameter |
include/dslash_helper.cuh
Outdated
| dslash.template operator()<kernel_type>(x_cb, s, parity); | ||
| } | ||
|
|
||
| template <KernelType kernel_type, class D> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need both versions of apply_dslash here? Couldn't we just template the all thread parameter and have a single function definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can push the allthreads check into this function which would simplify the call. I'll do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maddyscientist
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving this PR. I've left a few remarks, ideally it would be nice to minimize code changes, but these are nice to have.
Great work @jcosborn!
This covers all files except the dslashes. Those will be merged later to keep the complexity of this PR down.