Implement sync_threads using an unaligned barrier. #798

Open: wants to merge 1 commit into base: master
9 changes: 8 additions & 1 deletion src/device/intrinsics/synchronization.jl
@@ -10,8 +10,15 @@ export threadfence, threadfence_block, threadfence_system
Waits until all threads in the thread block have reached this point and all global and
shared memory accesses made by these threads prior to `sync_threads()` are visible to all
threads in the block.

+!!! note
+
+    CUDA.jl's behavior slightly differs from CUDA C here: This barrier is allowed to be
+    resolved from divergent branches, i.e., the barrier is not aligned and does not require
+    all threads in the block to execute the same barrier instruction. This is necessary
+    because the Julia compiler can introduce divergent branches, e.g., when union-splitting.
"""
-@inline sync_threads() = ccall("llvm.nvvm.barrier0", llvmcall, Cvoid, ())
+@inline sync_threads(id::Int=0) = ccall("llvm.nvvm.barrier.sync", llvmcall, Cvoid, (Int32,), id)
Contributor

Maybe you can just add `sync_threads(id::Int)` and keep the old `sync_threads()`? People can then start using `sync_threads(0)` to see if it works well in practice.

Also, I don't mind just vendoring this definition and experimenting with it. (If I'm the only one complaining about this.)

Member Author

You can also do an aligned barrier with a name. So it might be confusing that the named one is unaligned, while the unnamed one is still aligned.

Member Author

@tkf Did you end up vendoring this definition then?

Contributor

This hasn't come up as a problem, and I haven't had a chance to play with it.


"""
sync_threads_count(predicate::Int32)