Synchronization in union-split code paths #797

Open
@maleadt

@tkf recently ran into a deadlock caused by synchronizing from branches that ended up divergent because the compiler performed union splitting. To remedy this, we could use barrier.sync instead of barrier.sync.aligned (a.k.a. bar.sync, a.k.a. syncthreads). From the PTX documentation:

"Instruction barrier has optional .aligned modifier. When specified, it indicates that all threads in CTA will execute the same barrier instruction. In conditionally executed code, an aligned barrier instruction should only be used if it is known that all threads in CTA evaluate the condition identically, otherwise behavior is undefined."

I'm not sure it's safe to just have syncthreads use an unaligned barrier, because we might then be inadvertently matching unrelated barriers:

if foo
    sync_threads()  # thread 1 waits here
end
sync_threads()      # thread 2 waits here

Maybe that only matters for already broken code though, since thread 1 would here just get stuck at the second barrier?
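To make the mismatch concrete, here is a minimal sketch in plain Python (a toy model, not GPU code) of a single unaligned barrier shared by every call site. The function `simulate` and the site labels `"inner"` and `"outer"` are hypothetical names introduced for illustration. The model assumes the barrier completes only when every thread in the block is waiting on it, and that threads which have already finished never count as arrivals; real hardware may behave differently.

```python
def simulate(programs):
    """Toy model: one barrier resource shared by all call sites.

    programs: per-thread list of barrier call-site labels, in program order.
    Returns (matches, stuck): the call sites paired up at each completed
    barrier, and the site at which each blocked thread is left waiting.
    """
    expected = len(programs)      # assumption: the barrier expects every thread
    pcs = [0] * len(programs)     # index of the next barrier for each thread
    matches = []
    while True:
        waiting = [t for t, pc in enumerate(pcs) if pc < len(programs[t])]
        if len(waiting) < expected:
            # Not enough arrivals: whoever is still waiting is stuck for good.
            stuck = {t: programs[t][pcs[t]] for t in waiting}
            return matches, stuck
        # Every thread has arrived at *some* call site: the barrier completes,
        # regardless of whether those call sites are the same one.
        matches.append([programs[t][pcs[t]] for t in waiting])
        for t in waiting:
            pcs[t] += 1


# Thread 0 takes the branch (foo is true), thread 1 does not.
matches, stuck = simulate([["inner", "outer"], ["outer"]])
print(matches)
print(stuck)
```

In this model the first barrier completes by pairing the "inner" site of one thread with the "outer" site of the other, and the thread that took the branch is then left waiting alone at the second barrier, which is the scenario described above.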

Also interesting: at the PTX level, barriers are identified by a 'name' (an integer in 0:15) and take an optional thread count, so we could expose a more fine-grained sync for use in divergent branches. Maybe we could use this to differentiate barriers, but then we'd need some sort of lexical information to assign a barrier name, and I'd rather not move to a @sync_threads macro.
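As a rough illustration of what differentiated barriers would buy, the following plain-Python toy model (again not GPU code) gives each call site its own barrier name and expected arrival count. The function `simulate_named` and the hand-assigned names are hypothetical; assigning them by hand is exactly the lexical information the paragraph above says would be needed. The model ignores hardware constraints such as the limited number of barrier names and any restrictions on the thread count.

```python
from collections import defaultdict


def simulate_named(programs):
    """Toy model of named barriers.

    programs: per-thread list of (name, count) barrier operations, in order.
    A named barrier completes when exactly `count` threads wait on that name.
    Returns (completed, stuck).
    """
    pcs = [0] * len(programs)
    completed = []
    progress = True
    while progress:
        progress = False
        # Group the currently blocked threads by the name they wait on.
        arrivals = defaultdict(list)
        for t, pc in enumerate(pcs):
            if pc < len(programs[t]):
                name, count = programs[t][pc]
                arrivals[name].append((t, count))
        for name, waiters in sorted(arrivals.items()):
            count = waiters[0][1]  # assumption: all waiters agree on the count
            if len(waiters) == count:
                completed.append((name, sorted(t for t, _ in waiters)))
                for t, _ in waiters:
                    pcs[t] += 1
                progress = True
    stuck = {t: programs[t][pcs[t]][0]
             for t in range(len(programs)) if pcs[t] < len(programs[t])}
    return completed, stuck


branch = (1, 2)  # name 1, expected by the two threads that take the branch
full = (0, 4)    # name 0, expected by all four threads
completed, stuck = simulate_named(
    [[branch, full], [branch, full], [full], [full]]
)
print(completed)
print(stuck)
```

Under these assumptions the two threads in the branch synchronize with each other on name 1 without consuming arrivals from the block-wide barrier on name 0, so no thread is left stuck.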

cc @vchuravy

Labels: cuda kernels (Stuff about writing CUDA kernels.); speculative (Not sure about this one yet.)
