Synchronization in union-split code paths

@tkf recently ran into a deadlock caused by synchronizing from branches that became divergent after the compiler performed union splitting. To remedy this, we could use `barrier.sync` instead of `barrier.sync.aligned` (a.k.a. `bar.sync`, a.k.a. `syncthreads`):

> Instruction barrier has optional .aligned modifier. When specified, it indicates that all threads in CTA will execute the same barrier instruction. In conditionally executed code, an aligned barrier instruction should only be used if it is known that all threads in CTA evaluate the condition identically, otherwise behavior is undefined.
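
To make the failure mode concrete, here is a minimal sketch of how union splitting could end up duplicating a barrier across divergent branches. The `process` helper and the kernel are hypothetical, and whether the compiler actually splits this particular call depends on inference and optimization, so treat it as an illustration rather than a guaranteed reproducer:

```julia
using CUDA

# Hypothetical helper: does some work, then synchronizes the block.
@inline function process(x)
    y = x + one(x)
    sync_threads()
    return y
end

function kernel(out)
    i = threadIdx().x
    # The type of `x` depends on the thread index, so it is inferred
    # as Union{Int32, Float32}.
    x = isodd(i) ? Int32(i) : Float32(i)
    # Union splitting may compile this call roughly as
    #   x isa Int32 ? process(x::Int32) : process(x::Float32)
    # so odd and even threads can reach *different* copies of the barrier.
    out[i] = Float32(process(x))
    return
end

# Launch left commented out: with an aligned barrier the behavior here is
# undefined and the kernel may hang.
# @cuda threads=32 kernel(CUDA.zeros(Float32, 32))
```

In the source there is only a single `sync_threads()` call, which is what makes this hard to spot: the divergence is introduced by the compiler, not by the user.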

I'm not sure it's safe to just have `syncthreads` use an unaligned barrier, because we might then be inadvertently matching unrelated barriers:

```julia
if foo
    sync_threads()  # thread 1 (foo is true) waits here
end
sync_threads()      # thread 2 (foo is false) waits here
```

Maybe that only matters for code that is already broken, though, since thread 1 would then just get stuck at the second barrier?

Also interesting: at the PTX level, barriers are identified by a 'name' (an integer in 0:15) and an optional thread count, so we could expose a more fine-grained `sync` for use in divergent branches. Maybe we could use this to differentiate barriers, but then we'd need some sort of lexical information to assign a barrier name, and I'd rather not move to a `@sync_threads` macro.
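
A rough sketch of what such a finer-grained primitive might look like. The name `named_barrier` is hypothetical, and this assumes the LLVM NVPTX back-end in use provides the `llvm.nvvm.barrier.sync` intrinsic (which should lower to an unaligned `barrier.sync` on the given barrier); I have not verified this across toolchain versions:

```julia
using CUDA

# Hypothetical wrapper: unaligned barrier on the barrier resource `id`.
# Assumes the `llvm.nvvm.barrier.sync` intrinsic is available.
@inline named_barrier(id::Int32) =
    ccall("llvm.nvvm.barrier.sync", llvmcall, Cvoid, (Int32,), id)

function kernel(flags)
    i = threadIdx().x
    if flags[i]
        named_barrier(Int32(1))  # only matches other arrivals at barrier 1
    end
    named_barrier(Int32(2))      # distinct name, so it cannot pair with the one above
    return
end
```

Without a thread count, each named barrier still waits for every thread in the block, so divergent code like this would deadlock instead of silently matching unrelated barriers. The ids are also assigned by hand here, which is exactly the lexical information we'd otherwise have to generate.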

cc @vchuravy 
