Commit b943a10

[0035] Clarify uniformity and synchronization
This adds some clarifications about uniformity and implicit synchronization of execution. It also explicitly states that _explicit_ synchronization may be required around `Load` and `Store` operations.
1 parent fb1769e commit b943a10
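
The point about explicit synchronization around `Load` and `Store` can be illustrated with a CPU-side analogy. The sketch below is standard C++, not HLSL, and it does not use the proposal's API: `StoreThenLoad` is a hypothetical helper, the `shared` vector stands in for a `groupshared` array, and joining the writer threads stands in for an explicit barrier such as `GroupMemoryBarrierWithGroupSync` in a shader.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// CPU analogy only (hypothetical helper, not the proposal's API).
// `shared` plays the role of a groupshared array with plain, non-atomic slots.
std::vector<int> StoreThenLoad(std::size_t threadCount) {
    std::vector<int> shared(threadCount, 0);
    std::vector<int> loaded(threadCount, 0);

    // Phase 1: each thread stores to its own slot, so writes never overlap.
    std::vector<std::thread> writers;
    for (std::size_t i = 0; i < threadCount; ++i)
        writers.emplace_back([&shared, i] { shared[i] = static_cast<int>(i) * 10; });

    // Explicit synchronization: every store completes before any load begins.
    // Without this step, the cross-thread reads below would be a data race.
    for (std::thread &t : writers) t.join();

    // Phase 2: each thread loads the slot written by its neighbor.
    std::vector<std::thread> readers;
    for (std::size_t i = 0; i < threadCount; ++i)
        readers.emplace_back([&shared, &loaded, i, threadCount] {
            loaded[i] = shared[(i + 1) % threadCount];
        });
    for (std::thread &t : readers) t.join();

    return loaded;
}
```

The design choice mirrors the wording of the change: synchronization is needed only because phase 2 reads data that a different thread wrote in phase 1. If each thread read back only its own slot, there would be no cross-thread dependency to order.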

1 file changed, 14 additions and 3 deletions

proposals/0035-linalg-matrix.md

Lines changed: 14 additions & 3 deletions
@@ -393,9 +393,14 @@ There are three supported matrix scopes: `Thread`, `Wave`, and `ThreadGroup`.

 Operations are categorized by their scope requirements. Some operations require
 uniform scope matrices (`Wave` or`ThreadGroup`), while others can operate on
-non-uniform (`Thread`) scope matrices. Operations that support non-uniform
-scope also support uniform scopes. There may be significant performance
-benefits when using uniform scope matrices.
+non-uniform (`Thread`) scope matrices. Operations must be called from HLSL under
+control flow that is _at least_ as uniform as the matrix scope. `Thread`-scope
+may be called in non-uniform control flow, `Wave`-scope operations must be
+called in `Wave`-uniform control flow, and `ThreadGroup`-scope operations must
+be called in `ThreadGroup`-uniform control flow. Operations implicitly
+synchronize execution across all threads in the matrix's scope. Calling an
+operation from control flow that is not uniform across all participating threads
+is undefined behavior.

 When using `ThreadGroup` scope matrices, explicit barriers are required only when
 there are actual cross-thread dependencies, such as when multiple threads
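
The uniformity rule added in this hunk amounts to an ordering check, which can be modeled in a few lines. This is a sketch in standard C++ under stated assumptions, not part of the proposal: the `MatrixScope` and `Uniformity` enums and the `IsCallWellDefined` function are hypothetical names chosen for illustration.

```cpp
// Hypothetical model of the rule; none of these names come from the proposal.
// Both enums are ordered from least uniform to most uniform.
enum class MatrixScope { Thread = 0, Wave = 1, ThreadGroup = 2 };
enum class Uniformity { NonUniform = 0, WaveUniform = 1, ThreadGroupUniform = 2 };

// A call is well defined when the calling control flow is at least as
// uniform as the matrix scope; otherwise the text says behavior is undefined.
constexpr bool IsCallWellDefined(MatrixScope scope, Uniformity controlFlow) {
    return static_cast<int>(controlFlow) >= static_cast<int>(scope);
}

// Thread-scope operations tolerate non-uniform control flow.
static_assert(IsCallWellDefined(MatrixScope::Thread, Uniformity::NonUniform), "");
// Wave-scope operations need at least Wave-uniform control flow.
static_assert(!IsCallWellDefined(MatrixScope::Wave, Uniformity::NonUniform), "");
// ThreadGroup-uniform control flow satisfies every scope.
static_assert(IsCallWellDefined(MatrixScope::Wave, Uniformity::ThreadGroupUniform), "");
```

The model treats `ThreadGroup`-uniform control flow as also `Wave`-uniform, which is the reading implied by "at least as uniform".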
@@ -703,6 +708,9 @@ represents the row or column stride in bytes. For the `Load` operations on
 `groupshared` arrays, the `Stride` argument is the count of elements in the
 `groupshared` array.

+Reads from memory through `Load` functions are not atomic and may require
+explicit synchronization.
+
 #### Matrix::Length

 ```c++
@@ -801,6 +809,9 @@ represents the row or column stride in bytes. For the `Store` operations on
 `groupshared` arrays, the `Stride` argument is the count of elements in the
 `groupshared` array.

+Writes to memory through `Store` functions are not atomic and may require
+explicit synchronization.
+
 #### Matrix::InterlockedAccumulate

 ```c++
