Commit 53b183f

[0029][0031]Add Matrix Layout constraint to OuterProductOptimal for OuterProductAccumulate function (#494)
* Add Matrix Layout constraint to OuterProductOptimal
1 parent e083dc3 commit 53b183f

2 files changed

Lines changed: 23 additions & 11 deletions

proposals/0029-cooperative-vector.md

Lines changed: 13 additions & 9 deletions
@@ -222,12 +222,12 @@ The matrix is loaded from a raw-buffer, **matrix resource**, starting at
 **matrix offset**. The **matrix interpretation** argument specifies the element
 type of the matrix (see [Type Interpretations]), no conversion is performed.
 The **matrix M dimension** and **matrix K dimension** arguments specify the
-dimensions of the matrix. The**matrix layout** argument specifies the layout
+dimensions of the matrix. The **matrix layout** argument specifies the layout
 of the matrix (see [Matrix Layouts]). If the **matrix transpose** is non-zero
 then the matrix is transposed before performing the multiply (see
 [Matrix Transpose]). For row-major and column-major layouts, **matrix
 stride** specifies the number of bytes to go from one row/column to the next.
-For optimal layouts, **matrix stride** is ignored.
+For optimal layouts, **matrix stride** must be zero.
 
 Only non-packed interpretations are valid for matrices.
 
@@ -279,7 +279,7 @@ that the input vector is an unsigned integer.
 following `ComponentType`s: `I16`, `U16`, `I32`, `U32`, `F16`, `F32`, `U8`,
 `I8`, `F8_E4M3`, `F8_E5M2`,
 
-### Vector Outer Product
+### Vector-Vector Outer Product and Accumulate
 
 #### Syntax
 
@@ -312,8 +312,11 @@ The two input vectors are specified via **input vector 1** and **input vector
 
 The matrix is accumulated to the writeable raw-buffer specified by **matrix
 resource**, with **matrix offset**, **matrix interpretation**, **matrix
-layout**, and **matrix stride** behaving as described
-[above](#matrix-vector-multiply-and-multiply-add-operations).
+layout**, and **matrix stride** behaving as described
+[above](#matrix-vector-multiply-and-multiply-add-operations).
+
+Note that **matrix layout** must be `DXILMatrixLayout::OuterProductOptimal` for
+this operation. **matrix stride** must be 0 (for optimal layouts).
 
 The base address of **matrix resource** and **matrix offset** must be 128-byte
 aligned. Also note that the size of the underlying allocation is guaranteed to
@@ -322,8 +325,6 @@ row/column of the matrix is valid memory. Implementations may write to the
 contents of the padding between the end of the matrix and the 16-byte boundary,
 so developers should not use this padding space for anything else.
 
-The **matrix stride** is 16-byte aligned.
-
 Not all combinations of vector element type and matrix interpretations are
 supported by all implementations. [CheckFeatureSupport] can be used to
 determine which combinations are supported. A list of combinations that are
@@ -336,6 +337,8 @@ guaranteed to be supported on all implementations can be found in
 following `ComponentType`s: `I16`, `U16`, `I32`, `U32`, `F16`, `F32`, `U8`,
 `I8`, `F8_E4M3`, `F8_E5M2`,
 
+* **matrix layout** must be `DXILMatrixLayout::OuterProductOptimal`
+
 
 ### Vector Accumulate
 
@@ -572,8 +575,9 @@ enum class DXILMatrixLayout : uint {
572575
```
 
 Optimal layouts are opaque implementation specific layouts, the D3D call
-`ConvertLinearAlgebraMatrix` can be used to convert the *Matrix* to an
-optimal layout. Row-Major and Column-Major layouts are also supported.
+`ConvertLinearAlgebraMatrix` can be used to convert the *Matrix* to an optimal
+layout. Row-Major and Column-Major layouts are also supported. **matrix
+stride** must be zero for optimal layouts.
 
 
 ### Matrix Transpose
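
The stride and layout rules that this change adds to proposal 0029 could be checked on the host side along the lines of the following minimal C++ sketch. The `MatrixLayout` enum, its enumerator values, and the helpers `IsOptimalLayout`, `ValidateStride`, and `ValidateOuterProductAccumulate` are hypothetical names chosen for illustration; they are not part of DXIL or any D3D API, and the sketch ignores the other constraints (alignment, interpretation support).

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for DXILMatrixLayout; enumerator values are
// illustrative and do not claim to match the real encoding.
enum class MatrixLayout : std::uint32_t {
  RowMajor,
  ColumnMajor,
  MulOptimal,
  OuterProductOptimal,
};

// True for the opaque, implementation-specific layouts.
constexpr bool IsOptimalLayout(MatrixLayout layout) {
  return layout == MatrixLayout::MulOptimal ||
         layout == MatrixLayout::OuterProductOptimal;
}

// Returns an empty string if the stride is acceptable for the layout,
// otherwise a short description of the violated rule.
inline std::string ValidateStride(MatrixLayout layout,
                                  std::uint32_t strideInBytes) {
  if (IsOptimalLayout(layout) && strideInBytes != 0)
    return "matrix stride must be zero for optimal layouts";
  return {};
}

// Outer product accumulate additionally requires the OuterProductOptimal
// layout, so the layout is checked before the stride.
inline std::string ValidateOuterProductAccumulate(MatrixLayout layout,
                                                  std::uint32_t strideInBytes) {
  if (layout != MatrixLayout::OuterProductOptimal)
    return "matrix layout must be OuterProductOptimal";
  return ValidateStride(layout, strideInBytes);
}
```

Under these assumptions, `ValidateOuterProductAccumulate(MatrixLayout::OuterProductOptimal, 0)` returns an empty string, while a row-major layout or a non-zero stride yields a message naming the violated rule.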

proposals/0031-hlsl-vector-matrix-operations.md

Lines changed: 10 additions & 2 deletions
@@ -319,7 +319,7 @@ Members:
 - `StartOffset` - the offset, in bytes, from the beginning of the buffer where
 the matrix is located.
 - `Stride` - the stride, in bytes, between rows or columns of the matrix. This
-value is ignored if the matrix layout is `MATRIX_LAYOUT_MUL_OPTIMAL` or
+value must be zero if the matrix layout is `MATRIX_LAYOUT_MUL_OPTIMAL` or
 `MATRIX_LAYOUT_OUTER_PRODUCT_OPTIMAL`.
 
 Implementation:
@@ -643,7 +643,10 @@ Parameters:
 - `InputVector2` - the second vector, containing N elements. Element type must
 be the same as InputVector1's.
 - `Matrix` - the destination matrix. The matrix dimensions must be MxN. The
-`Transpose` parameter for the matrix must be `false`.
+`Transpose` parameter for the matrix must be `false`. The `ML` parameter
+(matrix layout) for the matrix must be
+`dx::linalg::MatrixLayout::MATRIX_LAYOUT_OUTER_PRODUCT_OPTIMAL`. The `stride`
+parameter must be zero (for optimal layouts).
 
 Implementation:
 
@@ -665,6 +668,11 @@ void OuterProductAccumulate(
665668
} // namespace dx
666669
```
 
+Diagnostics:
+
+- Emit Diagnostic if MatrixLayout is not
+`dx::linalg::MatrixLayout::MATRIX_LAYOUT_OUTER_PRODUCT_OPTIMAL`.
+
 ## Function: VectorAccumulate
 
 `dx::linalg::VectorAccumulate` accumulates the components of a vector
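
The diagnostic added for `OuterProductAccumulate` in proposal 0031 can be approximated in plain C++ with `static_assert`, since the layout and transpose settings are template parameters. This is only a sketch under stated assumptions: the `sketch` namespace, the simplified `MatrixRef` struct, and the `IsValidOuterProductTarget` and `CheckOuterProductTarget` helpers are hypothetical, the enum is an illustrative mirror of `dx::linalg::MatrixLayout` rather than its real definition, and an actual HLSL compiler would emit its own diagnostic instead.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical mirror of dx::linalg::MatrixLayout; values are illustrative.
enum class MatrixLayout : std::uint32_t {
  MATRIX_LAYOUT_ROW_MAJOR,
  MATRIX_LAYOUT_COLUMN_MAJOR,
  MATRIX_LAYOUT_MUL_OPTIMAL,
  MATRIX_LAYOUT_OUTER_PRODUCT_OPTIMAL,
};

// Simplified matrix reference: layout and transpose are compile-time
// parameters, offset and stride are runtime values.
template <MatrixLayout ML, bool Transpose>
struct MatrixRef {
  std::uint32_t StartOffset;
  std::uint32_t Stride;
};

// Compile-time predicate for the layout and transpose constraints.
template <MatrixLayout ML, bool Transpose>
constexpr bool IsValidOuterProductTarget() {
  return ML == MatrixLayout::MATRIX_LAYOUT_OUTER_PRODUCT_OPTIMAL && !Transpose;
}

// Rejects an invalid layout or transpose at compile time. The stride is a
// runtime value in this sketch, so it is reported through the return value.
template <MatrixLayout ML, bool Transpose>
bool CheckOuterProductTarget(const MatrixRef<ML, Transpose>& matrix) {
  static_assert(ML == MatrixLayout::MATRIX_LAYOUT_OUTER_PRODUCT_OPTIMAL,
                "matrix layout must be MATRIX_LAYOUT_OUTER_PRODUCT_OPTIMAL");
  static_assert(!Transpose, "Transpose must be false");
  return matrix.Stride == 0;
}

} // namespace sketch
```

Instantiating `CheckOuterProductTarget` with any other layout, or with `Transpose` set to `true`, fails to compile, which is the closest plain-C++ analogue to the diagnostic described above.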
