Commit 7b2b62d

committed
Merge branch 'main' of https://github.com/microsoft/hlsl-specs into cbuffer
2 parents 6f68ba4 + 767fa7f commit 7b2b62d

2 files changed

Lines changed: 84 additions & 7 deletions

File tree

proposals/0029-cooperative-vector.md

Lines changed: 18 additions & 7 deletions
@@ -77,9 +77,11 @@ void ps_main(args) // args: texture, normal, position
7777
7878
**Neural Network based shader**
7979
80-
Below shader is in HLSL-like psuedocode, to highlight the idea of what replacing physical computations with a neural network based evaluation looks like. The exact syntax for the new intrinsics is intentionally skipped to keep it simple, later sections contain examples with the correct syntax and sample descriptors.
80+
The shader below shows the idea of what replacing physical computations with a
81+
neural network based evaluation looks like. Some details have been omitted, but
82+
this should give a sense of how these new operations can be used.
8183
82-
> NOTE: see proposal [0031] for the latest on the HLSL API.
84+
> NOTE: see proposal [0031] for full details on the HLSL API.
8385
8486
```c++
8587
ByteAddressBuffer inputMatrix0;
@@ -89,29 +91,38 @@ ByteAddressBuffer biasVector1;
8991
9092
void ps_main(args) // args: texture, normal, position
9193
{
94+
using namespace dx::linalg;
95+
9296
PreProcessing(args);
9397
// Neural Network computes the output vector
9498
// using the same input args and trained data
9599
// in the form of matrices and bias vectors.
96100
97101
// The input vector is computed from the shader input
98-
vector<uint32_t, M> inputVector = SomeFunction(args);
102+
vector<uint32_t, INPUT_SIZE> inputVector = SomeFunction(args);
99103
100104
// Below the physical calculations are replaced by NN evaluation
101105
// the Matrix and Bias are trained offline and loaded to memory
102106
103107
// layer0 = inputVector*inputMatrix0 + biasVector0
104108
// The matrix and bias are loaded from memory at offsets : moffset0 and boffset0
105-
vector<uint32_t, K> layer0 = MatrixVectorMulAdd(inputVector, inputMatrix0, moffset0, biasVector0, boffset0);
109+
MatrixRef<DATA_TYPE_UINT32, N, INPUT_SIZE, MATRIX_LAYOUT_MUL_OPTIMAL> M0 = { inputMatrix0, moffset0, 0 };
110+
VectorRef<DATA_TYPE_UINT32> B0 = { biasVector0, boffset0 };
111+
112+
vector<uint32_t, N> layer0 = MulAdd<uint32_t>(M0, MakeInterpretedVector<DATA_TYPE_UINT32>(inputVector), B0);
106113
layer0 = max(layer0,0); // Apply activation function
107114
108-
// layer0 = inputVector*inputMatrix0 + biasVector0
115+
// layer1 = layer0*inputMatrix0 + biasVector0
109116
// The matrix and bias are loaded from memory at offsets : moffset1 and boffset1
110-
vector<uint32_t, K> layer1 = MatrixVectorMulAdd(layer0, inputMatrix0, moffset1, biasVector0, boffset1);
117+
MatrixRef<DATA_TYPE_UINT32, N, N, MATRIX_LAYOUT_MUL_OPTIMAL> M1 = { inputMatrix0, moffset1, 0 };
118+
VectorRef<DATA_TYPE_UINT32> B1 = { biasVector0, boffset1 };
119+
vector<uint32_t, N> layer1 = MulAdd<uint32_t>(M1, MakeInterpretedVector<DATA_TYPE_UINT32>(layer0), B1);
111120
layer1 = max(layer1,0); // Apply activation function
112121
113122
// output = layer1*inputMatrix1 + biasVector1
114-
vector<uint32_t, N> output = MatrixVectorMulAdd(layer1, inputMatrix1, biasVector1);
123+
MatrixRef<DATA_TYPE_UINT32, OUTPUT_SIZE, N, MATRIX_LAYOUT_MUL_OPTIMAL> M2 = { inputMatrix1, 0, 0 };
124+
VectorRef<DATA_TYPE_UINT32> B2 = { biasVector1, 0 };
125+
vector<uint32_t, OUTPUT_SIZE> output = MulAdd<uint32_t>(M2, MakeInterpretedVector<DATA_TYPE_UINT32>(layer1), B2);
115126
116127
output = exp(output);
117128
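As a rough CPU-side illustration of what each `MulAdd` call followed by `max(layer, 0)` computes in the shader above, the sketch below evaluates one layer as `max(M * x + b, 0)`. It is not the `dx::linalg` API: the function name `MulAddRelu`, the use of `std::vector`, and the row-major matrix layout are assumptions made for illustration only, and the real `MATRIX_LAYOUT_MUL_OPTIMAL` layout is implementation-defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference for one network layer: out = max(M * x + b, 0).
// `matrix` holds rows * cols elements in row-major order (an assumption;
// this is not MATRIX_LAYOUT_MUL_OPTIMAL).
std::vector<std::uint32_t> MulAddRelu(const std::vector<std::uint32_t>& matrix,
                                      std::size_t rows, std::size_t cols,
                                      const std::vector<std::uint32_t>& input,
                                      const std::vector<std::uint32_t>& bias)
{
    assert(matrix.size() == rows * cols);
    assert(input.size() == cols);
    assert(bias.size() == rows);

    std::vector<std::uint32_t> out(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        std::uint32_t acc = bias[r];  // start from the bias term
        for (std::size_t c = 0; c < cols; ++c)
            acc += matrix[r * cols + c] * input[c];  // unsigned arithmetic wraps on overflow
        // Activation, mirroring max(layer, 0) in the shader. For unsigned
        // values this leaves acc unchanged; it is kept to show the step.
        out[r] = std::max<std::uint32_t>(acc, 0u);
    }
    return out;
}
```

With a 2x2 matrix `{1, 2, 3, 4}`, input `{5, 6}` and bias `{7, 8}`, this yields `{24, 47}`. Chaining three such calls with the output of one as the input of the next mirrors the `layer0`, `layer1`, `output` sequence in the shader.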

proposals/0030-dxil-vectors.md

Lines changed: 66 additions & 0 deletions
@@ -100,6 +100,7 @@ Previously usage of `extractelement` and `insertelement` in DXIL didn't allow dy
100100
#### Elementwise intrinsics
101101

102102
A selection of elementwise intrinsics are given additional native vector forms.
103+
The full list of intrinsics with elementwise overloads is listed in [Appendix 1](#appendix-1-new-elementwise-overloads).
103104
Elementwise intrinsics are those that perform their calculations irrespective of the location of the element
104105
in the vector or matrix arguments except insofar as that position corresponds to those of the other elements
105106
that might be used in the individual element calculations.
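The elementwise property described above can be sketched on the CPU as follows: each output element depends only on the input elements at the same index. The example uses a multiply-add in the spirit of the `FMad` opcode; the function name `FMadElementwise` and the `std::vector<float>` representation are hypothetical and are not part of DXIL or HLSL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference for a vector form of a multiply-add:
// result[i] = a[i] * b[i] + c[i], with no dependence on any other index.
std::vector<float> FMadElementwise(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   const std::vector<float>& c)
{
    assert(a.size() == b.size() && b.size() == c.size());

    std::vector<float> result(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        result[i] = a[i] * b[i] + c[i];  // same scalar operation at every position
    return result;
}
```

A native vector overload is expected to produce the same values as applying the scalar operation to each element in turn, which is the behavior the loop above spells out.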
@@ -183,6 +184,71 @@ Calculations should produce the correct results in all cases for a range of vect
183184
In practice, this testing will largely represent verifying correct intrinsic output
184185
with the new shader model.
185186

187+
## Appendix 1: New Elementwise Overloads
188+
189+
| Opcode | Name | Class |
190+
| ------ | -------------- | -------- |
191+
| 6 | FAbs | Unary |
192+
| 7 | Saturate | Unary |
193+
| 8 | IsNaN | IsSpecialFloat |
194+
| 9 | IsInf | IsSpecialFloat |
195+
| 10 | IsFinite | IsSpecialFloat |
196+
| 11 | IsNormal | IsSpecialFloat |
197+
| 12 | Cos | Unary |
198+
| 13 | Sin | Unary |
199+
| 14 | Tan | Unary |
200+
| 15 | Acos | Unary |
201+
| 16 | Asin | Unary |
202+
| 17 | Atan | Unary |
203+
| 18 | Hcos | Unary |
204+
| 19 | Hsin | Unary |
205+
| 20 | Htan | Unary |
206+
| 21 | Exp | Unary |
207+
| 22 | Frc | Unary |
208+
| 23 | Log | Unary |
209+
| 24 | Sqrt | Unary |
210+
| 25 | Rsqrt | Unary |
211+
| 26 | Round_ne | Unary |
212+
| 27 | Round_ni | Unary |
213+
| 28 | Round_pi | Unary |
214+
| 29 | Round_z | Unary |
215+
| 30 | Bfrev | Unary |
216+
| 31 | Countbits | UnaryBits |
217+
| 32 | FirstBitLo | UnaryBits |
218+
| 33 | FirstBitHi | UnaryBits |
219+
| 34 | FirstBitSHi | UnaryBits |
220+
| 35 | FMax | Binary |
221+
| 36 | FMin | Binary |
222+
| 37 | IMax | Binary |
223+
| 38 | IMin | Binary |
224+
| 39 | UMax | Binary |
225+
| 40 | UMin | Binary |
226+
| 46 | FMad | Tertiary |
227+
| 47 | Fma | Tertiary |
228+
| 48 | IMad | Tertiary |
229+
| 49 | UMad | Tertiary |
230+
| 83 | DerivCoarseX | Unary |
231+
| 84 | DerivCoarseY | Unary |
232+
| 85 | DerivFineX | Unary |
233+
| 86 | DerivFineY | Unary |
234+
| 115 | WaveActiveAllEqual | WaveActiveAllEqual |
235+
| 117 | WaveReadLaneAt | WaveReadLaneAt |
236+
| 118 | WaveReadLaneFirst | WaveReadLaneFirst |
237+
| 119 | WaveActiveOp | WaveActiveOp |
238+
| 120 | WaveActiveBit | WaveActiveBit |
239+
| 121 | WavePrefixOp | WavePrefixOp |
240+
| 122 | QuadReadLaneAt | QuadReadLaneAt |
241+
| 123 | QuadOp | QuadOp |
242+
| 124 | BitcastI16toF16 | BitcastI16toF16 |
243+
| 125 | BitcastF16toI16 | BitcastF16toI16 |
244+
| 126 | BitcastI32toF32 | BitcastI32toF32 |
245+
| 127 | BitcastF32toI32 | BitcastF32toI32 |
246+
| 128 | BitcastI64toF64 | BitcastI64toF64 |
247+
| 129 | BitcastF64toI64 | BitcastF64toI64 |
248+
| 165 | WaveMatch | WaveMatch |
249+
250+
251+
186252
## Acknowledgments
187253

188254
* [Anupama Chandrasekhar](https://github.com/anupamachandra) and [Tex Riddell](https://github.com/tex3d) for foundational contributions to the design.
