[0035] LinAlg: Add runtime metadata details: SVI0, PSV0, and RDAT#832
[0035] LinAlg: Add runtime metadata details: SVI0, PSV0, and RDAT#832tex3d wants to merge 26 commits into
Conversation
| * If operation is outside minimum requirements, gather and merge detailed usage | ||
| information. |
There was a problem hiding this comment.
Wondering if we should just always gather the detailed usage, and then only the runtime will need to be taught what the minimum requirements are?
There was a problem hiding this comment.
We could do that. It would end up with extra data that doesn't need to be checked, but we could consider this an optimization to reduce container size and perhaps a bit of extra runtime validation cost.
There was a problem hiding this comment.
One thing these two optimizations help with is canonicalization of runtime data, which is important for validation. Serializing unneeded data is an opportunity for more variation in runtime data parts that wouldn't make a difference at runtime.
Currently, the validator does not independently check each element of the runtime data against the module to see whether it's valid. Instead, it expects the runtime data to exactly match what's generated by the validator (with some ordering independence). If we serialize unnecessary data, we have to either keep serializing that unnecessary data in the future or change the way we validate the runtime data to be much more sophisticated for both PSV0 and RDAT.
| struct PSVLinAlgMatrixShapeArrayReference { | ||
| uint32_t ShapesIndex; // Index into SemanticIndexTable where array of indexes | ||
| // into LinAlgMatrixOperationShape table is located | ||
| uint32_t Count; |
There was a problem hiding this comment.
What does this Count signify?
There was a problem hiding this comment.
The size of the array. For each component type combination, this lists the set of unique dimensions used. The ShapesIndex is an index into the SemanticIndex buffer, which contains indices into the shared Shapes table, and Count is the number of shapes to index.
Perhaps the confusion is the name ShapesIndex, since this is an index into an index array buffer, not directly into the shape records.
| uint32_t WaveMatrixMultiplyCount; | ||
| uint32_t ThreadGroupMatrixMultiplyCount; | ||
| uint32_t OuterProductCount; | ||
| uint32_t AccumulateStoreCount; |
There was a problem hiding this comment.
I wonder if we're better off encoding the use information as typed unions rather than a bunch of counts followed by separate structure definitions. I know that a typed union could be more data, but it is also trivially easy to generate and parse.
| The `OperandType` and `ResultType` fields will encode one of the values defined | ||
| in the [`DXIL::ComponentType` enumeration](#dxil-enumerations). | ||
| ```cpp | ||
| RDAT_ENUM_START(LinAlgThreadVectorMatrixMultiplyFlag, uint8_t) |
There was a problem hiding this comment.
I'm not really sure that a bunch of code using DXC's RDAT macros is something we should rely on for specification. This isn't meaningful outside the DXC codebase and really lacks a lot of context.
I'd prefer if we defined the new structures that we're encoding either C-structs (similar to how we've done with PSV0), or as a markdown table something more like:
LinAlgMatrixOperationShape: Size 12 bytes
| Offset | Type | Contents |
|---|---|---|
| 0 | 32-bit unsigned integer | M dimension (rows in matrix A) |
| 4 | 32-bit unsigned integer | N dimension (columns in matrix B) |
| 8 | 32-bit unsigned integer | K dimension (columns in matrix A / rows in matrix B) |
If instead you prefer to define these as C structs, we also shouldn't just have a blob of C struct definitions, we should include explanations of the structure members and what each structure means.
There was a problem hiding this comment.
As an intermediate step, I've provided macro definitions and comments to clarify the record definitions for now. One reason I don't yet want to switch to a completely different definition structure is that it will be much harder to keep it in sync with the real definitions being used to iterate and verify correctness.
| 1. Do we need to gather more usage information for operations other than the | ||
| ones listed above when the matrix isn't captured by one of these operations? | ||
| 2. Do we need to capture component conversions with CopyConvertMatrix? |
There was a problem hiding this comment.
Do we want to require all drivers to be able to load and store all possible SM6.10 component types? I'm waffling on whether that seems trivial or onerous.
pow2clk
left a comment
There was a problem hiding this comment.
A few potential clarifications and consistency requests
| * `ThreadVectorMatrixMultiply` | ||
| * Iterate `LinAlgMatVecMul`/`LinAlgMatVecMulAdd` calls and gather shapes and | ||
| types. | ||
| * Matrix operand must be `MatrixLoadFromDescriptor` for transposed flag. |
There was a problem hiding this comment.
I don't understand this. It sounds like the transposed flag takes a matrix operand and it has to be MatrixLoadFromDescriptor?
There was a problem hiding this comment.
I had misunderstood this. The Matrix operand for ThreadScope is always MatrixLoadFromDescriptor or Phi right?
There was a problem hiding this comment.
We need to trace the matrix operand back to a MatrixLoadFromDescriptor, otherwise we can't determine the matrix layout, which indicates whether this matrix is considered loaded as transposed for this operation (via MulOptimalTranspose layout). We might need to disallow any complex paths obfuscating this def-use chain, such as phi, at least for now.
There was a problem hiding this comment.
I've updated the wording. Let me know if it's still confusing.
| Open questions: | ||
|
|
||
| 1. Do we need to gather more usage information for operations other than the | ||
| ones listed above when the matrix isn't captured by one of these operations? |
There was a problem hiding this comment.
With Conversion of InputVectors separated out from MatVec, we'll probably have a runtime API for supported conversions and can capture the parameters of that operation here?
There was a problem hiding this comment.
As this API is still TBD, I'll update this as necessary to supply that information. For now, there's an open question:
Do we need to capture component conversions with CopyConvertMatrix?
Which I think should cover the need to sort this out for vector conversions as well. At least that's the intent.
caf21ca to
9a66beb
Compare
Also adds MatrixConstruction gathering, but not data structures yet.
Added macro definitions and comments to explain the exact meaning of RDAT record elements.
9a66beb to
2b0b5d6
Compare
- remove transpose flag - add LinAlgMatrixAccumulateToMemory
This reverts commit 10c87ef.
… API" After clarification, the runtime API Min M/N/K values operate like the values for mul dimensions. Supported dimensions must be even multiples of each min dimension size, and multiple size combinations will be returned in the API. This means we must collect all size combinations after all. This reverts commit 7d81a23.
inbelic
left a comment
There was a problem hiding this comment.
Just some nits from a first pass through
| > 2) Do we need both operand types, or should we expect the operands to be the | ||
| > same type? | ||
| > 3) What flags do we need? | ||
| * Another approach coulld be to follow the RDAT pattern where we have a list of |
There was a problem hiding this comment.
nit:
| * Another approach coulld be to follow the RDAT pattern where we have a list of | |
| * Another approach could be to follow the RDAT pattern where we have a list of |
|
|
||
| #define RDAT_ENUM_START(eTy, sTy) enum class eTy : sTy { | ||
| #define RDAT_ENUM_VALUE(name, value) name = value, | ||
| #define RDAT_ENUM_VALUE_ALIAS(name, value) name = value, |
There was a problem hiding this comment.
| #define RDAT_ENUM_VALUE_ALIAS(name, value) name = value, |
seems unused
This change adds: