[0035] LinAlg: Add runtime metadata details: SVI0, PSV0, and RDAT by tex3d · Pull Request #832 · microsoft/hlsl-specs

tex3d · 2026-03-27T03:09:13Z

This change adds:

* LinearAlgebra feature use flag to the FeatureInfo DXIL container part (SFI0)
* extended feature use gathering process/rules and details for PSV0 and RDAT DXIL container parts for feature use beyond LinAlg Tier 1 minimum requirements.

damyanp · 2026-03-27T03:30:58Z

+* If operation is outside minimum requirements, gather and merge detailed usage
+  information.


Wondering if we should just always gather the detailed usage, and then only the runtime will need to be taught what the minimum requirements are?

We could do that. It would end up with extra data that doesn't need to be checked, but we could consider this an optimization to reduce container size and perhaps a bit of extra runtime validation cost.

One thing these two optimizations help with is canonicalization of runtime data, which is important for validation. Serializing unneeded data is an opportunity for more variation in runtime data parts that wouldn't make a difference at runtime.

Currently, the validator does not independently check each element of the runtime data against the module to see whether it's valid. Instead, it expects the runtime data to exactly match what's generated by the validator (with some ordering independence). If we serialize unnecessary data, we have to either keep serializing that unnecessary data in the future or change the way we validate the runtime data to be much more sophisticated for both PSV0 and RDAT.
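
To make the exact-match point concrete, here is a minimal sketch of that style of check. `UsageRecord` and `MatchesRegenerated` are hypothetical names invented for illustration; this is not the DXC validator code or the real PSV0/RDAT layout. It only shows why any extra serialized record causes a mismatch even when it would not matter at runtime.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical usage record, for illustration only (not the PSV0/RDAT layout).
struct UsageRecord {
  uint32_t OperandType;
  uint32_t ResultType;
  uint32_t ShapeIndex;
};

inline bool operator<(const UsageRecord &A, const UsageRecord &B) {
  return std::tie(A.OperandType, A.ResultType, A.ShapeIndex) <
         std::tie(B.OperandType, B.ResultType, B.ShapeIndex);
}

inline bool operator==(const UsageRecord &A, const UsageRecord &B) {
  return std::tie(A.OperandType, A.ResultType, A.ShapeIndex) ==
         std::tie(B.OperandType, B.ResultType, B.ShapeIndex);
}

// True only when the serialized records are exactly the records regenerated
// from the module, compared without regard to order.
bool MatchesRegenerated(std::vector<UsageRecord> Serialized,
                        std::vector<UsageRecord> Regenerated) {
  std::sort(Serialized.begin(), Serialized.end());
  std::sort(Regenerated.begin(), Regenerated.end());
  return Serialized == Regenerated;
}
```

Under this scheme, a container that carries one unnecessary record fails the comparison, which is why gathering only what is needed keeps the runtime data canonical.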

llvm-beanz · 2026-03-27T19:24:59Z

+struct PSVLinAlgMatrixShapeArrayReference {
+  uint32_t ShapesIndex; // Index into SemanticIndexTable where array of indexes
+                        // into LinAlgMatrixOperationShape table is located
+  uint32_t Count;


What does this Count signify?

The size of the array. For each component type combination, this lists the set of unique dimensions used. The ShapesIndex is an index into the SemanticIndex buffer, which contains indices into the shared Shapes table, and Count is the number of shapes to index.

Perhaps the confusion is the name ShapesIndex, since this is an index into an index array buffer, not directly into the shape records.
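
A small sketch of the two-level lookup described above may help. The `PSVLinAlgMatrixShapeArrayReference` fields come from the diff excerpt; the M/N/K fields of `LinAlgMatrixOperationShape`, the `ResolveShapes` helper, and the use of `std::vector` for the tables are assumptions made only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Field names follow the diff excerpt quoted above.
struct PSVLinAlgMatrixShapeArrayReference {
  uint32_t ShapesIndex; // start of a run of indices in the SemanticIndex buffer
  uint32_t Count;       // number of indices (and therefore shapes) in that run
};

// Assumed layout for illustration: three 32-bit dimensions.
struct LinAlgMatrixOperationShape {
  uint32_t M, N, K;
};

// Hypothetical helper: follows the index-of-indices indirection and returns
// the referenced shapes. std::vector::at throws std::out_of_range if the
// reference is malformed.
std::vector<LinAlgMatrixOperationShape>
ResolveShapes(const PSVLinAlgMatrixShapeArrayReference &Ref,
              const std::vector<uint32_t> &SemanticIndexTable,
              const std::vector<LinAlgMatrixOperationShape> &ShapeTable) {
  std::vector<LinAlgMatrixOperationShape> Result;
  for (uint32_t I = 0; I < Ref.Count; ++I) {
    std::size_t Slot = static_cast<std::size_t>(Ref.ShapesIndex) + I;
    uint32_t ShapeIdx = SemanticIndexTable.at(Slot);
    Result.push_back(ShapeTable.at(ShapeIdx));
  }
  return Result;
}
```

With an index buffer of `{7, 2, 0, 1}` and a reference of `{ShapesIndex = 1, Count = 2}`, the helper reads indices 2 and 0 and returns those two entries of the shared shape table.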

llvm-beanz · 2026-03-27T19:26:50Z

+  uint32_t WaveMatrixMultiplyCount;
+  uint32_t ThreadGroupMatrixMultiplyCount;
+  uint32_t OuterProductCount;
+  uint32_t AccumulateStoreCount;


I wonder if we're better off encoding the use information as typed unions rather than a bunch of counts followed by separate structure definitions. I know that a typed union could be more data, but it is also trivially easy to generate and parse.
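
For discussion, a rough sketch of what a typed-union encoding might look like. Everything here is hypothetical: the `LinAlgUseKind` values simply mirror the four counts in the diff, and the payload structs are invented placeholders, not proposed record layouts.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// One tag per count field in the diff above.
enum class LinAlgUseKind : uint32_t {
  WaveMatrixMultiply = 0,
  ThreadGroupMatrixMultiply = 1,
  OuterProduct = 2,
  AccumulateStore = 3,
};

// Hypothetical payloads, for illustration only.
struct MultiplyUse {
  uint32_t OperandType;
  uint32_t ResultType;
  uint32_t ShapeIndex;
};
struct AccumulateStoreUse {
  uint32_t ComponentType;
};

// A single fixed-size record: the tag says which union member is active.
struct LinAlgUse {
  LinAlgUseKind Kind;
  union {
    MultiplyUse Multiply;     // active for the multiply and outer product kinds
    AccumulateStoreUse Store; // active for AccumulateStore
  };
};

LinAlgUse MakeMultiply(LinAlgUseKind Kind, MultiplyUse Use) {
  LinAlgUse U{};
  U.Kind = Kind;
  U.Multiply = Use;
  return U;
}

LinAlgUse MakeStore(AccumulateStoreUse Use) {
  LinAlgUse U{};
  U.Kind = LinAlgUseKind::AccumulateStore;
  U.Store = Use;
  return U;
}

// The per-kind counts are recoverable from one pass over the typed stream.
std::array<uint32_t, 4> CountByKind(const std::vector<LinAlgUse> &Uses) {
  std::array<uint32_t, 4> Counts{};
  for (const LinAlgUse &U : Uses)
    ++Counts[static_cast<uint32_t>(U.Kind)];
  return Counts;
}
```

The trade-off is the one noted above: every record is as large as the largest payload, but a single loop can generate or parse the whole list.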

llvm-beanz · 2026-03-27T19:47:30Z

-The `OperandType` and `ResultType` fields will encode one of the values defined
-in the [`DXIL::ComponentType` enumeration](#dxil-enumerations).
+```cpp
+RDAT_ENUM_START(LinAlgThreadVectorMatrixMultiplyFlag, uint8_t)


I'm not really sure that a bunch of code using DXC's RDAT macros is something we should rely on for specification. This isn't meaningful outside the DXC codebase and really lacks a lot of context.

I'd prefer that we define the new structures we're encoding either as C structs (similar to how we've done with PSV0) or as a markdown table, something more like:

LinAlgMatrixOperationShape: Size 12 bytes

| Offset | Type | Contents |
| ------ | ---- | -------- |
| 0 | 32-bit unsigned integer | M dimension (rows in matrix A) |
| 4 | 32-bit unsigned integer | N dimension (columns in matrix B) |
| 8 | 32-bit unsigned integer | K dimension (columns in matrix A / rows in matrix B) |

If instead you prefer to define these as C structs, we shouldn't just have a blob of C struct definitions; we should also include explanations of the structure members and what each structure means.

As an intermediate step, I've provided macro definitions and comments to clarify the record definitions for now. One reason I don't yet want to switch to a completely different definition structure is that it will be much harder to keep it in sync with the real definitions being used to iterate and verify correctness.
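
For comparison, the 12-byte layout from the table suggested in this thread maps directly onto a commented C-style struct. This is a sketch under assumptions: the `ReadShape` helper is hypothetical, and it assumes the bytes are already in host byte order on a typical ABI where three `uint32_t` members need no padding. The `static_assert`s check that the struct agrees with the offsets in the table.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// LinAlgMatrixOperationShape: 12 bytes, per the table suggested above.
struct LinAlgMatrixOperationShape {
  uint32_t M; // offset 0: rows in matrix A
  uint32_t N; // offset 4: columns in matrix B
  uint32_t K; // offset 8: columns in matrix A / rows in matrix B
};

static_assert(sizeof(LinAlgMatrixOperationShape) == 12, "size must be 12");
static_assert(offsetof(LinAlgMatrixOperationShape, M) == 0, "M at offset 0");
static_assert(offsetof(LinAlgMatrixOperationShape, N) == 4, "N at offset 4");
static_assert(offsetof(LinAlgMatrixOperationShape, K) == 8, "K at offset 8");

// Hypothetical reader: copies one record out of a raw byte buffer.
// memcpy avoids alignment and aliasing problems with the source pointer.
LinAlgMatrixOperationShape ReadShape(const unsigned char *Bytes) {
  LinAlgMatrixOperationShape Shape;
  std::memcpy(&Shape, Bytes, sizeof(Shape));
  return Shape;
}
```

Compile-time checks like these are one way to keep a spec-side definition from drifting away from the real definitions without sharing the RDAT macros.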

jenatali · 2026-03-31T21:06:19Z

+1. Do we need to gather more usage information for operations other than the
+   ones listed above when the matrix isn't captured by one of these operations?
+2. Do we need to capture component conversions with CopyConvertMatrix?


Do we want to require all drivers to be able to load and store all possible SM6.10 component types? I'm waffling on whether that seems trivial or onerous.

pow2clk

A few potential clarifications and consistency requests

pow2clk · 2026-04-01T14:28:27Z

+* `ThreadVectorMatrixMultiply`
+  * Iterate `LinAlgMatVecMul`/`LinAlgMatVecMulAdd` calls and gather shapes and
+    types.
+  * Matrix operand must be `MatrixLoadFromDescriptor` for transposed flag.


I don't understand this. It sounds like the transposed flag takes a matrix operand and it has to be MatrixLoadFromDescriptor?

I had misunderstood this. The matrix operand for thread scope is always MatrixLoadFromDescriptor or Phi, right?

We need to trace the matrix operand back to a MatrixLoadFromDescriptor, otherwise we can't determine the matrix layout, which indicates whether this matrix is considered loaded as transposed for this operation (via MulOptimalTranspose layout). We might need to disallow any complex paths obfuscating this def-use chain, such as phi, at least for now.

I've updated the wording. Let me know if it's still confusing.
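
As a rough illustration of the rule being discussed, here is a sketch over a made-up value model. `OpKind`, `MatrixLayout`, `MatrixValue`, and `QueryTransposed` are hypothetical names, not DXIL or DXC types. It assumes the strict reading above: the flag is determined only when the operand is directly a `MatrixLoadFromDescriptor`, and anything else, including phi, is reported as undetermined.

```cpp
// Hypothetical, simplified model of a matrix operand's defining operation.
enum class OpKind { MatrixLoadFromDescriptor, Phi, Other };

// Only MulOptimalTranspose matters for this sketch; other layouts are lumped
// together.
enum class MatrixLayout { MulOptimalTranspose, Other };

struct MatrixValue {
  OpKind Kind;
  MatrixLayout Layout; // meaningful only for MatrixLoadFromDescriptor
};

enum class TransposeQuery { NotTransposed, Transposed, Undetermined };

// Reports whether the matrix counts as loaded transposed for the multiply.
// If the operand is not directly a load, the layout cannot be determined.
TransposeQuery QueryTransposed(const MatrixValue &Operand) {
  if (Operand.Kind != OpKind::MatrixLoadFromDescriptor)
    return TransposeQuery::Undetermined;
  return Operand.Layout == MatrixLayout::MulOptimalTranspose
             ? TransposeQuery::Transposed
             : TransposeQuery::NotTransposed;
}
```

A gathering pass built this way would need a rule for the undetermined case, for example treating it as a validation error while complex def-use chains are disallowed.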

anupamachandra · 2026-04-01T20:41:25Z

+Open questions:
+
+1. Do we need to gather more usage information for operations other than the
+   ones listed above when the matrix isn't captured by one of these operations?


With Conversion of InputVectors separated out from MatVec, we'll probably have a runtime API for supported conversions and can capture the parameters of that operation here?

As this API is still TBD, I'll update this as necessary to supply that information. For now, there's an open question:

> Do we need to capture component conversions with CopyConvertMatrix?

I think that should also cover the need to sort this out for vector conversions. At least that's the intent.

… API" After clarification, the runtime API Min M/N/K values operate like the values for mul dimensions. Supported dimensions must be even multiples of each min dimension size, and multiple size combinations will be returned in the API. This means we must collect all size combinations after all. This reverts commit 7d81a23.
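
A minimal sketch of the rule in that commit message, under a stated assumption: "even multiples" is read here as whole-number multiples of each minimum dimension, which the message does not spell out. `Shape`, `FitsMinimum`, and `RecordShape` are hypothetical names; the set stands in for collecting every distinct size combination.

```cpp
#include <cstdint>
#include <set>
#include <tuple>

// Hypothetical shape triple used only for this sketch.
struct Shape {
  uint32_t M, N, K;
};

// Assumption: a dimension is acceptable when it is a nonzero whole-number
// multiple of the corresponding minimum dimension.
bool IsWholeMultiple(uint32_t Dim, uint32_t MinDim) {
  return MinDim != 0 && Dim != 0 && Dim % MinDim == 0;
}

bool FitsMinimum(const Shape &Used, const Shape &Min) {
  return IsWholeMultiple(Used.M, Min.M) && IsWholeMultiple(Used.N, Min.N) &&
         IsWholeMultiple(Used.K, Min.K);
}

using ShapeKey = std::tuple<uint32_t, uint32_t, uint32_t>;

// Collects each distinct M/N/K combination once; duplicates are ignored.
void RecordShape(std::set<ShapeKey> &Seen, const Shape &S) {
  Seen.insert(std::make_tuple(S.M, S.N, S.K));
}
```

Because several minimum combinations can be returned, a single merged shape would not be enough to check against all of them, which matches the decision to collect every combination.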

inbelic

Just some nits from a first pass through

inbelic · 2026-05-26T18:15:15Z

-> 2) Do we need both operand types, or should we expect the operands to be the
->    same type?
-> 3) What flags do we need?
+* Another approach coulld be to follow the RDAT pattern where we have a list of


nit:

Suggested change

-* Another approach coulld be to follow the RDAT pattern where we have a list of
+* Another approach could be to follow the RDAT pattern where we have a list of

inbelic · 2026-05-26T18:17:21Z

+
+#define RDAT_ENUM_START(eTy, sTy)           enum class eTy : sTy {
+#define RDAT_ENUM_VALUE(name, value)        name = value,
+#define RDAT_ENUM_VALUE_ALIAS(name, value)  name = value,


Suggested change

-#define RDAT_ENUM_VALUE_ALIAS(name, value)  name = value,

Seems unused.

github-actions Bot added the needs-triage label Mar 27, 2026

github-project-automation Bot added this to HLSL Triage Mar 27, 2026

damyanp reviewed Mar 27, 2026

Comment thread proposals/0035-linalg-matrix.md Outdated

damyanp requested a review from jenatali March 27, 2026 03:35

llvm-beanz reviewed Mar 27, 2026

tex3d mentioned this pull request Mar 31, 2026

[0035] Specify a way to transpose Thread Scope matrices on load and constraint loads to A-type Matrices. #803

Merged

jenatali reviewed Mar 31, 2026

pow2clk reviewed Apr 1, 2026

anupamachandra reviewed Apr 1, 2026

anupamachandra reviewed Apr 1, 2026

anupamachandra reviewed Apr 1, 2026

tex3d force-pushed the linalg-runtime-data branch 2 times, most recently from caf21ca to 9a66beb Compare April 9, 2026 18:02

tex3d added 6 commits April 9, 2026 11:02

[0035] LinAlg: Add runtime metadata details: SVI0, PSV0, and RDAT

822b2d4

Address some wording feedback for clarity

0ac8de0

Remove shape merging

008dcbf

Also adds MatrixConstruction gathering, but not data structures yet.

Update PSV0 and RDAT definitions

1f3fcca

Added macro definitions and comments to explain the exact meaning of RDAT record elements.

Update open questions

12131d1

Update comments on LinAlgMatrixOperationShape for clarity

2b0b5d6

tex3d force-pushed the linalg-runtime-data branch from 9a66beb to 2b0b5d6 Compare April 9, 2026 18:02

tex3d added 9 commits April 10, 2026 17:38

Update AccumulateStore

5d9aab8

- remove transpose flag - add LinAlgMatrixAccumulateToMemory

update gather process

7b9eb77

add MatrixNonOptimalLayout flag; minor adjustments

110792b

Remove Bias type and unneeded shapes from thread op.

5c48039

Use ThreadMatrixVectorMultiply instead of *VectorMatrix* for consistency

bbc50b7

Use Min M/N/K for MatrixConstruction, consistent with runtime API

7d81a23

Use MatrixNonMulOptimalLayout for clarity.

76dca97

Fix typos

14e7416

Add open question for Min M/N/K validation rule

10c87ef

tex3d added 11 commits May 18, 2026 18:26

Add comment about thread-scope for MatrixConstruction

88c6acd

Minor additions for clarification

98a62f5

Update VectorAccumulate

9e007b8

Merge remote-tracking branch 'ms/main' into linalg-runtime-data

e4c8a6f

VectorAccumulate: clarify that input llvm vector component type is used

2ce8dba

Revert "Add open question for Min M/N/K validation rule"

17d992d

This reverts commit 10c87ef.

Fix language for MatrixConstruction gather; add comment in header defs

0607cf1

ThreadMatrixVectorMultiply flags should support merging.

f47b2db

Clarify dedup behavior in gathering process

5fcd04f

Add clarifying comments back and fix typo

b0058d6

inbelic reviewed May 26, 2026
