[Matrix] Define Cbuffer Layout for `matrix_type`

Today, a `matrix_type` field in a cbuffer crashes `HLSLBufferLayoutBuilder::layoutField` because we don't handle constant matrix types.
https://godbolt.org/z/8Er4ThbT3

In upstream, matrix types get converted to array types for their in-memory layout. The good thing about this array transformation is that we won't have to do any special data scalarization transformations for matrix types. The default behavior, however, is C style, in that it will try to pack as much per row as it can.
https://github.com/llvm/llvm-project/blob/e5825c455ea40760d48be18491d383172dce4928/clang/lib/CodeGen/CodeGenTypes.cpp#L103-L109
```cpp
llvm::Type *CodeGenTypes::ConvertTypeForMem(QualType T) {
  if (T->isConstantMatrixType()) {
    const Type *Ty = Context.getCanonicalType(T).getTypePtr();
    const ConstantMatrixType *MT = cast<ConstantMatrixType>(Ty);
    return llvm::ArrayType::get(ConvertType(MT->getElementType()),
                                MT->getNumRows() * MT->getNumColumns());
  }
...
```

To get the compiler to stop crashing, we need to add handling for constant matrix types that understands the `ElemLayoutTy` will be an array and defines the `ArrayStride`, the `ElemSize`, and the `ElemOffset`.
Something like the below will get things compiling to DXIL.
```diff
diff --git a/clang/lib/CodeGen/HLSLBufferLayoutBuilder.cpp b/clang/lib/CodeGen/HLSLBufferLayoutBuilder.cpp
index 838903cdcd1e..28dc2f2887ec 100644
--- a/clang/lib/CodeGen/HLSLBufferLayoutBuilder.cpp
+++ b/clang/lib/CodeGen/HLSLBufferLayoutBuilder.cpp
@@ -10,6 +10,7 @@
 #include "CGHLSLRuntime.h"
 #include "CodeGenModule.h"
 #include "clang/AST/Type.h"
+#include "llvm/IR/DerivedTypes.h"
 #include <climits>
 
 //===----------------------------------------------------------------------===//
@@ -228,6 +229,14 @@ bool HLSLBufferLayoutBuilder::layoutField(const FieldDecl *FD,
     ElemSize = cast<llvm::TargetExtType>(ElemLayoutTy)->getIntParameter(0);
     ElemOffset = (Packoffset != -1) ? Packoffset : NextRowOffset;
 
+  } else if (FieldTy->isConstantMatrixType()) {
+     auto *MTy = FieldTy->getAs<ConstantMatrixType>();
+     ElemLayoutTy = CGM.getTypes().ConvertTypeForMem(FieldTy);
+     auto *ArrTy = cast<llvm::ArrayType>(ElemLayoutTy);
+     unsigned SubElemSize = ArrTy->getElementType()->getScalarSizeInBits() / 8;
+     ElemSize = MTy->getNumElementsFlattened() * SubElemSize;
+     ArrayStride = llvm::alignTo(ElemSize, SubElemSize);
+     ElemOffset = (Packoffset != -1) ? Packoffset : NextRowOffset;
   } else {
     // scalar or vector - find element size and alignment
     unsigned Align = 0;

```

This, however, is not sufficient to produce a matrix cbuffer layout that is legal for all DXIL cases.

## HLSL cbuffer rules

Cbuffers are laid out in consecutive 16-byte slots. A float scalar takes up 4 bytes, so four-byte scalars like floats and ints get packed into a single 16-byte slot if they are laid out consecutively. Vectors work the same way. In other words, scalars and vectors can share a slot as long as they fit.

### example(s)

<img width="1434" height="232" alt="Image" src="https://github.com/user-attachments/assets/e82eeb30-6ee4-4aad-885a-bafe58063c47" />

<img width="1418" height="268" alt="Image" src="https://github.com/user-attachments/assets/6cdeaede-bfe7-42f8-98bd-b0f168474570" />

<img width="1408" height="282" alt="Image" src="https://github.com/user-attachments/assets/ae32f5ef-e665-4ab4-9188-ba6d0ca58824" />

This works really well for laying out an entire float4 vector into one slot. 
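To make the slot rule above concrete, here is a minimal C++14 sketch of how the offset of a scalar or vector field could be computed. `placeField`, `alignUp`, and `SlotSize` are hypothetical names for illustration, not functions in `HLSLBufferLayoutBuilder`, and the sketch assumes the field is at most 16 bytes and that no `packoffset` is given.

```cpp
#include <cstdint>

// Size of one cbuffer slot in bytes.
constexpr uint32_t SlotSize = 16;

// Round Value up to the next multiple of Align (Align must be non-zero).
constexpr uint32_t alignUp(uint32_t Value, uint32_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical helper: offset at which a scalar or vector of Size bytes
// lands when the previous field ended at Cursor. Assumes Size <= SlotSize.
constexpr uint32_t placeField(uint32_t Cursor, uint32_t Size,
                              uint32_t ElemAlign) {
  uint32_t Offset = alignUp(Cursor, ElemAlign);
  // A field may share a slot with its neighbors, but it may not straddle
  // a slot boundary; if it would, it moves to the start of the next slot.
  if (Offset / SlotSize != (Offset + Size - 1) / SlotSize)
    Offset = alignUp(Offset, SlotSize);
  return Offset;
}

// float a; float b; float2 c;  -> all three share slot 0.
static_assert(placeField(0, 4, 4) == 0, "a starts slot 0");
static_assert(placeField(4, 4, 4) == 4, "b packs after a");
static_assert(placeField(8, 8, 4) == 8, "c fills the rest of slot 0");

// float a; float4 v;  -> v does not fit in the remaining 12 bytes.
static_assert(placeField(4, 16, 4) == 16, "v moves to slot 1");

// float a; float3 v;  -> v fits in the remaining 12 bytes.
static_assert(placeField(4, 12, 4) == 4, "v shares slot 0 with a");
```

The checks run at compile time, so building the file is enough to confirm the offsets for these cases.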

Arrays and matrices are special: each element/column has a 16-byte stride, even if the element is a float (4B), float2 (8B), or float3 (12B). This is the big gotcha. The overall size is rounded up to a multiple of 16 bytes.

### example(s)
```hlsl
cbuffer example {
    float4x1 M1;
    float1x4 M2;
};
```
<img width="1442" height="618" alt="Image" src="https://github.com/user-attachments/assets/116cb9b4-3b3d-4a9c-be2f-283927d0cc61" />
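As a rough illustration of why the flat array type is not enough, the sketch below compares the byte offset of a matrix element in a tightly packed array with its offset when every column starts a new 16-byte slot. It assumes column-major storage and 4-byte elements; `flatArrayOffset` and `cbufferOffset` are hypothetical helpers, not existing clang APIs.

```cpp
#include <cstdint>

// Size of one cbuffer slot in bytes.
constexpr uint32_t SlotSize = 16;

// Offset of element (Row, Col) in a tightly packed array of
// NumRows * NumCols elements, assuming column-major flattening.
constexpr uint32_t flatArrayOffset(uint32_t NumRows, uint32_t Row,
                                   uint32_t Col, uint32_t ElemSize) {
  return (Col * NumRows + Row) * ElemSize;
}

// Offset of the same element when each column starts a new 16-byte slot.
constexpr uint32_t cbufferOffset(uint32_t Row, uint32_t Col,
                                 uint32_t ElemSize) {
  return Col * SlotSize + Row * ElemSize;
}

// float4x1: one column of four floats, so both layouts agree.
static_assert(flatArrayOffset(4, 3, 0, 4) == 12, "packed: last element");
static_assert(cbufferOffset(3, 0, 4) == 12, "cbuffer: same slot");

// float1x4: four columns of one float, so each element gets its own slot.
static_assert(flatArrayOffset(1, 0, 3, 4) == 12, "packed: last element");
static_assert(cbufferOffset(0, 3, 4) == 48, "cbuffer: fourth slot");

// float3x3: the second column starts at 12 when packed, at 16 in a cbuffer.
static_assert(flatArrayOffset(3, 0, 1, 4) == 12, "packed: second column");
static_assert(cbufferOffset(0, 1, 4) == 16, "cbuffer: second column");
```

Under these assumptions the two layouts only agree for single-column matrices, which is why `float4x1` and `float1x4` look so different in the example.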

This cbuffer layout is very different from what C would do. C would use each field's natural alignment (4B for float), and arrays are tightly packed using the element's natural alignment/size. There is no 16-byte slot rule. A float3 would be 12B, and arrays of float3 would have a 12B stride (not 16).

In summary, in HLSL cbuffers, arrays (and matrix columns/rows) always step by 16 bytes, but in C, arrays step by the true element size. That makes the HLSL layout of matrices much more spaced out. If we were to emulate this in C, it would look something like https://godbolt.org/z/rharTWz1r
```cpp
// -Xclang -fdump-record-layouts
#include <stdalign.h>

typedef struct { float v;    float _pad[3]; }      HlslFloat1;  // 16B
typedef struct { float x,y;  float _pad[2]; }      HlslFloat2;  // 16B
typedef struct { float x,y,z; float _pad; }        HlslFloat3;  // 16B
typedef struct { float x,y,z,w; }                   HlslFloat4; // 16B

typedef struct {
    alignas(16) HlslFloat4 M1;        // float4
    HlslFloat1  M2[4];                // float M2[4] (16B stride)
    HlslFloat2  M3[3];                // float2 M3[3] (16B stride)
    HlslFloat3  M4[4];                // float3 M4[4] (16B stride)
} ExampleCB;

```
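If we want to check the emulation without reading the record-layout dump, a few compile-time assertions can pin down the offsets. This is a sketch that repeats the structs above in C++ and assumes a 4-byte `float`; the expected numbers are just the arithmetic implied by the 16-byte stride, not output taken from DXC.

```cpp
#include <cstddef>

typedef struct { float v;       float _pad[3]; } HlslFloat1; // 16B
typedef struct { float x, y;    float _pad[2]; } HlslFloat2; // 16B
typedef struct { float x, y, z; float _pad;    } HlslFloat3; // 16B
typedef struct { float x, y, z, w;             } HlslFloat4; // 16B

typedef struct {
    alignas(16) HlslFloat4 M1; // float4
    HlslFloat1  M2[4];         // float  M2[4] (16B stride)
    HlslFloat2  M3[3];         // float2 M3[3] (16B stride)
    HlslFloat3  M4[4];         // float3 M4[4] (16B stride)
} ExampleCB;

// Every padded element type occupies exactly one 16-byte slot.
static_assert(sizeof(HlslFloat1) == 16, "float padded to one slot");
static_assert(sizeof(HlslFloat2) == 16, "float2 padded to one slot");
static_assert(sizeof(HlslFloat3) == 16, "float3 padded to one slot");

// Each member therefore starts on a slot boundary.
static_assert(offsetof(ExampleCB, M1) == 0,   "M1 in slot 0");
static_assert(offsetof(ExampleCB, M2) == 16,  "M2 starts at slot 1");
static_assert(offsetof(ExampleCB, M3) == 80,  "M3 starts at slot 5");
static_assert(offsetof(ExampleCB, M4) == 128, "M4 starts at slot 8");
static_assert(sizeof(ExampleCB) == 192,       "12 slots in total");
```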

Action items
- [ ] Write a proposal that captures this 16-byte slot rule requirement for matrix types.
- [ ] The proposal will need to define how we handle the clang `ConstantMatrixType` and its transformation to arrays when we lower to LLVM IR.
