[Matrix] Define Cbuffer Layout for matrix_type #355

@farzonl

Today a matrix_type crashes HLSLBufferLayoutBuilder in layoutField because we don't handle constant matrix types.
https://godbolt.org/z/8Er4ThbT3

Upstream, matrix types are converted to array types for their in-memory layout. The good thing about this array transformation is that we won't have to do any special data scalarization transformations for matrix types. The default behavior, however, is C style: it packs as many elements per row as it can.
https://github.com/llvm/llvm-project/blob/e5825c455ea40760d48be18491d383172dce4928/clang/lib/CodeGen/CodeGenTypes.cpp#L103-L109

llvm::Type *CodeGenTypes::ConvertTypeForMem(QualType T) {
  if (T->isConstantMatrixType()) {
    const Type *Ty = Context.getCanonicalType(T).getTypePtr();
    const ConstantMatrixType *MT = cast<ConstantMatrixType>(Ty);
    return llvm::ArrayType::get(ConvertType(MT->getElementType()),
                                MT->getNumRows() * MT->getNumColumns());
  }
...

To get the compiler to stop crashing we need to add handling for constant matrix types that understands the ElemLayoutTy will be an array and defines the ArrayStride, the ElemSize, and the ElemOffset.
Something like the below will get things compiling to DXIL.

diff --git a/clang/lib/CodeGen/HLSLBufferLayoutBuilder.cpp b/clang/lib/CodeGen/HLSLBufferLayoutBuilder.cpp
index 838903cdcd1e..28dc2f2887ec 100644
--- a/clang/lib/CodeGen/HLSLBufferLayoutBuilder.cpp
+++ b/clang/lib/CodeGen/HLSLBufferLayoutBuilder.cpp
@@ -10,6 +10,7 @@
 #include "CGHLSLRuntime.h"
 #include "CodeGenModule.h"
 #include "clang/AST/Type.h"
+#include "llvm/IR/DerivedTypes.h"
 #include <climits>
 
 //===----------------------------------------------------------------------===//
@@ -228,6 +229,14 @@ bool HLSLBufferLayoutBuilder::layoutField(const FieldDecl *FD,
     ElemSize = cast<llvm::TargetExtType>(ElemLayoutTy)->getIntParameter(0);
     ElemOffset = (Packoffset != -1) ? Packoffset : NextRowOffset;
 
+  } else if (FieldTy->isConstantMatrixType()) {
+     auto *MTy = FieldTy->getAs<ConstantMatrixType>();
+     ElemLayoutTy = CGM.getTypes().ConvertTypeForMem(FieldTy);
+     auto *ArrTy = cast<llvm::ArrayType>(ElemLayoutTy);
+     unsigned SubElemSize = ArrTy->getElementType()->getScalarSizeInBits() / 8;
+     ElemSize = MTy->getNumElementsFlattened() * SubElemSize;
+     ArrayStride = llvm::alignTo(ElemSize, SubElemSize);
+     ElemOffset = (Packoffset != -1) ? Packoffset : NextRowOffset;
   } else {
     // scalar or vector - find element size and alignment
     unsigned Align = 0;

This, however, is not sufficient to produce a matrix cbuffer layout that is legal for all DXIL cases.
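To make the gap concrete, here is a minimal sketch that mirrors the arithmetic of the diff in plain C++, outside of clang. The names MatrixFieldLayout and layoutLikeTheDiff are hypothetical and exist only for illustration; alignTo is a local stand-in that rounds up the same way llvm::alignTo does for a nonzero alignment.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for llvm::alignTo: round Value up to a multiple of Align.
inline uint64_t alignTo(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be nonzero");
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical holder for two of the values the diff computes.
struct MatrixFieldLayout {
  uint64_t ElemSize;    // total bytes of the flattened matrix
  uint64_t ArrayStride; // ElemSize rounded up to the scalar size
};

// Hypothetical helper: same arithmetic as the matrix branch in the diff.
inline MatrixFieldLayout layoutLikeTheDiff(unsigned Rows, unsigned Cols,
                                           unsigned ScalarBits) {
  uint64_t SubElemSize = ScalarBits / 8;
  uint64_t ElemSize = uint64_t(Rows) * Cols * SubElemSize;
  return {ElemSize, alignTo(ElemSize, SubElemSize)};
}
```

For a float2x3 this gives ElemSize = 24 and ArrayStride = 24: the six floats sit back to back, with no 16-byte padding between rows or columns.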

HLSL cbuffer rules

Cbuffers are laid out in consecutive 16-byte slots. A float scalar takes up 4 bytes. Four-byte scalars like floats and ints get packed into a single 16-byte slot if they are laid out consecutively. Vectors work the same way. In other words, scalars and vectors can share a slot as long as they fit without crossing a slot boundary.


This works really well for laying out an entire float4 vector into one slot.
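The packing rule for scalars and vectors can be sketched roughly as below. This is an illustrative model, not clang code: packScalarsAndVectors and RowSize are hypothetical names, and the sketch assumes every field is built from 4-byte components (it does not model 2-byte or 8-byte types).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t RowSize = 16; // one cbuffer slot

// Hypothetical helper: given the byte size of each scalar/vector field in
// declaration order, return the byte offset assigned to each field.
inline std::vector<uint32_t>
packScalarsAndVectors(const std::vector<uint32_t> &Sizes) {
  std::vector<uint32_t> Offsets;
  uint32_t Next = 0;
  for (uint32_t Size : Sizes) {
    assert(Size > 0 && Size <= RowSize && "field must fit in one slot");
    uint32_t RowEnd = (Next / RowSize + 1) * RowSize;
    // A field may not straddle a slot boundary; bump it to the next slot.
    if (Next + Size > RowEnd)
      Next = RowEnd;
    Offsets.push_back(Next);
    Next += Size;
  }
  return Offsets;
}
```

With this model, four consecutive floats land at offsets 0, 4, 8, 12 (one slot), while a float3 followed by a float2 lands at offsets 0 and 16 because the float2 no longer fits in the first slot.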

Arrays and matrices are special: each element/column has a 16-byte stride—even if the element is a float (4B), float2 (8B), or float3 (12B). This is the big gotcha. The overall size is rounded up to a multiple of 16 bytes.

example(s)

cbuffer example {
    float4x1 M1;
    float1x4 M2;
};
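As a rough sketch of the stride rule, the helpers below compare where element i of an array lands in a cbuffer versus in C. All names are hypothetical. The matrixRowsUsed helper additionally assumes HLSL's default column-major packing, where each column of a floatRxC starts its own 16-byte slot; that assumption is not stated in this issue and should be checked against the proposal.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t RowSize = 16; // one cbuffer slot

// Cbuffer: every array element starts on its own 16-byte slot.
inline uint32_t cbufferElementOffset(uint32_t Base, uint32_t Index) {
  assert(Base % RowSize == 0 && "arrays start on a slot boundary");
  return Base + Index * RowSize;
}

// C: elements are tightly packed at their natural size.
inline uint32_t cElementOffset(uint32_t Base, uint32_t Index,
                               uint32_t ElemSize) {
  return Base + Index * ElemSize;
}

// Assumption: column-major packing, one 16-byte slot per column.
inline uint32_t matrixRowsUsed(uint32_t Rows, uint32_t Cols) {
  assert(Rows >= 1 && Rows <= 4 && Cols >= 1 && Cols <= 4);
  return Cols;
}
```

Under these assumptions the fourth element of a float3 array sits at byte 48 in a cbuffer but at byte 36 in C, and float4x1 uses one slot while float1x4 uses four.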

This is very different from what C would do. C uses each field's natural alignment (4B for float), and arrays are tightly packed using the element's natural alignment/size. There is no 16-byte slot rule. A float3 would be 12B, and arrays of float3 have a 12B stride (not 16).

In summary, in HLSL cbuffers, arrays (and matrix columns/rows) always step by 16 bytes, but in C, arrays step by the true element size. That makes the HLSL layout of matrices much more spaced out. If we were to emulate this in C it would look something like https://godbolt.org/z/rharTWz1r

// -Xclang -fdump-record-layouts
#include <stdalign.h>

typedef struct { float v;    float _pad[3]; }      HlslFloat1;  // 16B
typedef struct { float x,y;  float _pad[2]; }      HlslFloat2;  // 16B
typedef struct { float x,y,z; float _pad; }        HlslFloat3;  // 16B
typedef struct { float x,y,z,w; }                   HlslFloat4; // 16B

typedef struct {
    alignas(16) HlslFloat4 M1;        // float4
    HlslFloat1  M2[4];                // float M2[4] (16B stride)
    HlslFloat2  M3[3];                // float2 M3[3] (16B stride)
    HlslFloat3  M4[4];                // float3 M4[4] (16B stride)
} ExampleCB;
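The offsets this emulation produces can be checked without dumping record layouts. The sketch below restates the same structs in C++ and asserts the expected sizes and offsets; verifyExampleCB is a hypothetical helper name, and the numbers assume a typical target where float is 4 bytes with 4-byte alignment.

```cpp
#include <cassert>
#include <cstddef>

struct HlslFloat1 { float v;          float _pad[3]; }; // 16B
struct HlslFloat2 { float x, y;       float _pad[2]; }; // 16B
struct HlslFloat3 { float x, y, z;    float _pad;    }; // 16B
struct HlslFloat4 { float x, y, z, w;                }; // 16B

struct ExampleCB {
  alignas(16) HlslFloat4 M1; // float4
  HlslFloat1 M2[4];          // float  M2[4], 16B stride
  HlslFloat2 M3[3];          // float2 M3[3], 16B stride
  HlslFloat3 M4[4];          // float3 M4[4], 16B stride
};

// Each padded element is one slot wide, so the array stride is 16 bytes.
static_assert(sizeof(HlslFloat1) == 16, "float element padded to a slot");
static_assert(sizeof(HlslFloat2) == 16, "float2 element padded to a slot");
static_assert(sizeof(HlslFloat3) == 16, "float3 element padded to a slot");

// Hypothetical runtime check of where each member starts.
inline void verifyExampleCB() {
  assert(offsetof(ExampleCB, M1) == 0);
  assert(offsetof(ExampleCB, M2) == 16);  // after one slot for M1
  assert(offsetof(ExampleCB, M3) == 80);  // 16 + 4 * 16
  assert(offsetof(ExampleCB, M4) == 128); // 80 + 3 * 16
  assert(sizeof(ExampleCB) == 192);       // 128 + 4 * 16
}
```

For comparison, a plain C float[4] occupies 16 bytes in total, whereas the emulated M2 occupies 64.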

Action items

  • Write a proposal that captures this 16-byte slot rule requirement for matrix types.
  • The proposal will need to define how we handle the ConstantMatrixType clang type and its transformation to arrays when we lower to LLVM IR.
