[QST]Question about LdMatrix16x16x8bOp #2871

@Willie-Qu

Hi, I have a question about the `LdMatrix16x16x8bOp` class. `transpose` should not be asserted to be `True`, since PTX supports the optional `.trans` qualifier for `ldmatrix`.
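
For reference, the linked PTX documentation gives the general form `ldmatrix.sync.aligned.shape.num{.trans}{.ss}.type`, where braces mark optional qualifiers. Below is a minimal sketch of how those qualifiers compose, using the classic `.m8n8`/`.b16` load that can be written with or without `.trans`. `ldmatrix_mnemonic` is a hypothetical helper, not a CUTLASS or PTX API. It only assembles strings and does not check which shape/type combinations the PTX ISA accepts; in particular, it says nothing about whether `.trans` may be omitted for `.m16n16`, which is the point of this question.

```python
from typing import Optional


def ldmatrix_mnemonic(shape: str, num: int, trans: bool, ty: str,
                      state_space: Optional[str] = "shared") -> str:
    """Hypothetical helper: build ldmatrix.sync.aligned.shape.num{.trans}{.ss}.type.

    Only the .num values (.x1/.x2/.x4) are validated; shape/type legality is NOT checked.
    """
    if num not in (1, 2, 4):
        raise ValueError(f"num must be 1, 2 or 4, got {num}")
    parts = ["ldmatrix", "sync", "aligned", shape, f"x{num}"]
    if trans:
        parts.append("trans")  # optional qualifier: load the matrix in column-major form
    if state_space:
        parts.append(state_space)  # optional .ss qualifier, e.g. .shared
    parts.append(ty)
    return ".".join(parts)


print(ldmatrix_mnemonic("m8n8", 4, False, "b16"))  # ldmatrix.sync.aligned.m8n8.x4.shared.b16
print(ldmatrix_mnemonic("m8n8", 4, True, "b16"))   # ldmatrix.sync.aligned.m8n8.x4.trans.shared.b16
```

The two calls differ only in the `.trans` qualifier, which is the flag that `LdMatrix16x16x8bOp` pins to `True`.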

https://github.com/NVIDIA/cutlass/blob/49bd6bf1ba80abd588a56a3f3af2f1bcd41d215f/python/CuTeDSL/cutlass/cute/nvgpu/warp/copy.py#L82C1-L114C73

```python
@dataclass(frozen=True)
class LdMatrix16x16x8bOp(BaseOp):
    """
    16x16 8-bit ``ldmatrix`` Operation.

    See the `PTX documentation <https://docs.nvidia.com/cuda/parallel-thread-execution/#warp-level-matrix-load-instruction-ldmatrix>`__.
    This operation corresponds to the ``.m16n16`` and the ``.b16`` qualifiers.
    """

    def __init__(self, num_matrices: int) -> None:
        super().__init__(transpose=True, num_matrices=num_matrices)
        self._verify()

    def _verify(self):
        assert self.transpose, "transpose must be True"
        if self.num_matrices not in [1, 2]:
            raise OpError(
                self,
                "expects the 'num_matrices' Op parameter to be one of [1,2]",
            )

    def _make_trait(
        self, copy_internal_type: Type[Numeric], *, loc=None, ip=None, **kwargs
    ) -> "LdMatrix16x16x8bTrait":
        mode = _pack_shape((16, 16), loc=loc, ip=ip)
        ty = _cute_nvgpu_ir.CopyAtomLdsmType.get(
            copy_internal_type.mlir_type,
            mode.type.attribute,
            _cute_nvgpu_ir.LdsmSzPattern.u8,
            self.num_matrices,
            ir.UnitAttr.get(),
        )
        return LdMatrix16x16x8bTrait(cute.make_atom(ty, loc=loc, ip=ip))
```
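
Note that `__init__` accepts only `num_matrices` and always forwards `transpose=True` to `BaseOp`, so the constructor offers no way to request a non-transposed load in the first place. The two checks in `_verify` can be mirrored in a standalone sketch, which may help when experimenting without the CuTe DSL installed. `LdMatrix16x16x8bParams` is a hypothetical stand-in, not the CUTLASS class; unlike the quoted code, it raises `ValueError` for both checks, because a bare `assert` is stripped when Python runs with `-O`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LdMatrix16x16x8bParams:
    """Hypothetical stand-in mirroring LdMatrix16x16x8bOp._verify (not the CUTLASS class)."""

    num_matrices: int
    transpose: bool = True  # the quoted __init__ always forwards transpose=True

    def __post_init__(self) -> None:
        # Mirrors `assert self.transpose, "transpose must be True"`,
        # raised as ValueError so the check is not removed under `python -O`.
        if not self.transpose:
            raise ValueError("transpose must be True")
        # Mirrors the OpError raised for num_matrices outside [1, 2].
        if self.num_matrices not in (1, 2):
            raise ValueError("expects the 'num_matrices' parameter to be one of [1, 2]")


ok = LdMatrix16x16x8bParams(num_matrices=2)  # accepted
for bad in ({"num_matrices": 4}, {"num_matrices": 1, "transpose": False}):
    try:
        LdMatrix16x16x8bParams(**bad)
    except ValueError as err:
        print(f"{bad} rejected: {err}")
```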
