
@blueberrycongee

Summary

Add descriptive documentation to sgemm_sm80.cu explaining that it is actually an FP16xFP16 GEMM (HGEMM) tutorial using CuTe.

Problem

Issue #1686 reports that users cannot find an fp16 GEMM tutorial in the CuTe examples.

Solution

The existing sgemm_sm80.cu already implements FP16 GEMM using cute::half_t, but this was not documented. This PR adds a documentation block clarifying:

  • This example uses FP16 data types despite the "sgemm" filename
  • Key features: Tensor Cores, cp.async, pipelining, swizzled shared memory
  • Usage examples

Fixes #1686



Linked issue #1686: [QST] Is there any fp16xfp16 GEMM sample using CUTE with a performance comparable to cublas?