
Add SM120 Gluon flash attention (TMA + MMAv2)#8

Open
blake-snc wants to merge 1 commit into gpu-mode:main from blake-snc:add/flash-attention-sm120-gluon

Conversation

@blake-snc

Summary

  • Adds index entry for SM120 Gluon flash attention kernel — first open-source Gluon FA targeting SM120 GPUs (DGX Spark, RTX 5090, GB10)
  • Uses TMA for all data movement + MMAv2 tensor cores (SM120 lacks tcgen05/WGMMA)
  • Supports BF16 and FP8 (E5M2), causal and non-causal
  • Kernel source submitted to triton-lang/kernels as PR #20
  • Also adds missing entry 15 (GPTQ Triton) to kernel_overview.md

Contributed by Second Nature Computing (https://joinsecondnature.com)

🤖 Generated with Claude Code
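The kernel source itself lives in triton-lang/kernels PR #20 and is not inlined here. As context for reviewers, the core of any flash attention kernel, including this one, is the tiled online-softmax recurrence: attention is computed block by block, carrying a running row maximum and normalizer so the full score matrix never materializes. Below is a plain NumPy sketch of that recurrence; the function name, block size, and single-head shapes are illustrative choices of ours, not taken from the kernel.

```python
import numpy as np

def flash_attention_ref(q, k, v, block=16, causal=False):
    """Reference tiled attention with online softmax.

    q, k, v: (seq, d) arrays for a single head. Processes K/V in
    blocks, rescaling the running accumulator whenever the row
    maximum grows, so only O(block * d) scores are live at once.
    """
    seq, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((seq, d), dtype=np.float64)
    for i0 in range(0, seq, block):
        qi = q[i0:i0 + block].astype(np.float64) * scale
        rows = qi.shape[0]
        m = np.full(rows, -np.inf)        # running row max
        l = np.zeros(rows)                # running softmax denominator
        acc = np.zeros((rows, d))         # unnormalized output accumulator
        for j0 in range(0, seq, block):
            kj = k[j0:j0 + block].astype(np.float64)
            vj = v[j0:j0 + block].astype(np.float64)
            s = qi @ kj.T                 # block of attention scores
            if causal:
                ri = np.arange(i0, i0 + rows)[:, None]
                cj = np.arange(j0, j0 + kj.shape[0])[None, :]
                s = np.where(ri >= cj, s, -np.inf)
            m_new = np.maximum(m, s.max(axis=1))
            p = np.exp(s - m_new[:, None])        # rescaled probabilities
            alpha = np.exp(m - m_new)             # correction for old max
            l = l * alpha + p.sum(axis=1)
            acc = acc * alpha[:, None] + p @ vj
            m = m_new
        out[i0:i0 + rows] = acc / l[:, None]
    return out
```

In the actual SM120 kernel, the Q/K/V tiles would be staged through shared memory via TMA and the two matmuls per inner iteration would run on MMAv2 tensor cores; the NumPy version only shows the math being computed.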


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
