Release/v0.21.5 #310

Draft

alihassanijr wants to merge 18 commits into SHI-Labs:main from alihassanijr:release/v0.21.5
Conversation


alihassanijr (Collaborator) commented on Feb 6, 2026

  • Extended attention (FMHA) functionality:
    • Causal masking with variable-length inputs: for now only supported in CUTLASS FMHA and
      Blackwell FMHA (the usual packed varlen layout is sketched after this list).
  • torch.compile support added
    • All libnatten ops are now registered as torch ops, enabling full-graph compilation of models
      that use NATTEN ops (see the compile sketch below).
  • TokPerm kernels: moved dilation to the batch dimension instead of heads, which finally unblocks
    GQA/MQA (the layout change is sketched below).
  • GQA/MQA support added for all FNA and FMHA operations (the head bookkeeping is sketched below).
    • CUTLASS FNA/FMHA and Hopper FNA/FMHA don't support it natively in the kernels, so for now it's
      implemented with graph transforms.
  • Dedicated token permute kernels
    • Token permute/unpermute and padding operations are now implemented as standalone kernels, and
      can be used instead of the PyTorch implementation.
  • More accurate merge_attentions backward pass (the merge math is sketched below).
    • Limits the number of outputs that can be merged to 2 when requires_grad=True.
  • Misc bug fixes
  • Wheels for torch 2.10 and Python 3.14
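For context on the variable-length path, here is a minimal sketch of the packed varlen convention most FMHA kernels use: sequences concatenated along a single token axis, described by cumulative sequence lengths. The tensor names and packing layout are illustrative assumptions, not NATTEN's documented interface.

```python
# Hedged sketch of a packed variable-length batch for attention kernels.
# Names and layout are assumptions for illustration only.
import torch

seqlens = torch.tensor([5, 9, 3])                      # three sequences
cu_seqlens = torch.zeros(len(seqlens) + 1, dtype=torch.int32)
cu_seqlens[1:] = torch.cumsum(seqlens, dim=0)          # [0, 5, 14, 17]

heads, head_dim = 4, 32
total = int(seqlens.sum())
q = torch.randn(total, heads, head_dim)                # packed (tokens, heads, dim)

# With a causal mask, token i may only attend to tokens <= i within its own
# segment [cu_seqlens[b], cu_seqlens[b + 1]).
```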
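A minimal sketch of what the torch op registration enables, assuming the `NeighborhoodAttention2D` module and its `(dim, num_heads, kernel_size)` constructor from earlier NATTEN releases are still available; the exact entry point is an assumption, not confirmed by this PR.

```python
# Hedged sketch: compiling a model that uses NATTEN ops with torch.compile.
# NeighborhoodAttention2D and its arguments are assumed from earlier releases.
import torch
from natten import NeighborhoodAttention2D

model = NeighborhoodAttention2D(dim=128, num_heads=4, kernel_size=7).cuda()

# With all libnatten ops registered as torch ops, fullgraph=True should no
# longer graph-break on NATTEN's custom kernels.
compiled = torch.compile(model, fullgraph=True)

x = torch.randn(2, 32, 32, 128, device="cuda")  # (batch, H, W, dim) layout
out = compiled(x)
print(out.shape)
```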
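A plain-PyTorch sketch of the GQA/MQA head bookkeeping: fewer key/value heads than query heads, with each KV head shared by a group of query heads. Repeating KV heads before calling an MHA kernel is conceptually what a graph transform can do when a kernel lacks native GQA support; this illustrates the idea, not NATTEN's API.

```python
# Hedged sketch of grouped-query attention (GQA) shapes in plain PyTorch.
import torch
import torch.nn.functional as F

batch, seq, head_dim = 2, 64, 32
num_q_heads, num_kv_heads = 8, 2          # GQA: 8 query heads share 2 KV heads
group = num_q_heads // num_kv_heads       # 4 query heads per KV head

q = torch.randn(batch, num_q_heads, seq, head_dim)
k = torch.randn(batch, num_kv_heads, seq, head_dim)
v = torch.randn(batch, num_kv_heads, seq, head_dim)

# One way to lower GQA onto an MHA kernel: repeat each KV head across its
# query group, then run ordinary multi-head attention.
k_rep = k.repeat_interleave(group, dim=1)
v_rep = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k_rep, v_rep)
print(out.shape)  # (2, 8, 64, 32)
```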
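A conceptual 1-D sketch of token permutation for dilation: tokens in the same residue class modulo the dilation are gathered together, and the dilation groups are folded into the batch dimension so a dense neighborhood kernel can run on each group. NATTEN's dedicated kernels implement this directly; this only illustrates the layout change.

```python
# Hedged, conceptual sketch of token permute/unpermute for dilation in 1-D.
import torch

batch, seq, dim, dilation = 2, 12, 4, 3
x = torch.randn(batch, seq, dim)

# (batch, seq, dim) -> (batch, dilation, seq // dilation, dim):
# token i lands in group i % dilation at position i // dilation.
permuted = x.view(batch, seq // dilation, dilation, dim).transpose(1, 2)

# Fold dilation into batch: (batch * dilation, seq // dilation, dim).
folded = permuted.reshape(batch * dilation, seq // dilation, dim)

# ... run neighborhood attention on `folded` ...

# Token unpermute: the inverse restores the original layout exactly.
restored = (
    folded.view(batch, dilation, seq // dilation, dim)
    .transpose(1, 2)
    .reshape(batch, seq, dim)
)
assert torch.equal(restored, x)
```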
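And a sketch of the standard logsumexp-weighted merge behind merge_attentions: two partial attention outputs, each with its per-query logsumexp, combine into the output of attention over the union of their key sets. The function name and shapes are illustrative, not NATTEN's exact signature.

```python
# Hedged sketch of merging two partial attention outputs in log space.
# out1, out2: (batch, heads, seq, head_dim); lse1, lse2: (batch, heads, seq).
import torch

def merge_two(out1, lse1, out2, lse2):
    # Each branch's softmax mass is exp(lse); subtract the max for stability.
    max_lse = torch.maximum(lse1, lse2)
    w1 = torch.exp(lse1 - max_lse)
    w2 = torch.exp(lse2 - max_lse)
    denom = w1 + w2
    merged = (out1 * w1.unsqueeze(-1) + out2 * w2.unsqueeze(-1)) / denom.unsqueeze(-1)
    merged_lse = max_lse + torch.log(denom)
    return merged, merged_lse
```

Restricting the merge to two outputs under requires_grad=True keeps this pairwise formula, and its backward pass, exact rather than accumulated across many branches.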
