Release/v0.21.5 #310

Draft

alihassanijr wants to merge 18 commits into SHI-Labs:main from alihassanijr:release/v0.21.5
Conversation


alihassanijr (Collaborator) commented on Feb 6, 2026

  • Extended attention (FMHA) functionality:
    • Causal masking with variable-length inputs: for now only supported in CUTLASS FMHA and
      Blackwell FMHA (the usual packed varlen layout is sketched after this list).
  • torch.compile support added
    • All libnatten ops are now registered as torch ops, enabling full-graph compilation of models
      that use NATTEN ops (see the compile sketch below).
  • TokPerm kernels: moved dilation to the batch dimension instead of heads, which finally unblocks
    GQA/MQA (the layout change is sketched below).
  • GQA/MQA support added for all FNA and FMHA operations (the head bookkeeping is sketched below).
    • CUTLASS FNA/FMHA and Hopper FNA/FMHA don't support it natively in the kernels, so for now it's
      implemented with graph transforms.
  • Dedicated token permute kernels
    • Token permute/unpermute and padding operations are now implemented as standalone kernels, and
      can be used instead of the PyTorch implementation.
  • More accurate merge_attentions backward pass (the merge math is sketched below).
    • Limits the number of outputs that can be merged to 2 when requires_grad=True.
  • Misc bug fixes
  • Wheels for torch 2.10 and Python 3.14
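For context on the variable-length path, here is a minimal sketch of the packed varlen convention most FMHA kernels use: sequences concatenated along a single token axis, described by cumulative sequence lengths. The tensor names and packing layout are illustrative assumptions, not NATTEN's documented interface.

```python
# Hedged sketch of a packed variable-length batch for attention kernels.
# Names and layout are assumptions for illustration only.
import torch

seqlens = torch.tensor([5, 9, 3])                      # three sequences
cu_seqlens = torch.zeros(len(seqlens) + 1, dtype=torch.int32)
cu_seqlens[1:] = torch.cumsum(seqlens, dim=0)          # [0, 5, 14, 17]

heads, head_dim = 4, 32
total = int(seqlens.sum())
q = torch.randn(total, heads, head_dim)                # packed (tokens, heads, dim)

# With a causal mask, token i may only attend to tokens <= i within its own
# segment [cu_seqlens[b], cu_seqlens[b + 1]).
```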
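A minimal sketch of what the torch op registration enables, assuming the `NeighborhoodAttention2D` module and its `(dim, num_heads, kernel_size)` constructor from earlier NATTEN releases are still available; the exact entry point is an assumption, not confirmed by this PR.

```python
# Hedged sketch: compiling a model that uses NATTEN ops with torch.compile.
# NeighborhoodAttention2D and its arguments are assumed from earlier releases.
import torch
from natten import NeighborhoodAttention2D

model = NeighborhoodAttention2D(dim=128, num_heads=4, kernel_size=7).cuda()

# With all libnatten ops registered as torch ops, fullgraph=True should no
# longer graph-break on NATTEN's custom kernels.
compiled = torch.compile(model, fullgraph=True)

x = torch.randn(2, 32, 32, 128, device="cuda")  # (batch, H, W, dim) layout
out = compiled(x)
print(out.shape)
```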
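A plain-PyTorch sketch of the GQA/MQA head bookkeeping: fewer key/value heads than query heads, with each KV head shared by a group of query heads. Repeating KV heads before calling an MHA kernel is conceptually what a graph transform can do when a kernel lacks native GQA support; this illustrates the idea, not NATTEN's API.

```python
# Hedged sketch of grouped-query attention (GQA) shapes in plain PyTorch.
import torch
import torch.nn.functional as F

batch, seq, head_dim = 2, 64, 32
num_q_heads, num_kv_heads = 8, 2          # GQA: 8 query heads share 2 KV heads
group = num_q_heads // num_kv_heads       # 4 query heads per KV head

q = torch.randn(batch, num_q_heads, seq, head_dim)
k = torch.randn(batch, num_kv_heads, seq, head_dim)
v = torch.randn(batch, num_kv_heads, seq, head_dim)

# One way to lower GQA onto an MHA kernel: repeat each KV head across its
# query group, then run ordinary multi-head attention.
k_rep = k.repeat_interleave(group, dim=1)
v_rep = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k_rep, v_rep)
print(out.shape)  # (2, 8, 64, 32)
```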
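A conceptual 1-D sketch of token permutation for dilation: tokens in the same residue class modulo the dilation are gathered together, and the dilation groups are folded into the batch dimension so a dense neighborhood kernel can run on each group. NATTEN's dedicated kernels implement this directly; this only illustrates the layout change.

```python
# Hedged, conceptual sketch of token permute/unpermute for dilation in 1-D.
import torch

batch, seq, dim, dilation = 2, 12, 4, 3
x = torch.randn(batch, seq, dim)

# (batch, seq, dim) -> (batch, dilation, seq // dilation, dim):
# token i lands in group i % dilation at position i // dilation.
permuted = x.view(batch, seq // dilation, dilation, dim).transpose(1, 2)

# Fold dilation into batch: (batch * dilation, seq // dilation, dim).
folded = permuted.reshape(batch * dilation, seq // dilation, dim)

# ... run neighborhood attention on `folded` ...

# Token unpermute: the inverse restores the original layout exactly.
restored = (
    folded.view(batch, dilation, seq // dilation, dim)
    .transpose(1, 2)
    .reshape(batch, seq, dim)
)
assert torch.equal(restored, x)
```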
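And a sketch of the standard logsumexp-weighted merge behind merge_attentions: two partial attention outputs, each with its per-query logsumexp, combine into the output of attention over the union of their key sets. The function name and shapes are illustrative, not NATTEN's exact signature.

```python
# Hedged sketch of merging two partial attention outputs in log space.
# out1, out2: (batch, heads, seq, head_dim); lse1, lse2: (batch, heads, seq).
import torch

def merge_two(out1, lse1, out2, lse2):
    # Each branch's softmax mass is exp(lse); subtract the max for stability.
    max_lse = torch.maximum(lse1, lse2)
    w1 = torch.exp(lse1 - max_lse)
    w2 = torch.exp(lse2 - max_lse)
    denom = w1 + w2
    merged = (out1 * w1.unsqueeze(-1) + out2 * w2.unsqueeze(-1)) / denom.unsqueeze(-1)
    merged_lse = max_lse + torch.log(denom)
    return merged, merged_lse
```

Restricting the merge to two outputs under requires_grad=True keeps this pairwise formula, and its backward pass, exact rather than accumulated across many branches.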
