Release v0.5.0 · fla-org/flash-linear-attention

✨ Highlights

The main new model in this release is MoBA / FlashMoBA (#840, #845), which brings Moonshot's Mixture of Block Attention into the fla family.
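To make the idea concrete, here is a minimal pure-Python sketch of MoBA-style block-sparse causal attention for a single head. It follows our reading of the MoBA design rather than fla's kernels. Keys are split into fixed-size blocks, and each block is summarized by its mean-pooled key. Every query then attends to its own block (causally masked) plus the `top_k` earlier blocks whose summaries score highest against it. The function name `moba_attention` and its parameters are hypothetical and are not fla's API. In this sketch, `top_k` counts only the extra earlier blocks, and the current block is always added on top.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def moba_attention(q: List[Vec], k: List[Vec], v: List[Vec],
                   block_size: int, top_k: int) -> List[Vec]:
    """Hypothetical single-head MoBA-style causal attention (illustration only).

    Each query attends to its own block plus the top_k highest-scoring
    strictly earlier blocks; future blocks are never eligible.
    """
    T, d = len(q), len(q[0])
    n_blocks = (T + block_size - 1) // block_size

    # Mean-pool the keys of each block into one summary ("centroid") key.
    centroids = []
    for b in range(n_blocks):
        blk = k[b * block_size:(b + 1) * block_size]
        centroids.append([sum(key[j] for key in blk) / len(blk) for j in range(d)])

    out = []
    for t in range(T):
        cur = t // block_size
        # Gate: rank earlier blocks by q . centroid and keep the best top_k.
        ranked = sorted(range(cur), key=lambda b: dot(q[t], centroids[b]), reverse=True)
        chosen = set(ranked[:top_k]) | {cur}  # the current block is always kept
        # Gather positions from the chosen blocks, causally masked to s <= t.
        idx = [s for s in range(t + 1) if s // block_size in chosen]
        # Numerically stable softmax over the selected positions only.
        scores = [dot(q[t], k[s]) / math.sqrt(d) for s in idx]
        m = max(scores)
        w = [math.exp(x - m) for x in scores]
        z = sum(w)
        out.append([sum(wi * v[s][j] for wi, s in zip(w, idx)) / z
                    for j in range(len(v[0]))])
    return out
```

Setting `top_k` at least as large as the number of blocks recovers ordinary causal attention, which makes a convenient sanity check. The block-level gating is what lets real implementations skip most of the key/value sequence on long inputs.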

Beyond new models, fla is also moving from a Triton-only stack to a multi-backend one: FlashKDA is added as a new backend for KDA (#852), and TileLang implementations are introduced for selected GDN, KDA, and parallel attention kernels (#827, #846, #854), with more backends to come.

What's Changed

[CP] Fix missing bos and i_h offsets in backward gk loads by @zhiyuan1i in #781
[KDA] Clarify gate input tracking in chunk backward by @zhiyuan1i in #785
[GDN] Fuse kkt + solve_tril kernel & unified benchmark infrastructure by @yzhangcs in #789
[Conv] Fix int32 overflow in conv kernel pointer arithmetic for large tensors by @tmct in #783
[GDN] Add exp2 support across chunk kernels for improved performance by @yzhangcs in #791
Fix parameter initialization for FSDP meta device compatibility by @yzhangcs in #793
[GDN] Fix missing mask on off-diagonal blocks in fused kkt+s… by @yzhangcs in #794
Fix layer_norm_bwd_kernel OOB access on high-SM GPUs by @mpurland in #795
[Misc] Upgrade minimum PyTorch requirement to 2.7.0 by @zhiyuan1i in #801
[Conv] Fix int32 overflow in varlen conv kernel pointer arithmetic by @tmct in #803
[GDN] Add GVA support by @zhiyuan1i in #799
[BugFix] Fix illegal memory access in KDA backward by dropping buggy autotune configs on Hopper by @zhiyuan1i in #807
[CE] Add logit softcapping support to fused cross entropy by @yzhangcs in #810
[Mamba] Remove unused arguments and update to align with mamba_ssm by @PuR3Luck in #782
[GDN] Native GVA support: remove redundant Q/K repeat and unify head naming by @yzhangcs in #812
[KDA] Add safe_gate/lower_bound support and improve docstrings by @yzhangcs in #814
[GDN] Add fused gate kernel with use_gate_in_kernel support by @yzhangcs in #813
chore: add AUTHORS, unify copyright headers, and add CI workflows by @yzhangcs in #816
[CI] Improve benchmark outputs by @yzhangcs in #817
[CI] Fix skip-test check failing on fork PRs by @zhiyuan1i in #821
[CP] Enable KCP for DPLR by @zhiyuan1i in #822
[Fix] Guard checkpoint weight re-initialization in RWKV-7, Mamba, Mamba2, and LogLinearMamba2 by @puigde in #820
fix: register default global_scratch allocator on Blackwell GPUs by @ssubbotin in #825
[Attn] Add sliding window attention support by @yzhangcs in #824
[Docs] Add CONTRIBUTING.md by @yzhangcs in #830
allow neg eigvals for delta-net by @hoedt in #832
chore: add standalone isort config by @yzhangcs in #834
Add autotune for causal conv update by @MARD1NO in #828
[GDN] Add TileLang backend for chunk_bwd_dqkwg kernel by @zhiyuan1i in #827
[Refactor] Simplify TileLang backend directory structure by @yzhangcs in #835
[CI] Post benchmark comment via workflow_run for fork-safe PRs by @yzhangcs in #841
[KDA] Add Grouped Value Attention (GVA) support by @yzhangcs in #833
[Attn] Add GPT-OSS-style attention sink support by @Shomvel in #831
[Fix] respect user-provided cu_seqlens when attention_mask is present by @yzhangcs in #842
[Fix] flatten batched qkv in varlen cu_seqlens path by @lxr-tech in #839
[Fix] Enforce batch-size check for varlen mode across multiple ops by @zhiyuan1i in #844
[MoBA] Integrate MOBA and FlashMOBA by @ReyJerry in #840
[MoBA] Follow-up: fix broken import, rename layer, add modeling, tests & docs by @yzhangcs in #845
[GDN/KDA] Fuse gate activation into fused_recurrent kernels by @yzhangcs in #848
[TileLang] Add fwd/bwd kernel for parallel attention by @zhiyuan1i in #846
[CI] Add issue/PR pytest command workflow by @zhiyuan1i in #843
[GDN] Optimize b_dg computation in chunk_bwd_kernel_dqkwg#USE_G by @MzeroMiko in #823
[GDN][Tilelang] Optimize b_dg computation in chunk_bwd_kernel_dqkwg by @zhiyuan1i in #849
[Linear Attention] Update fused_recurrent.py for inference with normalization by @yiyousong in #268
[Fix] Fix incorrect cumsum dim for naive_chunk_linear_attn normalize by @zhiyuan1i in #851
[Cleanup] Remove deprecated head_first parameter from public ops by @yzhangcs in #853
[KDA] Support FLASHKDA backend by @zhiyuan1i in #852
[KDA][TileLang] Add TileLang backend for chunk_kda_bwd_wy_dqkg_fused by @zhiyuan1i in #854
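Logit softcapping (#810) is commonly defined as `cap * tanh(logits / cap)`, which smoothly bounds every logit to the range (-cap, cap) before the softmax. The sketch below assumes that standard formulation (as used, for example, in Gemma 2). It is an unfused, single-token reference. The names `softcap` and `cross_entropy` are hypothetical and do not reflect the interface of fla's fused cross-entropy op.

```python
import math
from typing import List


def softcap(logits: List[float], cap: float) -> List[float]:
    """Smoothly squash logits into (-cap, cap) via cap * tanh(x / cap)."""
    return [cap * math.tanh(x / cap) for x in logits]


def cross_entropy(logits: List[float], target: int, cap: float = 0.0) -> float:
    """Single-token cross entropy; applies softcapping first when cap > 0."""
    if cap > 0:
        logits = softcap(logits, cap)
    # Stable log-sum-exp: subtract the max before exponentiating.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]


# Usage: an extreme wrong prediction is penalized far less once capped.
print(cross_entropy([100.0, -100.0], target=1))            # uncapped, about 200
print(cross_entropy([100.0, -100.0], target=1, cap=30.0))  # capped, below 2 * 30 + log(2)
```

Because tanh is monotonic, capping never changes which logit is largest. It only limits how far any single logit can push the loss. With `n` classes, the capped loss can never exceed `2 * cap + log(n)`.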

New Contributors

@tmct made their first contribution in #783
@mpurland made their first contribution in #795
@PuR3Luck made their first contribution in #782
@puigde made their first contribution in #820
@ssubbotin made their first contribution in #825
@hoedt made their first contribution in #832
@MARD1NO made their first contribution in #828
@Shomvel made their first contribution in #831
@lxr-tech made their first contribution in #839
@MzeroMiko made their first contribution in #823
@yiyousong made their first contribution in #268

Full Changelog: v0.4.2...v0.5.0
