Page Attention GPU Kernel Integration #1137

Open
christ-tt wants to merge 4 commits into main

Conversation

christ-tt
Contributor

Features:

  • Add paged attention as a valid backend for decoding.
  • Integrate the Paged Attention GPU kernel from JAX's implementation, with customized bias and mask support (see the sketch below this list).
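
The integration itself lives in the Pallas GPU kernel, but as a rough mental model, here is a minimal pure-JAX sketch of what paged-attention decoding with a block table, an additive bias, and a length mask computes. It is not the kernel added in this PR; the function name, argument names, and shapes (`paged_attention_reference`, `block_tables`, `lengths`, `bias`) are illustrative assumptions.

```python
import jax
import jax.numpy as jnp


def paged_attention_reference(q, k_pages, v_pages, block_tables, lengths, bias=None):
    """Pure-jnp reference for single-step paged-attention decoding (illustrative only).

    Shapes (assumed for this sketch):
      q:               [batch, num_heads, head_dim]
      k_pages/v_pages: [num_pages, page_size, num_kv_heads, head_dim]
      block_tables:    [batch, pages_per_seq] int32 indices into the page pool
      lengths:         [batch] number of valid KV tokens per sequence
      bias:            optional [batch, num_heads, kv_len] additive bias
    """
    batch, num_heads, head_dim = q.shape
    _, page_size, num_kv_heads, _ = k_pages.shape
    kv_len = block_tables.shape[1] * page_size

    # Gather each sequence's pages into a contiguous [batch, kv_len, num_kv_heads, head_dim] view.
    k = k_pages[block_tables].reshape(batch, kv_len, num_kv_heads, head_dim)
    v = v_pages[block_tables].reshape(batch, kv_len, num_kv_heads, head_dim)

    # Repeat KV heads when the query uses grouped-query attention.
    if num_kv_heads != num_heads:
        k = jnp.repeat(k, num_heads // num_kv_heads, axis=2)
        v = jnp.repeat(v, num_heads // num_kv_heads, axis=2)

    # Attention logits for the single decode-step query: [batch, num_heads, kv_len].
    logits = jnp.einsum("bhd,bkhd->bhk", q, k) / jnp.sqrt(head_dim).astype(q.dtype)
    if bias is not None:
        logits = logits + bias

    # Mask out KV positions beyond each sequence's true length (padding pages).
    valid = jnp.arange(kv_len)[None, None, :] < lengths[:, None, None]
    logits = jnp.where(valid, logits, jnp.finfo(logits.dtype).min)

    probs = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum("bhk,bkhd->bhd", probs, v)
```

The block-table indirection is the key design point: the KV cache grows in fixed-size pages allocated from a shared pool, so decoding does not require a contiguous per-sequence cache buffer.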

Testing:

  • The updated Pallas kernel is tested in common/flash_attention/decoding_test.py.
  • The updated flash_attention_implementation wrapper is tested in common/flash_attention/utils_test.py.

TODO:

  • Update the FlashAttention layer once the PagedKVCache implementation is integrated.

christ-tt requested review from ruomingp, markblee and a team as code owners on April 28, 2025 19:01
Contributor

markblee left a comment

Missing internal review.

christ-tt requested a review from markblee on May 2, 2025 23:41