
[Kernel] support kv cache quantization in ragged attention kernel #9249


Merged: 6 commits into master, May 30, 2025

Conversation

yaochengji (Collaborator)

No description provided.
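
The PR itself carries no description; for context, the title refers to quantizing the key/value cache consumed by the ragged attention kernel. The snippet below is only a rough sketch of the general technique (absmax int8 quantization with a per-slice floating-point scale, dequantized back to fp32 before the attention math), not the code from this PR; `quantize_kv` and `dequantize_kv` are hypothetical names used for illustration.

```python
import jax.numpy as jnp

def quantize_kv(x, axis=-1):
    """Absmax int8 quantization along `axis`; returns (int8 values, fp32 scale)."""
    scale = jnp.max(jnp.abs(x), axis=axis, keepdims=True) / 127.0
    scale = jnp.where(scale == 0.0, 1.0, scale)  # avoid divide-by-zero on all-zero slices
    q = jnp.clip(jnp.round(x / scale), -128, 127).astype(jnp.int8)
    return q, scale.astype(jnp.float32)

def dequantize_kv(q, scale):
    """Recover an fp32 approximation of the original K/V block."""
    return q.astype(jnp.float32) * scale
```

Storing the cache as int8 plus a small scale tensor roughly halves KV-cache memory versus bf16; the kernel would dequantize blocks on the fly as it reads them.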

@yaochengji requested a review from vanbasten23 on May 27, 2025 18:54
@yaochengji force-pushed the chengji/improve-attn branch from 057c40b to 52a0009 on May 27, 2025 21:30
@yaochengji force-pushed the chengji/improve-attn branch from cfee66b to 070138a on May 28, 2025 03:01
@vanbasten23 (Collaborator) left a comment


Thanks Chengji. Looks good, with one question at https://github.com/pytorch/xla/pull/9249/files#r2113078862.

@qihqi merged commit 2edcd2e into master on May 30, 2025
31 of 32 checks passed