Add int8 paged KV support to main paths #3048
lesj0610 wants to merge 4 commits into flashinfer-ai:release-v0.6.7
Code Review
This pull request introduces support for int8 quantized KV cache across various attention kernels. Key changes include the addition of int8_t vector type specializations in the CUDA backend, updated dispatch macros for FFI, and logic in the Python layer to handle scaling factors and backend selection (ensuring int8 KV falls back to the fa2 backend). New tests have been added to verify the correctness of int8 paged KV operations and scaling factor application. One review comment suggests improving the reliability of data type detection in the prefill logic by checking the tensor dtype directly instead of relying on itemsize.
if out.itemsize == 1:
    out = (out.to(float) * scale_v).to(out.dtype)
else:
    out *= scale_v
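To make the reviewer's concern concrete, here is a minimal, library-free sketch of why an `itemsize == 1` check is broader than an explicit dtype check. The dtype names and byte widths are standard, but `UPCAST_DTYPES` and both helper functions are hypothetical: the PR does not state which output dtypes the upcast branch is meant to cover, so the set below is only an assumption.

```python
# Byte widths of some common tensor element types.
ITEMSIZE = {
    "int8": 1,
    "uint8": 1,
    "float8_e4m3fn": 1,
    "float8_e5m2": 1,
    "float16": 2,
    "bfloat16": 2,
    "float32": 4,
}

# Assumption: the upcast branch is intended for these output dtypes only.
# The PR does not state the intended set; adjust as needed.
UPCAST_DTYPES = frozenset({"float8_e4m3fn", "float8_e5m2"})


def needs_upcast_by_itemsize(dtype_name: str) -> bool:
    """Mirrors `out.itemsize == 1`: true for every one-byte dtype."""
    return ITEMSIZE[dtype_name] == 1


def needs_upcast_by_dtype(dtype_name: str) -> bool:
    """Explicit membership test: true only for the listed dtypes."""
    return dtype_name in UPCAST_DTYPES


# Dtypes on which the two predicates disagree.
ambiguous = sorted(
    name
    for name in ITEMSIZE
    if needs_upcast_by_itemsize(name) != needs_upcast_by_dtype(name)
)
print(ambiguous)
```

Under the assumed set, the two checks disagree on exactly the one-byte integer types, which is the kind of silent overlap an explicit dtype comparison avoids.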
Force-pushed: 8b5ca88 to 1aba362
/bot run
Force-pushed: 1aba362 to 45110a6
Keeping this PR as the
📌 Description
The main paged-KV path had no int8 support. This PR extends the following to accept int8 KV cache:
On Hopper, auto backend selection routes to FA2 when FA3 int8 KV is unavailable, so no combination falls through to an unsupported path.
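The fallback rule described above can be sketched roughly as follows. This is not the PR's code: `select_backend`, its parameters, and the `fa3_supports_int8_kv` flag are hypothetical names chosen for illustration, and raising an error when `fa3` is requested explicitly with an unsupported combination is an assumption rather than documented behavior.

```python
def select_backend(
    requested: str,
    sm_major: int,
    kv_dtype: str,
    fa3_supports_int8_kv: bool = False,
) -> str:
    """Hypothetical sketch of backend selection for paged-KV attention.

    requested: "auto", "fa2", or "fa3"
    sm_major:  CUDA compute capability major version (8 = Ampere, 9 = Hopper)
    kv_dtype:  KV cache element type name, e.g. "float16" or "int8"
    """
    if requested not in ("auto", "fa2", "fa3"):
        raise ValueError(f"unknown backend: {requested!r}")

    if requested == "fa2":
        return "fa2"

    # FA3 is only usable on Hopper, and only if it can read this KV dtype.
    fa3_ok = sm_major == 9 and (kv_dtype != "int8" or fa3_supports_int8_kv)

    if requested == "fa3":
        # Assumption: an explicit request fails loudly instead of degrading.
        if not fa3_ok:
            raise ValueError("fa3 is unavailable for this device/KV dtype")
        return "fa3"

    # "auto": prefer FA3 when it is usable, otherwise fall back to FA2.
    return "fa3" if fa3_ok else "fa2"


print(select_backend("auto", sm_major=9, kv_dtype="int8"))
print(select_backend("auto", sm_major=9, kv_dtype="float16"))
```

Keeping the availability test in one place (`fa3_ok`) means both the `auto` path and the explicit path agree on what counts as supported.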
Tested on Ampere (A100) and Hopper (H100):
7 tests passed on both architectures. The int4 part is in a separate follow-up PR.
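As a rough illustration of what a scale-application check can verify (this is not the PR's test code), the toy example below uses plain Python lists in place of tensors. It assumes symmetric per-tensor int8 quantization of V with a single `scale_v`; `quantize_int8` and `attend` are hypothetical helpers. Because the weighted sum is linear in V, scaling the output once should match dequantizing V before the sum.

```python
def quantize_int8(values, scale):
    """Symmetric quantization: round(x / scale), clamped to the int8 range."""
    return [max(-128, min(127, round(x / scale))) for x in values]


def attend(weights, v_rows):
    """Weighted sum of value rows: out[d] = sum_i weights[i] * v_rows[i][d]."""
    dim = len(v_rows[0])
    return [
        sum(w * row[d] for w, row in zip(weights, v_rows)) for d in range(dim)
    ]


# Toy attention weights (already softmax-normalized) and value rows.
weights = [0.5, 0.25, 0.25]
v = [[1.0, -2.0], [0.5, 0.25], [-1.5, 3.0]]
scale_v = 3.0 / 127  # maps the largest |value| (3.0) to the int8 code 127

v_q = [quantize_int8(row, scale_v) for row in v]

# Path 1: dequantize V first, then take the weighted sum.
v_deq = [[q * scale_v for q in row] for row in v_q]
out_deq_first = attend(weights, v_deq)

# Path 2: weighted sum over raw int8 codes, then apply scale_v once.
out_scale_last = [o * scale_v for o in attend(weights, v_q)]

# Unquantized reference, for judging the quantization error.
out_ref = attend(weights, v)

print(out_deq_first, out_scale_last, out_ref)
```

The two int8 paths agree up to floating-point rounding, and each differs from the reference by at most `scale_v / 2` per element here, since the weights sum to 1 and no value is clamped.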
🔍 Related Issues
🚀 Pull Request Checklist
✅ Pre-commit Checks
- pip install pre-commit (or used your preferred method).
- pre-commit install.
- pre-commit run --all-files and fixed any reported issues.

🧪 Tests