
Add int8 paged KV support to main paths#3048

Draft
lesj0610 wants to merge 4 commits into flashinfer-ai:release-v0.6.7 from lesj0610:codex/int8-paged-kv-main-path

Conversation

@lesj0610

📌 Description

The main paged-KV path had no int8 support. This PR extends the following to accept int8 KV cache:

  • append
  • single decode, single prefill
  • batch decode, batch prefill

On Hopper, auto backend selection routes to FA2 when FA3 int8 KV is unavailable, so no combination falls through to an unsupported path.
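The fallback described above can be sketched as follows. This is an illustrative model only, not FlashInfer's actual dispatch code; the function name `select_backend` and the dtype set are assumptions made for this sketch.

```python
# Hypothetical sketch of the auto backend selection described above.
# The real FlashInfer routing logic differs; names here are illustrative.
def select_backend(arch: str, kv_dtype: str) -> str:
    # FA3 is only considered on Hopper (sm90), and only for KV dtypes
    # it supports; int8 KV is not among them in this sketch.
    fa3_supported_kv = {"float16", "bfloat16", "float8_e4m3fn"}
    if arch == "sm90" and kv_dtype in fa3_supported_kv:
        return "fa3"
    # Everything else, including int8 KV on Hopper, routes to FA2,
    # so no combination falls through to an unsupported path.
    return "fa2"

print(select_backend("sm90", "int8"))     # int8 KV on Hopper -> fa2
print(select_backend("sm90", "float16"))  # fp16 KV on Hopper -> fa3
print(select_backend("sm80", "float16"))  # Ampere -> fa2
```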

Tested on Ampere (A100) and Hopper (H100):

python -m pytest tests/attention/test_int8_paged_kv.py -v

7 tests passed on both architectures. The int4 part is in a separate follow-up PR.
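For readers unfamiliar with int8 KV caching, the basic idea is symmetric per-tensor quantization: the cache stores int8 values plus a float scale, and attention dequantizes on load. The sketch below is a minimal NumPy illustration of that scheme; the helper names and the per-tensor granularity are assumptions for this example, not the PR's actual kernel code.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    # Symmetric per-tensor scale: map the max magnitude to 127.
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
k = rng.standard_normal((16, 8), dtype=np.float32)  # toy K-cache page
k_q, k_scale = quantize_int8(k)
k_hat = dequantize_int8(k_q, k_scale)

# Round-to-nearest keeps the error within half a quantization step.
print(k_q.dtype, float(np.abs(k - k_hat).max()) <= k_scale)
```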

🔍 Related Issues

🚀 Pull Request Checklist

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

@coderabbitai
Contributor

coderabbitai Bot commented Apr 13, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 731b3528-e57c-4240-8ccf-93ae7164e8a4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for int8 quantized KV cache across various attention kernels. Key changes include the addition of int8_t vector type specializations in the CUDA backend, updated dispatch macros for FFI, and logic in the Python layer to handle scaling factors and backend selection (ensuring int8 KV falls back to the fa2 backend). New tests have been added to verify the correctness of int8 paged KV operations and scaling factor application. One review comment suggests improving the reliability of data type detection in the prefill logic by checking the tensor dtype directly instead of relying on itemsize.

Comment thread flashinfer/prefill.py Outdated
Comment on lines +1344 to +1347
if out.itemsize == 1:
    out = (out.to(float) * scale_v).to(out.dtype)
else:
    out *= scale_v
Contributor


medium

The condition if out.itemsize == 1 is used to check for FP8/INT8 output types, but itemsize is not a reliable way to distinguish between data types in all cases. It is better to check out.dtype directly against torch.float8_e4m3fn or torch.int8 to avoid potential ambiguity.
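The ambiguity the reviewer points out can be demonstrated directly. The snippet below uses NumPy as a stand-in (the PR's code uses torch dtypes such as torch.int8 and torch.float8_e4m3fn): several 1-byte dtypes share itemsize == 1, so only a dtype comparison distinguishes them.

```python
import numpy as np

# Three distinct 1-byte dtypes: itemsize alone cannot tell them apart.
out_i8 = np.zeros(4, dtype=np.int8)
out_u8 = np.zeros(4, dtype=np.uint8)
out_b = np.zeros(4, dtype=np.bool_)

print([a.itemsize for a in (out_i8, out_u8, out_b)])  # [1, 1, 1]

# A direct dtype check is unambiguous.
print(out_i8.dtype == np.int8)  # True
print(out_u8.dtype == np.int8)  # False
```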

Author


ok. fixed!

@yzh119
Collaborator

yzh119 commented Apr 13, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !543 has been created, and the CI pipeline #48432967 is currently running. I'll report back once the pipeline job completes.

@lesj0610 force-pushed the codex/int8-paged-kv-main-path branch from 1aba362 to 45110a6 on April 14, 2026 04:18
@lesj0610 lesj0610 marked this pull request as draft April 17, 2026 07:16
@lesj0610
Author

Keeping this PR as the release-v0.6.7 snapshot and moving active review to #3100 against main.
