
[XPU] Support cpu kv offloading on XPU platform #36423

Open

chaojun-zhang wants to merge 1 commit into vllm-project:main from chaojun-zhang:cpu_offload

Conversation

@chaojun-zhang
Contributor

@chaojun-zhang chaojun-zhang commented Mar 9, 2026

Purpose

Support CPU KV offloading via the XPU `swap_blocks` kernel on the XPU platform.

Test Plan

```shell
pytest -s -v tests/v1/kv_offload
pytest -s -v tests/v1/kv_connector/unit/offloading_connector/test_worker.py
```

Test Result

Qwen-0.6B latency configuration:
Command: `vllm bench latency --model=meta-llama/Llama-3.1-8B -tp 2`

Median latency:

| Configuration | eager | compile |
| --- | --- | --- |
| `--kv_transfer_config={"kv_connector": "OffloadingConnector", "kv_role": "kv_both", "kv_connector_extra_config": {"cpu_bytes_to_use": 524288000, "block_size": 64}}` | 5.58920 s | 5.57366 s |
| default | 5.20960 s | 5.17703 s |

lm_eval:

1. With KV offloading:

```python
from lm_eval import evaluator
from lm_eval.models.vllm_causallms import VLLM
from vllm.config import KVTransferConfig

if __name__ == '__main__':
    kv_transfer_config = KVTransferConfig(
        kv_connector="OffloadingConnector",
        kv_role="kv_both",
        kv_connector_extra_config={
            "cpu_bytes_to_use": 500 << 20,  # 500 MiB of CPU cache
            "block_size": 64,
        },
    )

    model = VLLM(
        pretrained="meta-llama/Llama-3.1-8B",
        dtype="bfloat16",
        tensor_parallel_size=2,
        add_bos_token=True,
        trust_remote_code=True,
        kv_transfer_config=kv_transfer_config,
    )

    results = evaluator.simple_evaluate(
        model=model,
        tasks=["gsm8k"],
        num_fewshot=5,
        limit=250,
        batch_size=20,
    )
    print(results["results"])
```
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
| --- | --- | --- | --- | --- | --- | --- |
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.536 | ± 0.0316 |
| gsm8k | 3 | strict-match | 5 | exact_match | 0.536 | ± 0.0316 |
2. Without KV offloading:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
| --- | --- | --- | --- | --- | --- | --- |
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.528 | ± 0.0316 |
| gsm8k | 3 | strict-match | 5 | exact_match | 0.528 | ± 0.0316 |
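A side note on the two CPU budgets used above: the `cpu_bytes_to_use` value `500 << 20` in the lm_eval script is the same budget as the `524288000` bytes passed to the latency benchmark, since a left shift by 20 bits multiplies by 2**20:

```python
# 500 << 20 shifts 500 left by 20 bits, i.e. 500 * 2**20 bytes (500 MiB)
cpu_bytes = 500 << 20
assert cpu_bytes == 524_288_000  # matches the bench configuration
```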

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the v1 label Mar 9, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for CPU KV offloading on the XPU platform. The changes primarily involve making the existing CUDA offloading logic and tests device-agnostic by using current_platform.device_type and conditional logic for XPU-specific calls. A new CpuXpuOffloadingHandlers class is introduced, which cleverly reuses the CpuGpuOffloadingHandlers logic by monkey-patching torch.cuda functions with their torch.xpu counterparts within a context manager.

However, I've identified a critical issue in the implementation of the _torch_cuda_wrapper context manager in the new vllm/v1/kv_offload/worker/cpu_xpu.py file. The monkey-patching of torch.cuda attributes is not reverted in the finally block. This can lead to persistent, unintended side effects across the application, potentially causing hard-to-debug issues in other parts of the code that expect the original torch.cuda behavior. I've provided a code suggestion to fix this by properly restoring the original attributes.

@mergify
Contributor

mergify bot commented Mar 12, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chaojun-zhang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify
Contributor

mergify bot commented Mar 13, 2026

Documentation preview: https://vllm--36423.org.readthedocs.build/en/36423/

@mergify mergify bot added documentation Improvements or additions to documentation and removed needs-rebase labels Mar 13, 2026
@chaojun-zhang chaojun-zhang force-pushed the cpu_offload branch 3 times, most recently from c8a24ba to c58e02f Compare March 13, 2026 02:12
@mergify
Contributor

mergify bot commented Mar 13, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chaojun-zhang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 13, 2026
@mergify mergify bot added intel-gpu Related to Intel GPU and removed needs-rebase labels Mar 29, 2026
@mergify
Contributor

mergify bot commented Apr 1, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chaojun-zhang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify
Contributor

mergify bot commented Apr 8, 2026

Hi @chaojun-zhang, the pre-commit checks have failed. Please run:

```shell
uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

```shell
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
```

@chaojun-zhang chaojun-zhang force-pushed the cpu_offload branch 3 times, most recently from dca4c3d to f0403d4 Compare April 8, 2026 12:27
@zhenwei-intel
Contributor

There are some newly added tests related to KV offloading. https://github.com/vllm-project/vllm/tree/main/tests/v1/kv_connector/unit/offloading_connector
Can we add them?

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
@chaojun-zhang
Contributor Author

This PR depends on vllm-project/vllm-xpu-kernels#265.

@chaojun-zhang
Contributor Author

> There are some newly added tests related to KV offloading. https://github.com/vllm-project/vllm/tree/main/tests/v1/kv_connector/unit/offloading_connector Can we add them?

Added.


Labels

documentation, intel-gpu, kv-connector, v1
