[XPU] Support cpu kv offloading on XPU platform #36423
chaojun-zhang wants to merge 1 commit into vllm-project:main
Code Review
This pull request introduces support for CPU KV offloading on the XPU platform. The changes primarily involve making the existing CUDA offloading logic and tests device-agnostic by using current_platform.device_type and conditional logic for XPU-specific calls. A new CpuXpuOffloadingHandlers class is introduced, which cleverly reuses the CpuGpuOffloadingHandlers logic by monkey-patching torch.cuda functions with their torch.xpu counterparts within a context manager.
However, I've identified a critical issue in the implementation of the _torch_cuda_wrapper context manager in the new vllm/v1/kv_offload/worker/cpu_xpu.py file. The monkey-patching of torch.cuda attributes is not reverted in the finally block. This can lead to persistent, unintended side effects across the application, potentially causing hard-to-debug issues in other parts of the code that expect the original torch.cuda behavior. I've provided a code suggestion to fix this by properly restoring the original attributes.
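The restore-on-exit pattern the review asks for can be sketched as follows. This is a minimal illustration, not the code from `cpu_xpu.py`: `fake_cuda`, `fake_xpu`, and `patched_attrs` are hypothetical stand-ins so the example runs without PyTorch, and the real `_torch_cuda_wrapper` may patch a different set of attributes.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical stand-ins for torch.cuda and torch.xpu (assumption: the real
# code patches attributes on the actual torch.cuda module).
fake_cuda = SimpleNamespace(Stream=lambda: "cuda-stream", Event=lambda: "cuda-event")
fake_xpu = SimpleNamespace(Stream=lambda: "xpu-stream", Event=lambda: "xpu-event")

_MISSING = object()  # marks attributes that did not exist before patching


@contextmanager
def patched_attrs(target, source, names):
    """Temporarily point target.<name> at source.<name>; always restore on exit."""
    # Save the originals first so the finally block can put them back.
    saved = {name: getattr(target, name, _MISSING) for name in names}
    try:
        for name in names:
            setattr(target, name, getattr(source, name))
        yield target
    finally:
        # Runs on normal exit and on exceptions, so the patch never leaks.
        for name, original in saved.items():
            if original is _MISSING:
                if hasattr(target, name):
                    delattr(target, name)
            else:
                setattr(target, name, original)


with patched_attrs(fake_cuda, fake_xpu, ["Stream", "Event"]):
    inside = fake_cuda.Stream()   # resolves to the XPU stand-in
after = fake_cuda.Stream()        # original behavior is back
```

Saving the originals before patching and restoring them in `finally` keeps the substitution scoped to the `with` block, even when the wrapped code raises.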
This pull request has merge conflicts that must be resolved before it can be merged.
62e572a to 4bf0d65
Documentation preview: https://vllm--36423.org.readthedocs.build/en/36423/
c8a24ba to c58e02f
c58e02f to 707d15d
707d15d to 0586393
0586393 to fa0a8b2
Hi @chaojun-zhang, the pre-commit checks have failed. Please run:
uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
dca4c3d to f0403d4
There are some newly added tests related to KV offloading: https://github.com/vllm-project/vllm/tree/main/tests/v1/kv_connector/unit/offloading_connector
f0403d4 to ba455e7
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
ba455e7 to f392c06
This PR depends on vllm-project/vllm-xpu-kernels#265.
Purpose
Support CPU KV offloading on the XPU platform, using the XPU swap_blocks kernel.
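For readers unfamiliar with block swapping, the idea behind a swap_blocks-style operation can be sketched in plain Python. This is a conceptual reference only, not the XPU kernel: the function name `swap_blocks_reference`, the list-based caches, and the `(src_idx, dst_idx)` mapping format are assumptions made for illustration.

```python
def swap_blocks_reference(src, dst, block_mapping):
    """Copy src[src_idx] into dst[dst_idx] for each (src_idx, dst_idx) pair.

    Hypothetical pure-Python stand-in; a real kernel would copy tensor
    blocks between device and host memory instead of Python lists.
    """
    for src_idx, dst_idx in block_mapping:
        dst[dst_idx] = list(src[src_idx])  # copy so the caches stay independent


# Three "device" blocks and four empty "CPU" blocks (toy data).
device_cache = [[1, 1], [2, 2], [3, 3]]
cpu_cache = [[0, 0] for _ in range(4)]

# Offload device block 0 to CPU block 3, and device block 2 to CPU block 1.
swap_blocks_reference(device_cache, cpu_cache, [(0, 3), (2, 1)])
```

Swapping the `src` and `dst` arguments gives the reverse direction, loading offloaded blocks back from the CPU cache.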
Test Plan
pytest -s -v tests/v1/kv_offload
pytest -s -v tests/v1/kv_connector/unit/offloading_connector/test_worker.py
Test Result
Qwen-0.6B latency configuration:
Command: vllm bench latency --model=meta-llama/Llama-3.1-8B -tp 2
Median latency
lm_eval: