Add swap_blocks_batch op with batched async memcpy by chaojun-zhang · Pull Request #265 · vllm-project/vllm-xpu-kernels

chaojun-zhang · 2026-04-10T04:04:36Z

Purpose

Add a batched version of swap_blocks that performs N independent copies in a single call, each described by a (src_ptr, dst_ptr, size) triple. It is modeled after the CUDA swap_blocks_batch, which uses cuMemcpyBatchAsync.
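To make the batch semantics concrete, here is a minimal host-only sketch of what "N independent (src_ptr, dst_ptr, size) copies" means. It is a reference model, not the op added by this PR: the function name `swap_blocks_batch_reference` and its list-of-integers arguments are assumptions made for illustration, and it copies synchronously with `ctypes.memmove` rather than issuing async device copies.

```python
import ctypes


def swap_blocks_batch_reference(src_ptrs, dst_ptrs, sizes):
    """Hypothetical reference model: copy sizes[i] bytes from src_ptrs[i]
    to dst_ptrs[i] for every i. Pointers are raw host addresses (ints)."""
    if not (len(src_ptrs) == len(dst_ptrs) == len(sizes)):
        raise ValueError("src_ptrs, dst_ptrs and sizes must have equal length")
    for src, dst, nbytes in zip(src_ptrs, dst_ptrs, sizes):
        if nbytes < 0:
            raise ValueError("copy size must be non-negative")
        # Each triple is independent of the others; order does not matter
        # as long as the destination ranges do not overlap.
        ctypes.memmove(dst, src, nbytes)


# Usage: swap the two 4-byte halves of an 8-byte buffer into a second buffer.
src = (ctypes.c_ubyte * 8)(*range(8))
dst = (ctypes.c_ubyte * 8)()
src_base = ctypes.addressof(src)
dst_base = ctypes.addressof(dst)

swap_blocks_batch_reference(
    [src_base, src_base + 4],  # source addresses
    [dst_base + 4, dst_base],  # destination addresses
    [4, 4],                    # byte counts
)
result = list(dst)
```

The real op would replace the loop body with asynchronous copies submitted to the device queue; the shape of the inputs (one entry per copy in each of three parallel sequences) is the part this sketch is meant to show.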

Test Plan

pytest -s -v tests/test_cache.py::test_swap_blocks_batch_h2d_mutation_race
pytest -s -v tests/test_cache.py::test_swap_blocks_batch

Test Result

(Optional) Documentation Update

Copilot

Pull request overview

This PR introduces a new swap_blocks_batch cache op for XPU that performs a batch of independent (src_ptr, dst_ptr, size) async copies in one call, with a staging-buffer snapshot for H2D to avoid post-call host mutation races.

Changes:

Add swap_blocks_batch Torch op + C++ wrapper (csrc/cache.cpp, csrc/torch_bindings.cpp, csrc/ops.h).
Add xpuAsyncMemcpyBatch implementation to perform batched async copies with optional H2D staging (csrc/utils/mem_cpy.*).
Add Python test coverage for batched swaps and an H2D mutation-race regression test (tests/test_cache.py) plus a Python wrapper (tests/register_ops.py).
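The H2D mutation race that the staging buffer guards against can be illustrated with a toy model. The sketch below is not the PR's C++ implementation: `DeferredCopyQueue`, `submit`, and `wait` are hypothetical names, and the "async" copy is simulated by deferring the work until `wait()` is called. It only shows why snapshotting the host source at submission time makes a later host write harmless.

```python
class DeferredCopyQueue:
    """Toy stand-in for an async copy queue: copies run only on wait()."""

    def __init__(self):
        self._pending = []

    def submit(self, src, dst, stage):
        # With staging, take a private snapshot of the host bytes now.
        # Without staging, keep a reference to the caller's live buffer.
        source = bytes(src) if stage else src
        self._pending.append((source, dst))

    def wait(self):
        # The deferred copies execute here, after the caller has had a
        # chance to mutate its host buffer.
        for source, dst in self._pending:
            dst[:] = source
        self._pending.clear()


def run(stage):
    host = bytearray(b"AAAA")    # simulated host (CPU) block
    device = bytearray(4)        # simulated device block
    queue = DeferredCopyQueue()
    queue.submit(host, device, stage=stage)
    host[:] = b"BBBB"            # caller overwrites host right after the call
    queue.wait()
    return bytes(device)


unstaged = run(stage=False)  # the copy observes the mutated host data
staged = run(stage=True)     # the copy observes the data as of submission
```

In this model the unstaged run ends with the device block holding the overwritten bytes, while the staged run preserves the bytes that were present when the copy was submitted. The cost is one extra host-side copy per H2D transfer.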

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tests/test_cache.py | Adds new tests and helpers for `swap_blocks_batch`, including a mutation-race check. |
| tests/register_ops.py | Adds a Python wrapper for the new Torch op. |
| csrc/utils/mem_cpy.h | Declares `xpuAsyncMemcpyBatch` API. |
| csrc/utils/mem_cpy.cpp | Implements batched async memcpy with H2D staging. |
| csrc/torch_bindings.cpp | Registers the new `swap_blocks_batch` op in the cache ops library. |
| csrc/ops.h | Exposes the new op signature in the public header. |
| csrc/cache.cpp | Implements `swap_blocks_batch` input validation and forwards to the memcpy helper. |

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>

Copilot AI review requested due to automatic review settings April 10, 2026 04:04

Copilot started reviewing on behalf of chaojun-zhang April 10, 2026 04:05 View session

chaojun-zhang force-pushed the op_swap_blocks_batch branch from 775df35 to 65e047e Compare April 10, 2026 04:07

Copilot AI reviewed Apr 10, 2026

chaojun-zhang mentioned this pull request Apr 10, 2026

[XPU] Support cpu kv offloading on XPU platform vllm-project/vllm#36423

Open

5 tasks

chaojun-zhang force-pushed the op_swap_blocks_batch branch 3 times, most recently from dd8018c to 66a2859 Compare April 10, 2026 06:24

mayuyuace approved these changes Apr 10, 2026

Add swap_blocks_batch op with batched async memcpy

728eaa9

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>

chaojun-zhang force-pushed the op_swap_blocks_batch branch from 66a2859 to 728eaa9 Compare April 10, 2026 06:31

chaojun-zhang wants to merge 1 commit into vllm-project:main from chaojun-zhang:op_swap_blocks_batch

3 participants
