[MLA] add merge_attn_states sycl kernel
#64
base: main
Conversation
Signed-off-by: Kunshang Ji <[email protected]>
Force-pushed from 26b5133 to 4edf4ed
Pull Request Overview
This PR adds a SYCL kernel implementation for the merge_attn_states operation, which is used to combine partial attention results during the MLA (Multi-Head Latent Attention) chunked prefill stage. The implementation follows section 2.2 of the referenced paper (https://www.arxiv.org/pdf/2501.01005).
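For context, merging two partial attention results reduces to a numerically stable log-sum-exp recombination. Below is a minimal PyTorch sketch of that math under the shapes given in the kernel docstring ([n, h, d] outputs, [h, n] LSE tensors); the function name and argument order are illustrative assumptions, not the PR's actual API.

```python
import torch

def merge_attn_states_sketch(prefix_output: torch.Tensor,  # [n, h, d]
                             prefix_lse: torch.Tensor,     # [h, n]
                             suffix_output: torch.Tensor,  # [n, h, d]
                             suffix_lse: torch.Tensor):    # [h, n]
    # Stabilize by subtracting the elementwise max before exponentiating.
    max_lse = torch.maximum(prefix_lse, suffix_lse)        # [h, n]
    p_se = torch.exp(prefix_lse - max_lse)
    s_se = torch.exp(suffix_lse - max_lse)
    out_se = p_se + s_se                                   # [h, n]

    # Per-(head, token) weights, broadcast over the head_size dimension d.
    p_w = (p_se / out_se).transpose(0, 1).unsqueeze(-1)    # [n, h, 1]
    s_w = (s_se / out_se).transpose(0, 1).unsqueeze(-1)    # [n, h, 1]
    output = prefix_output * p_w + suffix_output * s_w     # [n, h, d]

    # Merged log-sum-exp, reusable when folding in further chunks.
    output_lse = torch.log(out_se) + max_lse               # [h, n]
    return output, output_lse
```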
Key Changes:
- Implements the merge_attn_states SYCL kernel with FP32, FP16, and BF16 support
- Adds comprehensive test coverage with performance benchmarking
- Integrates the kernel into the build system and torch bindings
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| csrc/attention/merge_attn_states.cpp | Core SYCL kernel implementation for merging attention states with 128-bit packed operations |
| csrc/ops.h | Function declaration for merge_attn_states |
| csrc/torch_bindings.cpp | PyTorch C++ extension binding registration |
| csrc/utils.h | Helper functions for type conversion between float and half/bfloat16 |
| tests/test_merge_attn_states.py | Comprehensive test suite with PyTorch reference implementation and performance comparison |
| tests/register_ops.py | Python wrapper for the SYCL kernel operation |
| CMakeLists.txt | Adds new source file to build configuration |
* into the output tensor. NUM_TOKENS: n, NUM_HEADS: h, HEAD_SIZE: d
*
* @param output [n,h,d] The output tensor to store the merged attention states.
* @param output_lse [h,d] Optional tensor to store the log-sum-exp values.
Copilot AI (Nov 10, 2025)
The dimension description for output_lse is incorrect. According to the code (line 90) and test file (line 141), output_lse should be [h,n] not [h,d].
Suggested change:
- * @param output_lse [h,d] Optional tensor to store the log-sum-exp values.
+ * @param output_lse [h,n] Optional tensor to store the log-sum-exp values.
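For reference, the shapes implied by the docstring and this comment: attention outputs are token-major, LSE tensors are head-major. An illustrative snippet (the sizes are made up):

```python
import torch

n, h, d = 1024, 32, 128          # NUM_TOKENS, NUM_HEADS, HEAD_SIZE
output = torch.empty(n, h, d)    # merged attention states: [n, h, d]
output_lse = torch.empty(h, n)   # merged log-sum-exp: [h, n], not [h, d]
```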
logger = logging.getLogger("vllm_xpu_kernel")

# Naive PyTorch Implements section 2.2 of https://www.arxiv.org/pdf/2501.01005
Copilot AI (Nov 10, 2025)
Corrected spelling of 'Implements' to 'Implementation of'.
Suggested change:
- # Naive PyTorch Implements section 2.2 of https://www.arxiv.org/pdf/2501.01005
+ # Naive PyTorch Implementation of section 2.2 of https://www.arxiv.org/pdf/2501.01005
all_case_info: list[tuple] = []

#override pytest parameters when enable mini pytest
Copilot AI (Nov 10, 2025)
Missing article in comment. Should be 'when enabling mini pytest' or 'when mini pytest is enabled'.
Suggested change:
- #override pytest parameters when enable mini pytest
+ # override pytest parameters when enabling mini pytest
if output_lse is not None:
    output_lse = torch.log(out_se) + max_lse
Copilot AI (Nov 10, 2025)
The output_lse parameter is reassigned but never returned or used. This local assignment has no effect on the caller. The corrected value should be stored before returning it at line 49.
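A minimal sketch of the pattern this comment asks for, assuming the reference helper ends like the quoted snippet (variable names are modeled on it, not copied from the PR): the merged LSE has to be propagated to the caller rather than only rebound locally.

```python
import torch

# Illustrative tail of the reference implementation (a sketch, not the PR's code).
def _merge_tail(output, out_se, max_lse, output_lse):
    if output_lse is not None:
        # Rebinding the local name alone is invisible to the caller,
        # so the merged LSE must also flow into the return value.
        output_lse = torch.log(out_se) + max_lse
    return output, output_lse
```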
Purpose
This PR adds the merge_attn_states kernel, which will be used in the MLA chunked prefill stage.

Test Plan
UT&CI
Test Result
pass
(Optional) Documentation Update