split attention template via data types by xinyu-intel · Pull Request #270 · vllm-project/vllm-xpu-kernels

xinyu-intel · 2026-04-13T07:20:46Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.

Purpose

Test Plan

Test Result

(Optional) Documentation Update

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Copilot

Pull request overview

This PR refactors the Xe2 XPU attention (chunk prefill + paged decode) kernel instantiation and dispatch to be specialized by explicit Q/KV/O data types, rather than relying on runtime dtype branching inside the instantiated templates.

Changes:

- Introduces `CutlassQKOType` and `aten_to_Cutlass_qko_dtype()` to carry Q/K/O dtypes through the dispatch path.
- Splits the dispatch into a typed implementation (`*_dispatch_typed_impl<..., ElementQ, ElementKV, ElementO, ...>`) plus a runtime dtype switch that calls into those typed instantiations.
- Updates CMake/kernel templates/extern declarations to generate and declare explicit instantiations for the allowed dtype combinations.
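The split described in the changes above can be pictured with a minimal, standalone sketch. It is not the code from this PR: the element types are empty stand-in structs, `TypeName`, `dispatch_kv`, and `dispatch` are hypothetical names, the field layout of `CutlassQKOType` is assumed from the `q_type`/`k_type`/`o_type` accesses in the diff, and `std::invalid_argument` stands in for `TORCH_CHECK` so the example builds without PyTorch. The typed function receives all three dtypes as template parameters, and the runtime switch only selects which instantiation to call.

```cpp
#include <cassert>    // for caller-side checks in the usage note
#include <stdexcept>
#include <string>

// Hypothetical empty stand-ins for the real element types.
struct half_t {};
struct bfloat16_t {};
struct float_e4m3_t {};
struct float_e5m2_t {};

enum class CutlassDType { half, bfloat16, float8_e4m3, float8_e5m2 };

// Assumed layout: one runtime tag each for Q, KV, and O.
struct CutlassQKOType {
  CutlassDType q_type;
  CutlassDType k_type;
  CutlassDType o_type;
};

// Illustration-only helper that maps a type to a printable name.
template <typename T> struct TypeName;
template <> struct TypeName<half_t> { static constexpr const char* value = "half_t"; };
template <> struct TypeName<bfloat16_t> { static constexpr const char* value = "bfloat16_t"; };
template <> struct TypeName<float_e4m3_t> { static constexpr const char* value = "float_e4m3_t"; };
template <> struct TypeName<float_e5m2_t> { static constexpr const char* value = "float_e5m2_t"; };

// Typed implementation: every dtype is a compile-time parameter, so the
// body contains no runtime dtype branching. Here it only reports its types.
template <typename ElementQ, typename ElementKV, typename ElementO>
std::string dispatch_typed_impl() {
  return std::string(TypeName<ElementQ>::value) + "|" +
         TypeName<ElementKV>::value + "|" + TypeName<ElementO>::value;
}

// Runtime switch over the KV dtype for a fixed Q/O element type.
template <typename ElementQO>
std::string dispatch_kv(CutlassDType k_type, CutlassDType qo_type) {
  if (k_type == qo_type) {
    return dispatch_typed_impl<ElementQO, ElementQO, ElementQO>();
  }
  if (k_type == CutlassDType::float8_e4m3) {
    return dispatch_typed_impl<ElementQO, float_e4m3_t, ElementQO>();
  }
  if (k_type == CutlassDType::float8_e5m2) {
    return dispatch_typed_impl<ElementQO, float_e5m2_t, ElementQO>();
  }
  throw std::invalid_argument("Unsupported KV dtype");
}

// Top-level runtime switch: picks the Q/O element type, then the KV type.
// Requiring Q and O to match is an assumption based on the allowed list.
std::string dispatch(const CutlassQKOType& t) {
  if (t.q_type != t.o_type) {
    throw std::invalid_argument("Q and O dtypes must match");
  }
  if (t.o_type == CutlassDType::half) {
    return dispatch_kv<half_t>(t.k_type, CutlassDType::half);
  }
  if (t.o_type == CutlassDType::bfloat16) {
    return dispatch_kv<bfloat16_t>(t.k_type, CutlassDType::bfloat16);
  }
  throw std::invalid_argument("Unsupported O dtype");
}
```

In this sketch, calling `dispatch({CutlassDType::half, CutlassDType::float8_e4m3, CutlassDType::half})` reaches the `<half_t, float_e4m3_t, half_t>` instantiation, while a combination outside the supported set throws instead of silently selecting a kernel.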

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| csrc/xpu/attn/xe_2/paged_decode.hpp | Adds typed dispatch helper and updates runtime dtype dispatch to include O dtype. |
| csrc/xpu/attn/xe_2/paged_decode_xe2.cpp | Switches paged decode to use Q/K/O dtype triplet in runtime dispatch. |
| csrc/xpu/attn/xe_2/paged_decode_utils.hpp | Propagates `CutlassQKOType` through the decode dispatch helpers. |
| csrc/xpu/attn/xe_2/paged_decode_kernel_template.cpp.in | Updates generated instantiation to target the typed dispatch function and dtype macros. |
| csrc/xpu/attn/xe_2/paged_decode_extern.hpp | Expands extern template declarations across allowed dtype combinations. |
| csrc/xpu/attn/xe_2/paged_decode_configure.cmake | Generates kernel sources across (dtype combo × bool flags × policy variants). |
| csrc/xpu/attn/xe_2/fmha_xe2.cpp | Switches chunk prefill to Q/K/O dtype triplet dispatch. |
| csrc/xpu/attn/xe_2/fmha_utils.hpp | Replaces Q/K dtype pair with Q/K/O dtype triplet utilities/types. |
| csrc/xpu/attn/xe_2/chunk_prefill.hpp | Adds typed dispatch helper and updates runtime dtype dispatch to include O dtype. |
| csrc/xpu/attn/xe_2/chunk_prefill_utils.hpp | Propagates `CutlassQKOType` through the chunk prefill dispatch helpers. |
| csrc/xpu/attn/xe_2/chunk_prefill_kernel_template.cpp.in | Updates generated instantiation to target the typed dispatch function and dtype macros. |
| csrc/xpu/attn/xe_2/chunk_prefill_extern.hpp | Expands extern template declarations across allowed dtype combinations. |
| csrc/xpu/attn/xe_2/chunk_prefill_configure.cmake | Generates chunk-prefill kernel sources across dtype combinations and bool permutations. |


+      } else if (cuQKOType.k_type == CutlassDType::float8_e5m2) {
+        return decode_policy_dispatch_typed_impl<
+            decode_policy,
+            bfloat16_t,
+            float_e5m2_t,


+  if (cuQKOType.o_type == CutlassDType::half) {
+    if (cuQKOType.q_type == CutlassDType::half) {
+      if (cuQKOType.k_type == CutlassDType::half) {


+  # Allowed dtype combinations must match runtime dispatch constraints. Format:
+  # Q_TYPE|KV_TYPE|O_TYPE|FILE_TAG
+  set(dtype_combo_list
+      "half_t|half_t|half_t|h_h_h"
+      "half_t|float_e4m3_t|half_t|h_e4_h"
+      "half_t|float_e5m2_t|half_t|h_e5_h"
+      "bfloat16_t|bfloat16_t|bfloat16_t|b_b_b"
+      "bfloat16_t|float_e4m3_t|bfloat16_t|b_e4_b"
+      "bfloat16_t|float_e5m2_t|bfloat16_t|b_e5_b")
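Since the CMake comment above says the allowed combinations must match the runtime dispatch constraints, one way to keep the two in sync is to mirror the list on the C++ side and check it at compile time. The following is a sketch only: `DtypeCombo`, `kDtypeCombos`, and `find_file_tag` are hypothetical names that do not appear in the PR, and the entries are copied from the `dtype_combo_list` shown in the hunk. It assumes C++17.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// One entry of the Q_TYPE|KV_TYPE|O_TYPE|FILE_TAG format used by the CMake list.
struct DtypeCombo {
  std::string_view q;
  std::string_view kv;
  std::string_view o;
  std::string_view file_tag;
};

// Hypothetical C++ mirror of dtype_combo_list (values copied from the hunk).
inline constexpr std::array<DtypeCombo, 6> kDtypeCombos{{
    {"half_t", "half_t", "half_t", "h_h_h"},
    {"half_t", "float_e4m3_t", "half_t", "h_e4_h"},
    {"half_t", "float_e5m2_t", "half_t", "h_e5_h"},
    {"bfloat16_t", "bfloat16_t", "bfloat16_t", "b_b_b"},
    {"bfloat16_t", "float_e4m3_t", "bfloat16_t", "b_e4_b"},
    {"bfloat16_t", "float_e5m2_t", "bfloat16_t", "b_e5_b"},
}};

// Returns the file tag for an allowed combination, or an empty view if the
// combination is not in the list.
constexpr std::string_view find_file_tag(std::string_view q,
                                         std::string_view kv,
                                         std::string_view o) {
  for (const auto& combo : kDtypeCombos) {
    if (combo.q == q && combo.kv == kv && combo.o == o) {
      return combo.file_tag;
    }
  }
  return {};
}

// Compile-time spot checks against the list above.
static_assert(find_file_tag("half_t", "float_e4m3_t", "half_t") == "h_e4_h");
static_assert(find_file_tag("half_t", "bfloat16_t", "half_t").empty());
```

A table like this would let a unit test walk every entry and confirm that the runtime switch accepts it, so a combination added in CMake without a matching dispatch branch is caught early.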


+        TORCH_CHECK(false, "Unsupported KV dtype for chunk prefill dispatch");
+      }


xinyu-intel · 2026-04-21T07:43:24Z

@baodii @YizhouZ @jikunshang pls review, it breaks down dtypes and will be helpful for further development.

Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>

Copilot AI review requested due to automatic review settings April 13, 2026 07:20

Copilot started reviewing on behalf of xinyu-intel April 13, 2026 07:21

Copilot AI reviewed Apr 13, 2026

xinyu-intel force-pushed the dev/split-attn-template branch from addd992 to 3dca5b1 on April 21, 2026 04:24

split attention template via data types

3dca5b1

Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>

xinyu-intel wants to merge 1 commit into vllm-project:main from xinyu-intel:dev/split-attn-template
