split attention template via data types #270

Open
xinyu-intel wants to merge 1 commit into vllm-project:main from xinyu-intel:dev/split-attn-template

@xinyu-intel
Collaborator

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.

Purpose

Test Plan

Test Result

(Optional) Documentation Update

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Copilot AI review requested due to automatic review settings April 13, 2026 07:20
Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the Xe2 XPU attention (chunk prefill + paged decode) kernel instantiation and dispatch to be specialized by explicit Q/KV/O data types, rather than relying on runtime dtype branching inside the instantiated templates.

Changes:

  • Introduces CutlassQKOType and aten_to_Cutlass_qko_dtype() to carry Q/K/O dtypes through the dispatch path.
  • Splits the dispatch into a typed implementation (*_dispatch_typed_impl<..., ElementQ, ElementKV, ElementO, ...>) plus a runtime dtype switch that calls into those typed instantiations.
  • Updates CMake/kernel templates/extern declarations to generate and declare explicit instantiations for the allowed dtype combinations.
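
The split described above (a runtime dtype switch that forwards to typed template instantiations) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `DType`, `QKOType`, the tag structs, `dispatch_typed_impl`, `dispatch_kv`, and `dispatch` are hypothetical stand-ins for `CutlassDType`, `CutlassQKOType`, the CUTLASS element types, and the `*_dispatch_typed_impl` helpers, and the allowed combinations are assumed to be the six listed in the CMake snippet quoted in the review comments.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the PR's CutlassDType / CutlassQKOType.
enum class DType { half, bfloat16, float8_e4m3, float8_e5m2 };
struct QKOType {
  DType q_type;
  DType k_type;
  DType o_type;
};

// Hypothetical element tags standing in for half_t, bfloat16_t,
// float_e4m3_t, and float_e5m2_t.
struct Half { static const char* name() { return "half_t"; } };
struct BF16 { static const char* name() { return "bfloat16_t"; } };
struct E4M3 { static const char* name() { return "float_e4m3_t"; } };
struct E5M2 { static const char* name() { return "float_e5m2_t"; } };

// Typed implementation: every dtype is a template parameter, so no dtype
// branching is needed inside the instantiated body. Here it only reports
// which instantiation was selected.
template <class ElementQ, class ElementKV, class ElementO>
std::string dispatch_typed_impl() {
  return std::string(ElementQ::name()) + "|" + ElementKV::name() + "|" +
         ElementO::name();
}

// Selects the KV element type once Q and O are fixed to the same type.
template <class ElementQO>
std::string dispatch_kv(DType k_type, DType qo_type) {
  if (k_type == qo_type) {
    return dispatch_typed_impl<ElementQO, ElementQO, ElementQO>();
  }
  if (k_type == DType::float8_e4m3) {
    return dispatch_typed_impl<ElementQO, E4M3, ElementQO>();
  }
  if (k_type == DType::float8_e5m2) {
    return dispatch_typed_impl<ElementQO, E5M2, ElementQO>();
  }
  throw std::invalid_argument("Unsupported KV dtype");
}

// Runtime switch: maps a (Q, K, O) dtype triplet onto one typed instantiation.
std::string dispatch(const QKOType& t) {
  if (t.q_type != t.o_type) {
    throw std::invalid_argument("Q and O dtypes must match");
  }
  switch (t.q_type) {
    case DType::half:
      return dispatch_kv<Half>(t.k_type, DType::half);
    case DType::bfloat16:
      return dispatch_kv<BF16>(t.k_type, DType::bfloat16);
    default:
      throw std::invalid_argument("Unsupported Q dtype");
  }
}
```

In this sketch, `dispatch({DType::bfloat16, DType::float8_e5m2, DType::bfloat16})` selects the `bfloat16_t|float_e5m2_t|bfloat16_t` instantiation, while a triplet outside the allowed set (for example half Q/O with bfloat16 KV) throws instead of silently falling through.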

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| csrc/xpu/attn/xe_2/paged_decode.hpp | Adds typed dispatch helper and updates runtime dtype dispatch to include O dtype. |
| csrc/xpu/attn/xe_2/paged_decode_xe2.cpp | Switches paged decode to use Q/K/O dtype triplet in runtime dispatch. |
| csrc/xpu/attn/xe_2/paged_decode_utils.hpp | Propagates CutlassQKOType through the decode dispatch helpers. |
| csrc/xpu/attn/xe_2/paged_decode_kernel_template.cpp.in | Updates generated instantiation to target the typed dispatch function and dtype macros. |
| csrc/xpu/attn/xe_2/paged_decode_extern.hpp | Expands extern template declarations across allowed dtype combinations. |
| csrc/xpu/attn/xe_2/paged_decode_configure.cmake | Generates kernel sources across (dtype combo × bool flags × policy variants). |
| csrc/xpu/attn/xe_2/fmha_xe2.cpp | Switches chunk prefill to Q/K/O dtype triplet dispatch. |
| csrc/xpu/attn/xe_2/fmha_utils.hpp | Replaces Q/K dtype pair with Q/K/O dtype triplet utilities/types. |
| csrc/xpu/attn/xe_2/chunk_prefill.hpp | Adds typed dispatch helper and updates runtime dtype dispatch to include O dtype. |
| csrc/xpu/attn/xe_2/chunk_prefill_utils.hpp | Propagates CutlassQKOType through the chunk prefill dispatch helpers. |
| csrc/xpu/attn/xe_2/chunk_prefill_kernel_template.cpp.in | Updates generated instantiation to target the typed dispatch function and dtype macros. |
| csrc/xpu/attn/xe_2/chunk_prefill_extern.hpp | Expands extern template declarations across allowed dtype combinations. |
| csrc/xpu/attn/xe_2/chunk_prefill_configure.cmake | Generates chunk-prefill kernel sources across dtype combinations and bool permutations. |

Comment on lines +595 to +599

    } else if (cuQKOType.k_type == CutlassDType::float8_e5m2) {
      return decode_policy_dispatch_typed_impl<
          decode_policy,
          bfloat16_t,
          float_e5m2_t,

Comment on lines +354 to +356

    if (cuQKOType.o_type == CutlassDType::half) {
      if (cuQKOType.q_type == CutlassDType::half) {
        if (cuQKOType.k_type == CutlassDType::half) {

Comment on lines +69 to +77

    # Allowed dtype combinations must match runtime dispatch constraints. Format:
    # Q_TYPE|KV_TYPE|O_TYPE|FILE_TAG
    set(dtype_combo_list
        "half_t|half_t|half_t|h_h_h"
        "half_t|float_e4m3_t|half_t|h_e4_h"
        "half_t|float_e5m2_t|half_t|h_e5_h"
        "bfloat16_t|bfloat16_t|bfloat16_t|b_b_b"
        "bfloat16_t|float_e4m3_t|bfloat16_t|b_e4_b"
        "bfloat16_t|float_e5m2_t|bfloat16_t|b_e5_b")

Comment on lines +387 to +388

      TORCH_CHECK(false, "Unsupported KV dtype for chunk prefill dispatch");
    }
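
The `Q_TYPE|KV_TYPE|O_TYPE|FILE_TAG` entry format quoted in the CMake review comment above can be illustrated with a small parser. This is only a sketch of how one entry decomposes into its four fields: `DtypeCombo` and `parse_combo` are hypothetical names that do not appear in the PR, and the real build splits these strings in CMake rather than in C++.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical record for one "Q_TYPE|KV_TYPE|O_TYPE|FILE_TAG" entry.
struct DtypeCombo {
  std::string q_type;
  std::string kv_type;
  std::string o_type;
  std::string file_tag;
};

// Splits an entry on '|' and returns the four fields, or std::nullopt if the
// entry does not contain exactly four fields.
std::optional<DtypeCombo> parse_combo(std::string_view entry) {
  std::array<std::string, 4> fields;
  std::size_t count = 0;
  std::size_t start = 0;
  while (true) {
    if (count == fields.size()) {
      return std::nullopt;  // more than four fields
    }
    const std::size_t bar = entry.find('|', start);
    const std::size_t len =
        (bar == std::string_view::npos) ? std::string_view::npos : bar - start;
    fields[count++] = std::string(entry.substr(start, len));
    if (bar == std::string_view::npos) {
      break;  // last field consumed
    }
    start = bar + 1;
  }
  if (count != fields.size()) {
    return std::nullopt;  // fewer than four fields
  }
  return DtypeCombo{fields[0], fields[1], fields[2], fields[3]};
}
```

For the entry `"half_t|float_e4m3_t|half_t|h_e4_h"` this yields Q `half_t`, KV `float_e4m3_t`, O `half_t`, and file tag `h_e4_h`. In all six listed entries the Q and O types are identical, which is the constraint the runtime dispatch is expected to mirror.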

xinyu-intel force-pushed the dev/split-attn-template branch from addd992 to 3dca5b1, April 21, 2026 04:24

@xinyu-intel
Collaborator Author

@baodii @YizhouZ @jikunshang please review. This PR splits the attention templates by data type, which should make further development easier.

Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>