[GPU] xattention_block_size 256 support. #33485
base: master
Conversation
…ed with float precision to avoid an overflow of zp.
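A hedged sketch of the idea behind that commit note (not the kernel's actual code, and the min/max-based formula is an assumption): keeping the zero-point computation in float so the intermediate value cannot overflow a narrow integer type before the final clamp.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative asymmetric-quantization zero point computed in float.
int8_t compute_zero_point(float min_val, float max_val) {
    const float qmin = -128.0f, qmax = 127.0f;
    const float scale = (max_val - min_val) / (qmax - qmin);
    if (scale == 0.0f) return 0;  // degenerate range, nothing to quantize
    // Intermediate stays in float; rounding and clamping happen only at the end.
    const float zp = qmin - min_val / scale;
    return static_cast<int8_t>(std::lround(std::min(std::max(zp, qmin), qmax)));
}
```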
riverlijunjie left a comment
Some minor comments; overall LGTM.
svmptr_t sparse_mask_base [[type("svmptr_t")]],
svmptr_t wg_sparse_mask_base [[type("svmptr_t")]],
bool validate,
int SPARSE_BLOCK_SIZE,
Do we really need this parameter if it is a macro?
Yes, we do need it. SPARSE_BLOCK_SIZE is now a runtime parameter for the PA kernel, instead of a compile-time JIT constant.
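An illustrative sketch of that distinction (the GPU plugin has its own kernel-argument abstraction; this uses the raw OpenCL host API purely to contrast the two approaches):

```cpp
#include <CL/cl.h>
#include <string>

// Compile-time JIT constant: the value is baked into the program source, so a
// different block size would require building a separate program.
std::string jit_options_for(int block_size) {
    return "-DSPARSE_BLOCK_SIZE=" + std::to_string(block_size);
}

// Runtime parameter: the same compiled kernel takes the block size as a scalar
// argument, so it can change between enqueues without a rebuild.
void set_runtime_block_size(cl_kernel kernel, cl_uint arg_index, int block_size) {
    clSetKernelArg(kernel, arg_index, sizeof(int), &block_size);
}
```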
res_event = {execute_stage(res_event, instance, xattn_estimate_find_block)};
res_event = {execute_stage(res_event, instance, xattn_estimate_post_proc)};
if (!bypass_xattn(params)) {
    if (rt_params->xattn_block_size == 128) {
If xattn_block_size is a fixed value, we don't need to add_stage for both 128 and 256.
Unfortunately, xattn_block_size is a compile-time JIT constant for the xattention kernels, while it is also a runtime parameter of a model with a PA node. This means users can switch it dynamically during inference, so this PR has to create two stages (one for 128, the other for 256) and switch between them on the fly, as sketched below.
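A hedged sketch of that dispatch (stage identifiers such as xattn_estimate_find_block_256 are illustrative, not necessarily the PR's exact names): both block-size variants are compiled as separate stages up front, and the runtime value selects which one to execute.

```cpp
// Stand-in types for the plugin's event/instance objects.
struct Event {};
struct Instance {};

enum StageId { xattn_estimate_find_block_128, xattn_estimate_find_block_256 };

struct RuntimeParams { int xattn_block_size = 128; };

Event execute_stage(Event ev, Instance&, StageId) { return ev; }  // placeholder

Event run_find_block(Event ev, Instance& inst, const RuntimeParams& rt) {
    // The block size is a JIT constant inside the kernel, so each value needs
    // its own pre-built stage; the choice between them happens here at run time.
    const StageId stage = (rt.xattn_block_size == 256) ? xattn_estimate_find_block_256
                                                       : xattn_estimate_find_block_128;
    return execute_stage(ev, inst, stage);
}
```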
Details:
Tickets: