Conversation

@ceciliapeng2011 (Contributor) commented on Jan 4, 2026

Improve KVCache quantization, XAttention flexibility, and sparse attention performance.

Details:

  • Use float as the internal precision for KVCache quantization in the kvcache_update CM kernel, fixing an accuracy issue with the QWen3-32B int8 model (see the sketch after this list).
  • Remove the restriction in the PA 2nd-token CM kernel that limited heads_num / kv_heads_num to <= 8, resolving the MiniCPM4 failure.
  • Fix the phi-3-mini-128k-instruct issue caused by head_size=96 not being divisible by 64 in the xattention_gemm_qk kernel.
  • Optimize sparse attention with fp16 KVCache when sparsity is small.

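The core of the QWen3-32B fix is keeping the scale/zero-point arithmetic in float rather than half. Below is a minimal standalone sketch of the idea, not the actual kvcache_update CM kernel; the function name quantize_block_u8 and the asymmetric uint8 scheme are illustrative assumptions.

```cpp
// Sketch only: per-block asymmetric uint8 quantization of KV cache values,
// with the scale/zero-point math done in float. With half-precision
// intermediates, -min_val / scale_val can exceed the fp16 range for blocks
// with a large offset and a narrow value range, overflowing the zero point
// (the "overflow zp" the commit message refers to).
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

void quantize_block_u8(const std::vector<float>& block,
                       std::vector<uint8_t>& out,
                       float& scale_val, float& zp_val) {
    float min_val = *std::min_element(block.begin(), block.end());
    float max_val = *std::max_element(block.begin(), block.end());

    // Keep scale_val and zp_val in float, not half.
    scale_val = (max_val - min_val) / 255.0f;
    if (scale_val == 0.0f) scale_val = 1.0f;   // constant block: avoid div-by-zero
    zp_val = std::round(-min_val / scale_val);

    out.resize(block.size());
    for (size_t i = 0; i < block.size(); ++i) {
        float q = std::round(block[i] / scale_val + zp_val);
        out[i] = static_cast<uint8_t>(std::clamp(q, 0.0f, 255.0f));
    }
}
```
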
Tickets:

Commit: fix QWen3-32B int8 model accuracy issue: scale_val should be calculated with float precision to avoid an overflowed zp
@ceciliapeng2011 requested review from a team as code owners on January 4, 2026 03:17
@github-actions bot added the "category: GPU" (OpenVINO GPU plugin) label on Jan 4, 2026
@ceciliapeng2011 marked this pull request as draft on January 4, 2026 03:18
@ceciliapeng2011 changed the title from "fix QWen3-32B int8 model accuracy issue: scale_val should be calculat…" to "[GPU] some fixes and optimizations to CM PA and XAttention kernels" on Jan 4, 2026