
Fix sparse mask handling in softmax kernel #33814

Merged
maxnick merged 4 commits into openvinotoolkit:master from mangguo321:mang/fix_softmax_sparse
Feb 4, 2026

Conversation

@mangguo321
Contributor

@mangguo321 mangguo321 commented Jan 26, 2026

Details:

  • Fix sparse mask handling in the softmax kernel. In the sparse attention path, the sparse mask causes some blocks to be skipped, so those blocks are never written by the GEMM kernel; as a result, the corresponding regions in the output buffer remain uninitialized, and their contents may decode to NaN/Inf values.
  • In this PR, we overwrite the skipped regions with -FLT_MAX to prevent NaN propagation and avoid incorrect computations in downstream kernels.
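The idea behind the fix can be illustrated with a minimal sketch (the function and parameter names below, such as `fill_skipped_blocks` and `block_kept`, are illustrative and not the actual OpenVINO kernel API): blocks that the sparse mask skips, and that the QK GEMM therefore never writes, are overwritten with `-FLT_MAX` before the softmax, so `exp()` underflows them to zero instead of propagating garbage from uninitialized memory.

```cpp
#include <cfloat>
#include <cmath>

// Hypothetical sketch of the fix; names are illustrative, not the real
// kernel API. Blocks the sparse mask skipped were never written by the
// QK GEMM, so their memory is uninitialized. Overwriting them with
// -FLT_MAX makes the softmax treat them as fully masked positions.
void fill_skipped_blocks(float* scores, int num_blocks, int block_size,
                         const bool* block_kept) {
    for (int b = 0; b < num_blocks; ++b) {
        if (!block_kept[b]) {
            for (int i = 0; i < block_size; ++i)
                scores[b * block_size + i] = -FLT_MAX;
        }
    }
}

// Plain row softmax; -FLT_MAX entries contribute exactly 0 to the sum,
// because exp(-FLT_MAX - row_max) underflows to zero.
void softmax_row(float* x, int n) {
    float mx = -FLT_MAX;
    for (int i = 0; i < n; ++i) mx = x[i] > mx ? x[i] : mx;
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) { x[i] = std::exp(x[i] - mx); sum += x[i]; }
    for (int i = 0; i < n; ++i) x[i] /= sum;
}
```

With this pre-pass, a skipped block receives zero attention weight, matching the semantics of a masked-out position.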

Tickets:

  • CVS-179625: https://jira.devtools.intel.com/browse/CVS-179625

@mangguo321 mangguo321 requested review from a team as code owners January 26, 2026 09:05
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Jan 26, 2026
@rkazants rkazants added the pr: needs tests PR needs tests updating label Jan 27, 2026
Collaborator

@rkazants rkazants left a comment


please implement tests

@rkazants rkazants requested a review from maxnick January 27, 2026 04:56
@mangguo321
Contributor Author

please implement tests

We ran performance and accuracy tests; the results can be found in the JIRA ticket: https://jira.devtools.intel.com/browse/CVS-179625

Contributor

@liubo-intel liubo-intel left a comment


Hi @mangguo321: from my understanding, your changes fix a NaN data issue.
For XAttention cases, these changes make sense for skipping the sparse blocks, and LGTM.
But when we have time, it would be better to find out why these v_a/a[i] contain NaN data. Since the v_a/a[i] values serve as input data for this kernel, they are expected to be finite under normal conditions, unless there was a computational error during the previous calculation or a mistake during data loading.

@mangguo321
Contributor Author

Hi @liubo-intel, the input to the softmax kernel is the output of the QK GEMM. In the sparse attention path, the sparse mask causes some blocks to be skipped, so those blocks are never written by the GEMM kernel; as a result, the corresponding regions in the output buffer remain uninitialized, and their contents may decode to NaN/Inf values.

@rkazants
Collaborator

No appropriate PR description, no JIRA ticket, no tests

@mangguo321
Contributor Author

mangguo321 commented Jan 27, 2026

No appropriate PR description, no JIRA ticket, no tests

Hi @rkazants, we updated the description with the PR details. The JIRA ticket is already referenced in the description. We tested this change with qwen2-7b-instruct and llama-3.2-3b-instruct; the accuracy issue reported in the ticket is resolved and no performance regression was observed. The test results are in the JIRA ticket. Please let me know if any additional information is needed.

@mangguo321
Contributor Author

Hi @maxnick, could you please take a look and review? Thanks!

Contributor

@zhangYiIntel zhangYiIntel left a comment


LGTM. NaN + anything still equals NaN, so overwriting the skipped regions is the much better approach!
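The point about NaN propagation can be checked directly with a standalone sketch (not code from this PR): a single NaN score poisons the entire softmax denominator, whereas a score of -FLT_MAX simply contributes zero to it.

```cpp
#include <cfloat>
#include <cmath>

// Standalone illustration (not PR code): one NaN score makes the whole
// softmax denominator NaN, because NaN absorbs every arithmetic operation
// it touches. A -FLT_MAX score instead contributes exp(-FLT_MAX - row_max),
// which underflows to 0 and leaves the denominator finite.
float softmax_denominator(const float* scores, int n, float row_max) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += std::exp(scores[i] - row_max);  // a NaN term absorbs the sum
    return sum;
}
```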

@maxnick maxnick added this to the 2026.1 milestone Jan 29, 2026
@maxnick
Contributor

maxnick commented Jan 29, 2026

@mangguo321, could you please cover your changes with single-layer tests, either by extending the existing test configurations or by developing a new one?

@mangguo321
Contributor Author

@mangguo321, could you please cover your changes with single-layer tests, either by extending the existing test configurations or by developing a new one?

A softmax kernel unit test was added to cover the code changes in this PR.

@maxnick maxnick removed the pr: needs tests PR needs tests updating label Jan 30, 2026
@maxnick
Contributor

maxnick commented Jan 30, 2026

@rkazants , the dedicated unit tests were added.

@github-actions github-actions bot added the category: build OpenVINO cmake script / infra label Feb 1, 2026
@mangguo321
Contributor Author

@rkazants @maxnick If there are no further concerns, could you please remove the "do not merge" label? This PR is required to address the XAttention accuracy issue. Thanks a lot!

@rkazants rkazants dismissed their stale review February 4, 2026 10:10

No more concerns regarding the PR description and tests.

@maxnick maxnick added this pull request to the merge queue Feb 4, 2026
Merged via the queue into openvinotoolkit:master with commit 65b105a Feb 4, 2026
234 of 236 checks passed
insoow pushed a commit to insoow/openvino that referenced this pull request Feb 9, 2026
Naseer-010 pushed a commit to Naseer-010/openvino that referenced this pull request Feb 18, 2026

Labels

category: build OpenVINO cmake script / infra category: CPU OpenVINO CPU plugin
