Fix silent data corruption in JIT eltwise kernel for i8/u8 bitwise ops with broadcast by goyaladitya05 · Pull Request #34639 · openvinotoolkit/openvino

goyaladitya05 · 2026-03-11T19:09:26Z

Fixed a data corruption bug in load_vector() where broadcasting i8/u8 values during bitwise operations produced incorrect results.

Cause

load_vector() has two broadcast paths based on whether src_prc == dst_prc:

src_prc != dst_prc: calls load_scalar to widen the value to 32 bits first, then broadcasts with uni_vbroadcastss - correct, the value is already 32-bit by the time it is broadcast.
src_prc == dst_prc: also called uni_vbroadcastss unconditionally - wrong for 8-bit types.

vbroadcastss copies 4 bytes at a time. For an i8 value, the scalar occupies only byte 0 of the 32-bit source element, so after the broadcast only byte 0 of each 4-byte lane holds the scalar; the other 3 bytes are zero. In a 256-bit register that means 8 correct bytes and 24 zeros, so any bitwise AND/OR/XOR operating on those lanes silently produces wrong results.
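To make the lane arithmetic concrete, here is a minimal scalar emulation in plain C++ (no intrinsics, no JIT). It is only a sketch: the `Reg256` alias and all function names are illustrative and do not come from the OpenVINO sources, and it assumes the i8 scalar sits in byte 0 of a 32-bit element whose upper three bytes are zero.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 256-bit vector register as 32 bytes.
using Reg256 = std::array<std::uint8_t, 32>;

// Models a 4-byte-granular broadcast (the vbroadcastss behaviour described above)
// of a byte that sits in a zero-extended 32-bit element:
// byte 0 of every dword holds the scalar, bytes 1..3 stay zero.
Reg256 broadcast_as_dword(std::uint8_t scalar) {
    Reg256 r{};
    for (std::size_t lane = 0; lane < r.size(); lane += 4) {
        r[lane] = scalar;
    }
    return r;
}

// Models a 1-byte-granular broadcast (the vpbroadcastb behaviour):
// every byte of the register holds the scalar.
Reg256 broadcast_as_byte(std::uint8_t scalar) {
    Reg256 r{};
    r.fill(scalar);
    return r;
}

// Element-wise bitwise AND over all 32 byte lanes.
Reg256 bitwise_and(const Reg256& a, const Reg256& b) {
    Reg256 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] & b[i]);
    }
    return out;
}

// Counts how many byte lanes equal the expected value.
std::size_t count_equal(const Reg256& r, std::uint8_t expected) {
    std::size_t n = 0;
    for (std::uint8_t b : r) {
        n += (b == expected) ? 1 : 0;
    }
    return n;
}

// Usage sketch: AND a vector of 0xFF bytes with a broadcast 0x0F.
void check_broadcast_model() {
    Reg256 data{};
    data.fill(0xFF);
    // Dword broadcast: only 8 of 32 lanes carry the right answer.
    assert(count_equal(bitwise_and(data, broadcast_as_dword(0x0F)), 0x0F) == 8);
    // Byte broadcast: all 32 lanes are right.
    assert(count_equal(bitwise_and(data, broadcast_as_byte(0x0F)), 0x0F) == 32);
}
```

The 8-versus-32 split in `check_broadcast_model` matches the "8 correct bytes and 24 zeros" count given for a 256-bit register.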

Fix

In the src_prc == dst_prc branch, dispatch on src_prc.size() instead of always calling uni_vbroadcastss:

1 byte (i8/u8)
- AVX2+: vpbroadcastb - fills all byte lanes directly.
- SSE4.1: punpcklbw + punpcklbw + pshufd 0 - SSE has no byte-broadcast instruction; two unpacks interleave the byte with itself, then pshufd splats it across all dword lanes.
2 bytes
- AVX2+: vpbroadcastw.
- SSE4.1: punpcklwd + pshufd 0.
4 bytes (i32/f32): uni_vbroadcastss is unchanged.
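The SSE4.1 sequences listed above can be checked with a small byte-level emulation. This is a sketch, not the JIT code from the PR: the `Xmm` alias, `pshufd_0`, and `broadcast_sse` are names made up for illustration, the functions model only the low-half unpack and the `pshufd` with immediate 0 semantics, and the 4-byte case uses a plain dword splat as a stand-in for `uni_vbroadcastss`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 128-bit XMM register as 16 bytes.
using Xmm = std::array<std::uint8_t, 16>;

// punpcklbw dst, src: interleave the low 8 bytes of dst and src.
Xmm punpcklbw(const Xmm& dst, const Xmm& src) {
    Xmm out{};
    for (std::size_t i = 0; i < 8; ++i) {
        out[2 * i] = dst[i];
        out[2 * i + 1] = src[i];
    }
    return out;
}

// punpcklwd dst, src: interleave the low 4 words (2 bytes each) of dst and src.
Xmm punpcklwd(const Xmm& dst, const Xmm& src) {
    Xmm out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out[4 * i] = dst[2 * i];
        out[4 * i + 1] = dst[2 * i + 1];
        out[4 * i + 2] = src[2 * i];
        out[4 * i + 3] = src[2 * i + 1];
    }
    return out;
}

// pshufd with imm8 == 0: copy dword 0 into all four dword lanes.
Xmm pshufd_0(const Xmm& src) {
    Xmm out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = src[i % 4];
    }
    return out;
}

// Dispatch on element size, mirroring the SSE4.1 paths described in the PR.
Xmm broadcast_sse(Xmm x, std::size_t elem_size) {
    switch (elem_size) {
    case 1:
        x = punpcklbw(x, x);  // byte 0 now fills bytes 0..1
        x = punpcklbw(x, x);  // byte 0 now fills bytes 0..3
        return pshufd_0(x);
    case 2:
        x = punpcklwd(x, x);  // word 0 now fills bytes 0..3
        return pshufd_0(x);
    case 4:
        return pshufd_0(x);   // stand-in for the unchanged dword broadcast
    default:
        assert(false && "unsupported element size in this sketch");
        return x;
    }
}
```

With `x[0] = 0xAB` and the remaining bytes zero, `broadcast_sse(x, 1)` yields 16 bytes of `0xAB`; with `x[0] = 0x34, x[1] = 0x12`, `broadcast_sse(x, 2)` yields the pair `0x34, 0x12` repeated eight times.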

Tests

Added smoke_CompareWithRefs_2D_Bitwise_i8u8_Broadcast to eltwise.cpp.

24 test cases: AND / OR / XOR × i8 / u8 × CONSTANT / PARAMETER secondary input × 2 shape pairs.
Shapes: two pairs from the bug report, {1,64} vs {1,1} and {32,256} vs {1,1}. Each pair runs inference twice (full shape, then the {1,1} broadcast operand) to exercise the fixed path.
2D only, no format constraints: the existing 4D bitwise suite tests nhwc/nchw layout permutations, but 2D tensors have no channel-last layout, so no CPUSpecificParams format is set. This keeps the test focused purely on broadcast correctness.
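One reading of the test matrix that yields 24 cases is the Cartesian product of 3 operations, 2 precisions, 2 secondary-input kinds, and 2 shape pairs. The sketch below only enumerates that product; `BitwiseBroadcastCase` and `enumerate_cases` are hypothetical names, and none of this is the actual gtest instantiation in eltwise.cpp.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one test case; fields are plain strings for illustration.
struct BitwiseBroadcastCase {
    std::string op;
    std::string precision;
    std::string secondary_input;
    std::string full_shape;
    std::string broadcast_shape;
};

// Builds the Cartesian product of the four dimensions listed in the PR description.
std::vector<BitwiseBroadcastCase> enumerate_cases() {
    const std::vector<std::string> ops{"AND", "OR", "XOR"};
    const std::vector<std::string> precisions{"i8", "u8"};
    const std::vector<std::string> input_kinds{"CONSTANT", "PARAMETER"};
    const std::vector<std::pair<std::string, std::string>> shape_pairs{
        {"{1,64}", "{1,1}"},
        {"{32,256}", "{1,1}"},
    };

    std::vector<BitwiseBroadcastCase> cases;
    for (const auto& op : ops) {
        for (const auto& prc : precisions) {
            for (const auto& kind : input_kinds) {
                for (const auto& shapes : shape_pairs) {
                    cases.push_back(BitwiseBroadcastCase{op, prc, kind, shapes.first, shapes.second});
                }
            }
        }
    }
    return cases;
}

// Usage sketch: 3 * 2 * 2 * 2 combinations.
void check_case_count() {
    assert(enumerate_cases().size() == 24);
}
```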

Closes #34638

AI Assistance:

AI assistance used: yes
If yes, summarize how AI was used and what human validation was performed (build/tests/manual checks): Used Claude Sonnet 4.6 to help pinpoint the location of the bug and work out the fix.
Built it locally and verified that everything works.

Copilot

Pull request overview

Fixes a silent data corruption issue in the Intel CPU plugin’s x64 JIT eltwise kernel when broadcasting i8/u8 scalars for bitwise ops, and adds a focused regression test to cover the broadcast scenario from the reported bug.

Changes:

Update jit_uni_eltwise_generic::load_vector() to use byte/word-aware broadcast for src_prc == dst_prc (avoids vbroadcastss for 8-bit types).
Add a new 2D bitwise broadcast instantiation to validate i8/u8 AND/OR/XOR correctness when one operand is {1,1}-broadcast.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`src/plugins/intel_cpu/src/nodes/kernels/x64/jit_uni_eltwise_generic.cpp`	Fixes scalar broadcast emission for 8-bit element types in the JIT load path used by bitwise ops.
`src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/instances/common/eltwise.cpp`	Adds a regression test suite covering i8/u8 bitwise ops with `{1,1}` broadcast in 2D shapes.


Copilot · 2026-03-12T07:29:48Z

src/plugins/intel_cpu/src/nodes/kernels/x64/jit_uni_eltwise_generic.cpp

+            case 2:
+                if (isa == x64::sse41) {
+                    punpcklwd(xmm_src, xmm_src);
+                    pshufd(xmm_src, xmm_src, 0);
+                } else {
+                    vpbroadcastw(vmm_src, xmm_src);
+                }
+                break;


[MEDIUM] load_vector() adds a 2-byte broadcast path (case 2), but this code calls load_scalar() first, and load_scalar() currently throws for src_prc == dst_prc with src_prc.size() == 2 (it only supports sizes 1 and 4 in that branch). As a result, the new 2-byte broadcast logic is effectively unreachable and any future attempt to broadcast u16/i16 without type conversion will still fail at runtime. Either add 2-byte support to load_scalar() for the src_prc == dst_prc case (load 16 bits and clear upper bits) or remove the case 2 handling here to avoid implying support that isn't actually implemented.

Suggested change

            case 2:
                if (isa == x64::sse41) {
                    punpcklwd(xmm_src, xmm_src);
                    pshufd(xmm_src, xmm_src, 0);
                } else {
                    vpbroadcastw(vmm_src, xmm_src);
                }
                break;
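For the first option the review comment mentions (adding 2-byte support to load_scalar() by loading 16 bits and clearing the upper bits), the intended register contents can be modelled in plain C++. This is only a sketch under stated assumptions: `Xmm` and `load_scalar_same_prc` are hypothetical names, the function models the resulting register bytes rather than any emitted instruction, and it is not the real load_scalar() from jit_uni_eltwise_generic.cpp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical model of a 128-bit XMM register as 16 bytes.
using Xmm = std::array<std::uint8_t, 16>;

// Models a same-precision scalar load: copy elem_size bytes into the low
// bytes of the register and leave every upper byte cleared to zero.
// Sizes other than 1, 2, and 4 are rejected, as a stand-in for the throw
// the review comment describes for unsupported sizes.
Xmm load_scalar_same_prc(const std::uint8_t* src, std::size_t elem_size) {
    if (elem_size != 1 && elem_size != 2 && elem_size != 4) {
        throw std::invalid_argument("unsupported element size in this sketch");
    }
    Xmm r{};  // value-initialised: all 16 bytes start at zero
    for (std::size_t i = 0; i < elem_size; ++i) {
        r[i] = src[i];
    }
    return r;
}

// Usage sketch: a 16-bit value 0x1234 stored little-endian.
void check_load_scalar_model() {
    const std::uint8_t buf[2] = {0x34, 0x12};
    const Xmm r = load_scalar_same_prc(buf, 2);
    assert(r[0] == 0x34);
    assert(r[1] == 0x12);
    for (std::size_t i = 2; i < r.size(); ++i) {
        assert(r[i] == 0);  // upper bits cleared
    }
}
```

A register in this state is what the `case 2` broadcast path would need as its input; without a 2-byte load, that path stays unreachable, as the comment notes.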

maxnick · 2026-03-12T11:23:46Z

build_jenkins

correct byte broadcast in JIT eltwise kernel for i8/u8 bitwise ops

e2226b4

github-actions bot added the category: CPU OpenVINO CPU plugin label Mar 11, 2026

sys-openvino-ci added the ExternalPR External contributor label Mar 11, 2026

goyaladitya05 marked this pull request as ready for review March 12, 2026 07:23

goyaladitya05 requested review from a team as code owners March 12, 2026 07:23

Copilot AI review requested due to automatic review settings March 12, 2026 07:23

Copilot started reviewing on behalf of goyaladitya05 March 12, 2026 07:24

Copilot AI reviewed Mar 12, 2026

Merge branch 'master' into fix/jit_eltwise_bitwise_i8_broadcast

592163f

maxnick added this to the 2026.1 milestone Mar 12, 2026

maxnick approved these changes Mar 12, 2026

goyaladitya05 wants to merge 2 commits into openvinotoolkit:master from goyaladitya05:fix/jit_eltwise_bitwise_i8_broadcast

4 participants
