[CPU] Fix u8 Subtract to use wrap-around instead of saturation by Nishant-ZFYII · Pull Request #33453 · openvinotoolkit/openvino

Nishant-ZFYII · 2026-01-03T21:37:40Z

[CPU] Fix u8 Subtract to use wrap-around instead of saturation

This fixes the bug where u8 subtraction was saturating to 0 instead of wrapping around like NumPy does.

For example, uint8(3) - uint8(4) was returning 0 but should return 255, i.e. (3 - 4) mod 256.
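The difference between the two behaviors can be sketched in a few lines of plain C++. This is only an illustration of the arithmetic, not code from the plugin; the helper names `sub_wrap`, `sub_saturate`, and `check_u8_subtract` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Wrap-around (modular) u8 subtraction: the difference is reduced modulo 256.
constexpr std::uint8_t sub_wrap(std::uint8_t a, std::uint8_t b) {
    // Both operands promote to int; the cast back to uint8_t keeps the low 8 bits.
    return static_cast<std::uint8_t>(a - b);
}

// Saturating u8 subtraction: anything below 0 is clamped to 0.
constexpr std::uint8_t sub_saturate(std::uint8_t a, std::uint8_t b) {
    return a > b ? static_cast<std::uint8_t>(a - b) : std::uint8_t{0};
}

// Small self-check (hypothetical helper) that can be called from main().
inline void check_u8_subtract() {
    assert(sub_wrap(3, 4) == 255);                   // (3 - 4) mod 256
    assert(sub_saturate(3, 4) == 0);                 // the behavior reported in the issue
    assert(sub_wrap(10, 4) == sub_saturate(10, 4));  // no underflow: both agree
}
```

The two functions only disagree when the true difference is negative, which is why the bug is invisible for inputs that never underflow.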

What Changed

I found the bug was happening in two places:

ARM (ACL backend) - The subtract operation was hardcoded to use ConvertPolicy::SATURATE. I changed it to check the output type and use ConvertPolicy::WRAP when working with u8.

x64 (JIT backend) - The JIT emitter didn't support u8 precision at all for subtraction, so it was falling back to float operations and then saturating when converting back. I added u8 to the supported precisions and implemented it using the vpsubb instruction, which automatically does wrap-around.
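As a rough illustration of why a byte-wise subtract instruction wraps, the sketch below uses the SSE2 intrinsic `_mm_sub_epi8` (which maps to psubb; vpsubb is its VEX-encoded form) next to `_mm_subs_epu8` (psubusb, unsigned saturating) for contrast. It is not the JIT emitter code from this PR: the `Lanes` alias, the function names, and the `SKETCH_HAS_SSE2` macro are assumptions for this example, and a scalar fallback is included for non-x86 builds.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2: _mm_sub_epi8, _mm_subs_epu8
#define SKETCH_HAS_SSE2 1
#endif

using Lanes = std::array<std::uint8_t, 16>;  // hypothetical 16-lane u8 vector

// Per-byte subtraction with wrap-around (PSUBB semantics).
inline Lanes sub_bytes_wrap(const Lanes& a, const Lanes& b) {
    Lanes out{};
#ifdef SKETCH_HAS_SSE2
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), _mm_sub_epi8(va, vb));
#else
    // Portable fallback with the same modular semantics.
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] - b[i]);
    }
#endif
    return out;
}

// Per-byte unsigned saturating subtraction (PSUBUSB semantics), for contrast only.
inline Lanes sub_bytes_saturate(const Lanes& a, const Lanes& b) {
    Lanes out{};
#ifdef SKETCH_HAS_SSE2
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), _mm_subs_epu8(va, vb));
#else
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = a[i] > b[i] ? static_cast<std::uint8_t>(a[i] - b[i]) : std::uint8_t{0};
    }
#endif
    return out;
}

// Self-check (hypothetical helper): every lane computes 3 - 4.
inline void check_byte_lanes() {
    Lanes a{};
    Lanes b{};
    a.fill(3);
    b.fill(4);
    assert(sub_bytes_wrap(a, b)[0] == 255);
    assert(sub_bytes_saturate(a, b)[0] == 0);
}
```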

I also added tests to make sure this doesn't break again. The tests cover basic cases like 3 - 4 = 255, larger vectors, and 4D tensors.

Files modified

src/plugins/intel_cpu/src/nodes/executors/acl/acl_eltwise.cpp
src/plugins/intel_cpu/src/emitters/plugin/x64/jit_eltwise_emitters.cpp
src/plugins/intel_cpu/tests/.../subtract_u8_wrap.cpp (new test file)

Closes #33164

Fixes openvinotoolkit#33164
- Changed ACL executor to use ConvertPolicy::WRAP for u8 subtract
- Added u8 support to x64 JIT subtract emitter using vpsubb instruction
- Added regression tests for u8 subtract wrap-around behavior

Nishant-ZFYII · 2026-01-05T04:44:00Z

Hi maintainers — request for review/CI.

This PR fixes u8 Subtract wrap-around semantics in the CPU plugin (Fixes #33164). The issue reporter tested/reviewed and confirmed it solves their problem (they closed the issue after validating).

Changes summary:

ACL executor: use ConvertPolicy::WRAP for u8 subtract (instead of saturate)
x64 JIT subtract emitter: add u8 support via vpsubb (wrap-around behavior)
Added regression tests for u8 subtract wrap-around

Review request:

@openvinotoolkit/openvino-ie-cpu-maintainers (CODEOWNERS for src/plugins/intel_cpu/...)
@zhihaoxu1325 (tagged in the issue thread)

maxnick · 2026-01-05T12:56:48Z

@EgorDuplensky , could you please review?

Gate u8 subtract execution to only u8->u8 operations.

This ensures wrap-around behavior (e.g., 3 - 4 = 255) for pure u8 arithmetic while preventing u8 execution for dequantization patterns (u8 input, f32/i32 output) where wrap-around would corrupt the math.

Changes:
- Modified get_supported_precisions() to conditionally enable u8 support only when both inputs AND output are u8
- Added defensive assertion in emit_isa() u8 case
- Removed [[maybe_unused]] attribute as node parameter is now used

Fixes openvinotoolkit#33164

Gate u8 subtract execution to only pure u8->u8 operations.

This ensures wrap-around behavior (e.g., 3 - 4 = 255) for unsigned arithmetic while preventing u8 execution for dequantization patterns (u8 input, f32/i32 output) where wrap-around would corrupt the math.

Changes:
- JIT: Modified get_supported_precisions() to enable u8 only when both inputs AND output are u8
- ACL: Added same u8->u8 gating for ConvertPolicy::WRAP
- Tests: Added TypeRelaxed regression tests to catch LPT/dequant failures

Fixes openvinotoolkit#33164

Nishant-ZFYII · 2026-01-06T20:26:19Z

I investigated the CI failures and narrowed them down to overly-broad u8 enablement in the x64 JIT subtract path.

Root cause

The previous change advertised {u8, u8} unconditionally in jit_subtract_emitter::get_supported_precisions().
That allowed kernel selection to pick the u8 JIT implementation in Q/DQ / dequantization patterns where inputs are u8, but the subtraction is semantically part of dequant and the output is f32/i32.

This led to:

Crash: store_vector / store path doesn’t support emitting a u8 source into a non-u8 destination in that configuration (unsupported src_prc: u8).
Wrong results: wrap-around arithmetic was applied where signed/expanded arithmetic is required (e.g., 100u8 - 128 should become -28, not 228).
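The wrong-results case can be reproduced with plain integer arithmetic. The sketch below is not plugin code; `zero_point` and `scale` are assumed names for the usual dequantization parameters, and both helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Subtract in u8 first (modular), then widen: the wrong order for dequantization.
constexpr float dequant_wrapped(std::uint8_t q, std::uint8_t zero_point, float scale) {
    return scale * static_cast<float>(static_cast<std::uint8_t>(q - zero_point));
}

// Widen to a signed type first, then subtract: negative differences survive.
constexpr float dequant_widened(std::uint8_t q, std::uint8_t zero_point, float scale) {
    return scale * static_cast<float>(static_cast<std::int32_t>(q) -
                                      static_cast<std::int32_t>(zero_point));
}

// Self-check (hypothetical helper) using the numbers from the comment above.
inline void check_dequant_order() {
    assert(dequant_wrapped(100, 128, 1.0f) == 228.0f);  // wrap-around corrupts the value
    assert(dequant_widened(100, 128, 1.0f) == -28.0f);  // the mathematically expected result
}
```

With a scale of 1.0 the two paths differ by exactly 256, which is the modulus that u8 arithmetic silently applies.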

Fix

x64 JIT (jit_eltwise_emitters.cpp): get_supported_precisions() now advertises {u8, u8} only when both inputs and the output are u8. This keeps wrap-around behavior for the intended u8 → u8 case (issue #33164), while ensuring Q/DQ dequant cases fall back to the correct non-u8 implementation.
ACL (acl_eltwise.cpp): wrap policy remains gated to u8 output (i.e., wrap only for u8 → u8 semantics; otherwise keep the existing policy).
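The gating rule described in the two items above amounts to a small decision function. The following is a simplified sketch under stated assumptions: `Precision` and `OverflowPolicy` are stand-in enums invented for this example (they are not the OpenVINO or ACL types), and `select_policy` is a hypothetical name.

```cpp
#include <cassert>
#include <initializer_list>

enum class Precision { u8, i32, f32 };         // stand-in for the plugin's element types
enum class OverflowPolicy { Wrap, Saturate };  // stand-in for ACL's ConvertPolicy

// Choose wrap-around only for pure u8 -> u8 arithmetic; otherwise keep saturation.
inline OverflowPolicy select_policy(std::initializer_list<Precision> inputs, Precision output) {
    if (output != Precision::u8) {
        return OverflowPolicy::Saturate;  // e.g. dequantization: u8 inputs, f32/i32 output
    }
    for (Precision p : inputs) {
        if (p != Precision::u8) {
            return OverflowPolicy::Saturate;  // mixed-precision inputs
        }
    }
    return OverflowPolicy::Wrap;
}

// Self-check (hypothetical helper) covering the pure-u8 and dequant cases.
inline void check_policy_gating() {
    assert(select_policy({Precision::u8, Precision::u8}, Precision::u8) == OverflowPolicy::Wrap);
    assert(select_policy({Precision::u8, Precision::u8}, Precision::f32) == OverflowPolicy::Saturate);
}
```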

Tests

Extended subtract_u8_wrap.cpp with additional coverage for the failure mode:

u8 inputs with overridden f32/i32 outputs (TypeRelaxed) → verifies no wrap-around and prevents regression of the crash/wrong-results behavior.

Kept existing tests that validate wrap-around for pure u8 - u8 → u8.

Key point: wrap-around is only correct when the result type is also u8; for u8 - u8 → f32/i32 (typical dequant), modular arithmetic is incorrect.

@EgorDuplensky Could you please take another look at this updated approach?

Nishant-ZFYII · 2026-01-12T03:19:41Z

Thanks for the review. I will make the changes.

Thanks!

- Replace subtract_u8_wrap.cpp with proper eltwise_overflow test class
- Test both UNDERFLOW (subtract) and OVERFLOW (add) using CompareWithRefs
- Use all_of() utility instead of chained && comparisons
- Use OPENVINO_ASSERT instead of if-check for node null
- Remove issue ticket references from comments

Nishant-ZFYII · 2026-01-17T12:10:57Z

@EgorDuplensky Thanks for the review — I’ve pushed an update that addresses all the notes:

Removed issue/ticket links from the code comments and kept only the behavior description.
Switched the u8 type checks to ov::intel_cpu::all_of(...) in both the ACL executor and the JIT gating.
Replaced the if (node) guard with OPENVINO_ASSERT(node, ...) in jit_subtract_emitter::get_supported_precisions().
Reworked the regression coverage to use the existing CompareWithRefs flow (CPU plugin vs reference): removed subtract_u8_wrap.cpp and added eltwise_overflow tests parameterized by UNDERFLOW/OVERFLOW, covering Subtract underflow and Add overflow for u8.
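For readers unfamiliar with the "all_of instead of chained &&" pattern mentioned above, here is a minimal sketch of such a helper written with a C++17 fold expression. The actual signature of ov::intel_cpu::all_of was not checked for this example, so the helper is deliberately given a different, hypothetical name (`all_equal_to`) and should be read as an assumption about the general idea only.

```cpp
#include <cassert>

enum class Precision { u8, i32, f32 };  // stand-in for the plugin's element types

// Returns true when every value in the pack compares equal to `expected`.
// An empty pack yields true, as is usual for a logical-AND fold.
template <typename T, typename... Args>
constexpr bool all_equal_to(const T& expected, const Args&... values) {
    return ((values == expected) && ...);
}

// Self-check (hypothetical helper): replaces `a == u8 && b == u8 && out == u8`.
inline void check_all_equal_to() {
    const Precision a = Precision::u8;
    const Precision b = Precision::u8;
    const Precision out = Precision::f32;
    assert(all_equal_to(Precision::u8, a, b));
    assert(!all_equal_to(Precision::u8, a, b, out));
}
```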

Let me know if you’d prefer different shapes or a narrower/wider test scope.

Nishant-ZFYII · 2026-01-24T01:34:54Z

@EgorDuplensky, could you please take a look at the recent updates?

Thanks and regards.

… comments
- Added u8 wrap-around support for jit_add_emitter (x64 JIT)
- Added ConvertPolicy::WRAP for EltwiseAdd in ACL executor (ARM)
- Changed test inputs to hardcoded values that guarantee overflow/underflow
- Fixed test to use ov::Model directly instead of makeNgraphFunction

Nishant-ZFYII · 2026-02-05T00:17:35Z

Hi @EgorDuplensky — thanks again for the detailed review and for your patience and guidance while I worked through this.

I’ve pushed an update that addresses the latest comments:

Updated the regression test to use hardcoded u8 inputs (shape-independent) to deterministically trigger underflow/overflow.
Fixed u8 Add overflow handling by using wrap-around for pure u8→u8 in both ACL and the x64 JIT path, while keeping QDQ/mixed-precision cases on the widened/saturating path.

Whenever you have a moment, could you please take another look? Thanks!

Willing to make more corrections if required.

EgorDuplensky

@Nishant-ZFYII Could you please double-check that the new tests fail without the changes?

EgorDuplensky · 2026-02-17T12:15:28Z

@Nishant-ZFYII Many tests failed, please check the logs.

Nishant-ZFYII · 2026-02-18T01:52:47Z

Hi @EgorDuplensky,

Pushed a fix for the 52 CI failures.

Root cause: get_supported_precisions() is called without a node argument by the SupportedPrecisions functor (in jit_uni_eltwise_generic.cpp), which means node defaults to nullptr. The OPENVINO_ASSERT(node, ...) I added was treating that as an error, but it's actually a valid code path — it's a general query for the base set of supported precisions, not tied to any specific node.

Fix: Replaced OPENVINO_ASSERT(node, ...) with if (node && ov::intel_cpu::all_of(...)) in both jit_add_emitter::get_supported_precisions and jit_subtract_emitter::get_supported_precisions. When node is nullptr, the method now returns the default precision set ({f32, f32}, {i32, i32}). When a concrete node is available and all its inputs/outputs are u8, it additionally includes {u8, u8}.
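The null-tolerant pattern described in the fix can be sketched as follows. This is a simplified model, not the emitter code: `NodeInfo`, `PrecisionSet`, `is_pure_u8`, and `supported_precisions` are hypothetical stand-ins, and the real get_supported_precisions() takes a node type from the plugin rather than this struct.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

enum class Precision { u8, i32, f32 };  // stand-in for the plugin's element types

// Hypothetical stand-in for the node argument: only the precisions matter here.
struct NodeInfo {
    std::vector<Precision> inputs;
    std::vector<Precision> outputs;
};

using PrecisionSet = std::set<std::vector<Precision>>;

inline bool is_pure_u8(const NodeInfo& node) {
    const auto is_u8 = [](Precision p) { return p == Precision::u8; };
    return std::all_of(node.inputs.begin(), node.inputs.end(), is_u8) &&
           std::all_of(node.outputs.begin(), node.outputs.end(), is_u8);
}

// A null node is a valid "general query": return the base set without asserting.
inline PrecisionSet supported_precisions(const NodeInfo* node = nullptr) {
    PrecisionSet result{{Precision::f32, Precision::f32}, {Precision::i32, Precision::i32}};
    if (node && is_pure_u8(*node)) {
        result.insert(std::vector<Precision>{Precision::u8, Precision::u8});
    }
    return result;
}

// Self-check (hypothetical helper) for the three cases discussed in the comment.
inline void check_supported_precisions() {
    const NodeInfo pure_u8{{Precision::u8, Precision::u8}, {Precision::u8}};
    const NodeInfo dequant{{Precision::u8, Precision::u8}, {Precision::f32}};
    assert(supported_precisions().size() == 2);          // no node: base set only
    assert(supported_precisions(&pure_u8).size() == 3);  // u8 -> u8: {u8, u8} added
    assert(supported_precisions(&dequant).size() == 2);  // u8 -> f32: base set only
}
```

Treating the null case as "no extra precisions" rather than an error keeps the general query path working while still gating the u8 kernel on a concrete node.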

This was my oversight — I should have traced how get_supported_precisions is invoked across the codebase before adding the assert.

I also verified locally that the tests catch the bug on unpatched code:

With fix applied: all 6 smoke_EltwiseOverflowU8 tests pass
Without fix (reverted jit_eltwise_emitters.cpp and acl_eltwise.cpp to master while keeping the test files): all 6 tests fail with Expected: 255 Actual: 0 for underflow and Expected: 0 Actual: 255 for overflow
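The expected and actual values quoted above follow directly from the two overflow policies. The sketch below is a tiny reference model and not the test from the PR. The `OverflowKind` enum only mirrors the UNDERFLOW/OVERFLOW parameterization by name, `eltwise_u8` is a hypothetical helper, and the concrete inputs (3 - 4 and 255 + 1) are assumptions chosen to trigger each case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class OverflowKind { Underflow, Overflow };  // Subtract vs. Add, as in the test names

// Element-wise u8 Subtract/Add computed in int, then narrowed with the chosen policy.
// Assumes a and b have the same length.
inline std::vector<std::uint8_t> eltwise_u8(OverflowKind kind, bool saturate,
                                            const std::vector<std::uint8_t>& a,
                                            const std::vector<std::uint8_t>& b) {
    std::vector<std::uint8_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int wide = kind == OverflowKind::Underflow ? a[i] - b[i] : a[i] + b[i];
        out[i] = saturate ? static_cast<std::uint8_t>(std::clamp(wide, 0, 255))
                          : static_cast<std::uint8_t>(wide);  // keeps the low 8 bits
    }
    return out;
}

// Self-check (hypothetical helper) reproducing the quoted expected/actual pairs.
inline void check_overflow_reference() {
    assert(eltwise_u8(OverflowKind::Underflow, false, {3}, {4})[0] == 255);  // expected
    assert(eltwise_u8(OverflowKind::Underflow, true, {3}, {4})[0] == 0);     // saturated
    assert(eltwise_u8(OverflowKind::Overflow, false, {255}, {1})[0] == 0);   // expected
    assert(eltwise_u8(OverflowKind::Overflow, true, {255}, {1})[0] == 255);  // saturated
}
```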

I want to make sure I'm not missing anything — does this if (node && ov::intel_cpu::all_of(...)) approach look correct to you, or would you prefer a different pattern here?

praasz · 2026-02-23T07:41:36Z

src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/classes/eltwise_overflow.cpp

+// Copyright (C) 2025 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//


Suggested change

-// Copyright (C) 2025 Intel Corporation
-// SPDX-License-Identifier: Apache-2.0
-//
+// Copyright (C) 2018-2026 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//

praasz · 2026-02-23T07:41:48Z

src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/classes/eltwise_overflow.hpp

+// Copyright (C) 2025 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//


Suggested change

-// Copyright (C) 2025 Intel Corporation
-// SPDX-License-Identifier: Apache-2.0
-//
+// Copyright (C) 2018-2026 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//

praasz · 2026-02-23T07:41:58Z

...s/intel_cpu/tests/functional/custom/single_layer_tests/instances/common/eltwise_overflow.cpp

+// Copyright (C) 2025 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//


Suggested change

-// Copyright (C) 2025 Intel Corporation
-// SPDX-License-Identifier: Apache-2.0
-//
+// Copyright (C) 2018-2026 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//

EgorDuplensky · 2026-02-24T13:01:58Z

@v-Golubev Could you please clarify why get_supported_precisions() has a 'Node' parameter at all? It is almost never used, but maybe I have missed something.

Nishant-ZFYII · 2026-03-04T16:27:51Z

Good day @v-Golubev, @EgorDuplensky,

Is there anything I can do on my end to help move this forward?

Thanks and Regards,
Nishant.

Nishant-ZFYII requested review from a team as code owners January 3, 2026 21:37

github-actions bot added the category: CPU OpenVINO CPU plugin label Jan 3, 2026

sys-openvino-ci added the ExternalPR External contributor label Jan 3, 2026

Nishant-ZFYII mentioned this pull request Jan 5, 2026

[Bug]: uint8 Subtraction operator exhibits Saturation behavior (0) instead of Wrap-around (255) unlike NumPy #33164

Closed

3 tasks

maxnick assigned maxnick and EgorDuplensky and unassigned maxnick Jan 5, 2026

Nishant-ZFYII added 2 commits January 6, 2026 22:24

EgorDuplensky reviewed Jan 7, 2026

Nishant-ZFYII requested a review from EgorDuplensky January 25, 2026 12:08

EgorDuplensky reviewed Jan 28, 2026

Nishant-ZFYII requested a review from EgorDuplensky February 5, 2026 00:12

EgorDuplensky approved these changes Feb 16, 2026

This was linked to issues Feb 17, 2026

[Bug]: Arithmetic Mismatch: int8 Subtract operator uses Saturation instead of Wraparound behavior (Max Diff = 255) #33359

Open

[Bug]: Accuracy mismatch in uint8 arithmetic (Sub/Mul/Add) on CPU: OpenVINO saturates while PyTorch wraps #33518

Open

maxnick removed a link to an issue Feb 17, 2026

[Bug]: Arithmetic Mismatch: int8 Subtract operator uses Saturation instead of Wraparound behavior (Max Diff = 255) #33359

Open

3 tasks

Fix get_supported_precisions crash: use if-guard with OPENVINO_ASSERT

1badae9

Clang-18 fix

6f33255

Nishant-ZFYII requested a review from EgorDuplensky February 18, 2026 01:53

praasz reviewed Feb 23, 2026

praasz added this to the 2026.1 milestone Feb 23, 2026
