[CPU] Fix u8 Subtract to use wrap-around instead of saturation #33453
Nishant-ZFYII wants to merge 8 commits into openvinotoolkit:master
Fixes openvinotoolkit#33164
- Changed ACL executor to use ConvertPolicy::WRAP for u8 subtract
- Added u8 support to x64 JIT subtract emitter using vpsubb instruction
- Added regression tests for u8 subtract wrap-around behavior
Hi maintainers — request for review/CI. This PR fixes u8 Subtract wrap-around semantics in the CPU plugin (Fixes #33164). The issue reporter tested/reviewed and confirmed it solves their problem (they closed the issue after validating). Changes summary:
Review request:
@EgorDuplensky, could you please review?
Gate u8 subtract execution to only u8->u8 operations.
This ensures wrap-around behavior (e.g., 3 - 4 = 255) for pure u8 arithmetic while preventing u8 execution for dequantization patterns (u8 input, f32/i32 output) where wrap-around would corrupt the math.
Changes:
- Modified get_supported_precisions() to conditionally enable u8 support only when both inputs AND output are u8
- Added defensive assertion in emit_isa() u8 case
- Removed [[maybe_unused]] attribute as node parameter is now used
Fixes openvinotoolkit#33164
Gate u8 subtract execution to only pure u8->u8 operations.
This ensures wrap-around behavior (e.g., 3 - 4 = 255) for unsigned arithmetic while preventing u8 execution for dequantization patterns (u8 input, f32/i32 output) where wrap-around would corrupt the math.
Changes:
- JIT: Modified get_supported_precisions() to enable u8 only when both inputs AND output are u8
- ACL: Added same u8->u8 gating for ConvertPolicy::WRAP
- Tests: Added TypeRelaxed regression tests to catch LPT/dequant failures
Fixes openvinotoolkit#33164
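The gating rule in the commit message above can be illustrated with a minimal, standalone sketch. It is not the plugin's code: the Precision enum and the names can_use_u8_wrap, sub_u8_wrap, and sub_u8_to_f32 are hypothetical stand-ins, and the sketch only shows why the same u8 inputs need different arithmetic depending on the output type.

```cpp
#include <cstdint>

// Hypothetical stand-in for an element-type enum; this is NOT the OpenVINO type.
enum class Precision { u8, i32, f32 };

// Assumed gating rule: wrap-around u8 execution is only allowed
// when both inputs AND the output are u8.
constexpr bool can_use_u8_wrap(Precision in0, Precision in1, Precision out) {
    return in0 == Precision::u8 && in1 == Precision::u8 && out == Precision::u8;
}

// Pure u8 subtract: operands promote to int, and the cast back to
// uint8_t reduces the result modulo 256.
constexpr std::uint8_t sub_u8_wrap(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a - b);
}

// Dequantization-style subtract: u8 inputs, f32 output, computed in float,
// so a negative result is preserved instead of wrapping.
constexpr float sub_u8_to_f32(std::uint8_t a, std::uint8_t zero_point) {
    return static_cast<float>(a) - static_cast<float>(zero_point);
}

// Compile-time checks of the behaviour described in the commit message.
static_assert(sub_u8_wrap(3, 4) == 255, "pure u8 subtract wraps around");
static_assert(sub_u8_to_f32(3, 4) == -1.0f, "dequant subtract keeps the sign");
static_assert(can_use_u8_wrap(Precision::u8, Precision::u8, Precision::u8),
              "u8->u8 may use the wrap-around path");
static_assert(!can_use_u8_wrap(Precision::u8, Precision::u8, Precision::f32),
              "u8 inputs with f32 output must not use the wrap-around path");
```

If the u8 path were taken for the dequantization case, the value 255 would be converted to f32 instead of -1, which is the corruption the gating is meant to prevent.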
I investigated the CI failures and narrowed them down to overly-broad
Root cause
The previous change advertised
This led to:
Fix
Tests
Extended
Kept existing tests that validate wrap-around for pure
Key point: wrap-around is only correct when the result type is also
@EgorDuplensky Could you please take another look at this updated approach?
...s/intel_cpu/tests/functional/custom/single_layer_tests/instances/common/subtract_u8_wrap.cpp
src/plugins/intel_cpu/src/emitters/plugin/x64/jit_eltwise_emitters.cpp
Thanks for the review. I will make the changes.
- Replace subtract_u8_wrap.cpp with proper eltwise_overflow test class
- Test both UNDERFLOW (subtract) and OVERFLOW (add) using CompareWithRefs
- Use all_of() utility instead of chained && comparisons
- Use OPENVINO_ASSERT instead of if-check for node null
- Remove issue ticket references from comments
@EgorDuplensky Thanks for the review — I’ve pushed an update that addresses all the notes:
Let me know if you’d prefer different shapes or a narrower/wider test scope.
@EgorDuplensky, could you please take a look at the recent updates? Thanks and regards.
src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/classes/eltwise_overflow.cpp
… comments
- Added u8 wrap-around support for jit_add_emitter (x64 JIT)
- Added ConvertPolicy::WRAP for EltwiseAdd in ACL executor (ARM)
- Changed test inputs to hardcoded values that guarantee overflow/underflow
- Fixed test to use ov::Model directly instead of makeNgraphFunction
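To make the "hardcoded values that guarantee overflow/underflow" idea concrete, here is a small sketch of reference functions with fixed inputs. The input values and the names ref_add_u8, ref_sub_u8, and the *_inputs helpers are illustrative assumptions, not the values or helpers used in the PR's tests.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using U8Vec = std::vector<std::uint8_t>;

// Reference element-wise u8 add with modulo-256 wrap-around.
// Assumes a and b have the same length.
U8Vec ref_add_u8(const U8Vec& a, const U8Vec& b) {
    U8Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] + b[i]);
    }
    return out;
}

// Reference element-wise u8 subtract with modulo-256 wrap-around.
// Assumes a and b have the same length.
U8Vec ref_sub_u8(const U8Vec& a, const U8Vec& b) {
    U8Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] - b[i]);
    }
    return out;
}

// Illustrative inputs: every pair sums to more than 255, so a saturating
// implementation would return 255 everywhere and fail the comparison.
std::pair<U8Vec, U8Vec> overflow_inputs() {
    return {U8Vec{200, 255, 128, 250}, U8Vec{100, 1, 128, 10}};
}

// Illustrative inputs: every a[i] is smaller than b[i], so a saturating
// implementation would return 0 everywhere and fail the comparison.
std::pair<U8Vec, U8Vec> underflow_inputs() {
    return {U8Vec{3, 0, 10, 100}, U8Vec{4, 1, 20, 200}};
}
```

Random inputs could miss the overflow range entirely, in which case wrap-around and saturation give identical results and the test passes on buggy code. Fixed inputs like these make every element discriminate between the two behaviours.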
Hi @EgorDuplensky — thanks again for the detailed review and for your patience and guidance while I worked through this. I’ve pushed an update that addresses the latest comments:
Whenever you have a moment, could you please take another look? Thanks! I'm happy to make further corrections if required.
EgorDuplensky left a comment
@Nishant-ZFYII Could you please double-check that the new tests fail without the changes?
@Nishant-ZFYII Many tests failed, please check the logs.
Hi @EgorDuplensky, I pushed a fix for the 52 CI failures.
Root cause:
Fix: Replaced
This was my oversight; I should have traced how
I also verified locally that the tests catch the bug on unpatched code:
I want to make sure I'm not missing anything: does this
Suggested change (this suggestion appears three times in the review):
- // Copyright (C) 2025 Intel Corporation
- // SPDX-License-Identifier: Apache-2.0
- //
+ // Copyright (C) 2018-2026 Intel Corporation
+ // SPDX-License-Identifier: Apache-2.0
+ //
@v-Golubev Could you please clarify why the get_supported_precisions() function has a 'Node' parameter at all? It is almost never used, but maybe I have missed something.
Good day @v-Golubev, @EgorDuplensky. Is there anything I can do on my end to help with this? Thanks and regards,
[CPU] Fix u8 Subtract to use wrap-around instead of saturation
Fixes #33164
This fixes the bug where u8 subtraction was saturating to 0 instead of wrapping around like NumPy does.
For example: uint8(3) - uint8(4) was returning 0 but should return 255, that is, (3 - 4) mod 256.

What Changed
I found the bug was happening in two places:
- ARM (ACL backend) - The subtract operation was hardcoded to use ConvertPolicy::SATURATE. I changed it to check the output type and use ConvertPolicy::WRAP when working with u8.
- x64 (JIT backend) - The JIT emitter didn't support u8 precision at all for subtraction, so it was falling back to float operations and then saturating when converting back. I added u8 to the supported precisions and implemented it using the vpsubb instruction, which automatically does wrap-around.
I also added tests to make sure this doesn't break again. The tests cover basic cases like 3 - 4 = 255, larger vectors, and 4D tensors.

Files modified
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_eltwise.cpp
- src/plugins/intel_cpu/src/emitters/plugin/x64/jit_eltwise_emitters.cpp
- src/plugins/intel_cpu/tests/.../subtract_u8_wrap.cpp (new test file)

Closes #33164
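For readers unfamiliar with the instruction mentioned above, the following sketch contrasts wrapping and saturating byte subtraction on 16 lanes. It is an illustration only: the JIT emitter generates instructions directly rather than using intrinsics, and the function names sub16_wrap and sub16_saturate are hypothetical. On x86 with SSE2, _mm_sub_epi8 maps to the byte-subtract instruction (psubb, or vpsubb when VEX-encoded) and _mm_subs_epu8 maps to its unsigned saturating counterpart; on other targets the sketch falls back to scalar loops with the same semantics.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Subtract 16 u8 lanes with modulo-256 wrap-around.
// a, b and out must each point to at least 16 bytes.
void sub16_wrap(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* out) {
#if SKETCH_HAS_SSE2
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_sub_epi8(va, vb));
#else
    for (std::size_t i = 0; i < 16; ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] - b[i]);
    }
#endif
}

// Subtract 16 u8 lanes with unsigned saturation: results below 0 clamp to 0.
// a, b and out must each point to at least 16 bytes.
void sub16_saturate(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* out) {
#if SKETCH_HAS_SSE2
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_subs_epu8(va, vb));
#else
    for (std::size_t i = 0; i < 16; ++i) {
        out[i] = a[i] > b[i] ? static_cast<std::uint8_t>(a[i] - b[i]) : std::uint8_t{0};
    }
#endif
}
```

With a lane holding 3 and the matching lane holding 4, sub16_wrap produces 255 while sub16_saturate produces 0, which is the difference between the fixed and the previously observed behaviour.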