-
Notifications
You must be signed in to change notification settings - Fork 2.9k
[CPU] Fix u8 Subtract to use wrap-around instead of saturation #33453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[CPU] Fix u8 Subtract to use wrap-around instead of saturation #33453
Conversation
Fixes openvinotoolkit#33164 - Changed ACL executor to use ConvertPolicy::WRAP for u8 subtract - Added u8 support to x64 JIT subtract emitter using vpsubb instruction - Added regression tests for u8 subtract wrap-around behavior
|
Hi maintainers — request for review/CI. This PR fixes u8 Subtract wrap-around semantics in the CPU plugin (Fixes #33164). The issue reporter tested/reviewed and confirmed it solves their problem (they closed the issue after validating). Changes summary:
Review request:
|
|
@EgorDuplensky , could you please review? |
Gate u8 subtract execution to only u8->u8 operations. This ensures wrap-around behavior (e.g., 3 - 4 = 255) for pure u8 arithmetic while preventing u8 execution for dequantization patterns (u8 input, f32/i32 output) where wrap-around would corrupt the math. Changes: - Modified get_supported_precisions() to conditionally enable u8 support only when both inputs AND output are u8 - Added defensive assertion in emit_isa() u8 case - Removed [[maybe_unused]] attribute as node parameter is now used Fixes openvinotoolkit#33164
Gate u8 subtract execution to only pure u8->u8 operations. This ensures wrap-around behavior (e.g., 3 - 4 = 255) for unsigned arithmetic while preventing u8 execution for dequantization patterns (u8 input, f32/i32 output) where wrap-around would corrupt the math. Changes: - JIT: Modified get_supported_precisions() to enable u8 only when both inputs AND output are u8 - ACL: Added same u8->u8 gating for ConvertPolicy::WRAP - Tests: Added TypeRelaxed regression tests to catch LPT/dequant failures Fixes openvinotoolkit#33164
|
I investigated the CI failures and narrowed them down to overly-broad Root causeThe previous change advertised This led to:
Fix
TestsExtended
Kept existing tests that validate wrap-around for pure Key point: wrap-around is only correct when the result type is also @EgorDuplensky Could you please take another look at this updated approach? |
| input_a[i] = static_cast<uint8_t>(i % 10); // 0-9 repeating | ||
| input_b[i] = static_cast<uint8_t>((i % 10) + 1); // 1-10 repeating | ||
| // Expected: proper subtraction with negative results | ||
| expected[i] = static_cast<float>(static_cast<int>(input_a[i]) - static_cast<int>(input_b[i])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be tested against core reference eltwise implementation, not against c++ logic.
We could extend our /src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/classes/eltwise.cpp
to allow to pass some input data configuration, but I think it will be an overkill.
So could you please create
/src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/classes/eltwise_overflow.cpp
and
/src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/instances/common/eltwise_overflow.cpp
And implement the tests similar to /src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/classes/eltwise.cpp and drop all the unnecessary test params and keep the one only essential to the overflow logic.
Also, I think we should cover an 'Add' operation as well in scope of the same test.
The test could for example use enum parameters like UNDERFLOW, OVERFLOW.
| // QDQ patterns with u8 input but f32/i32 output must saturate. | ||
| // See https://github.com/openvinotoolkit/openvino/issues/33164 | ||
|
|
||
| const bool is_u8_u8_to_u8 = (srcDescs[0]->getPrecision() == ov::element::u8) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use 'all_of' utility function
| const auto in1 = node->get_input_element_type(1); | ||
| const auto out = node->get_output_element_type(0); | ||
|
|
||
| if (in0 == element::u8 && in1 == element::u8 && out == element::u8) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use 'all_of' utility function
|
|
||
| // Only enable u8 wrap-around for pure u8->u8 arithmetic (issue #33164). | ||
| // QDQ/dequantization patterns (u8 input, f32/i32 output) must NOT use u8 execution. | ||
| if (node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OPENVINO_ASSERT(node, ...) instead of 'if'
| const std::shared_ptr<ov::Node>& node) { | ||
| std::set<std::vector<element::Type>> supported = {{element::f32, element::f32}, {element::i32, element::i32}}; | ||
|
|
||
| // Only enable u8 wrap-around for pure u8->u8 arithmetic (issue #33164). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need to reference any tickets if it is the way it should work.
| // Only use wrap-around for pure u8->u8 subtract (issue #33164). | ||
| // QDQ patterns with u8 input but f32/i32 output must saturate. | ||
| // See https://github.com/openvinotoolkit/openvino/issues/33164 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid referencing the tickets if it is just a normal behavior and not some workaround.
[CPU] Fix u8 Subtract to use wrap-around instead of saturation
Fixes #33164
This fixes the bug where u8 subtraction was saturating to 0 instead of wrapping around like NumPy does.
For example:
uint8(3) - uint8(4)was returning0but should return255(like3 - 4 mod 256).What Changed
I found the bug was happening in two places:
ARM (ACL backend) - The subtract operation was hardcoded to use
ConvertPolicy::SATURATE. I changed it to check the output type and useConvertPolicy::WRAPwhen working with u8.x64 (JIT backend) - The JIT emitter didn't support u8 precision at all for subtraction, so it was falling back to float operations and then saturating when converting back. I added u8 to the supported precisions and implemented it using the
vpsubbinstruction, which automatically does wrap-around.I also added tests to make sure this doesn't break again. The tests cover basic cases like
3 - 4 = 255, larger vectors, and 4D tensors.Files modified
src/plugins/intel_cpu/src/nodes/executors/acl/acl_eltwise.cppsrc/plugins/intel_cpu/src/emitters/plugin/x64/jit_eltwise_emitters.cppsrc/plugins/intel_cpu/tests/.../subtract_u8_wrap.cpp(new test file)Closes #33164