[LPT] FQStripping transformation rework #33989
Merged
v-Golubev merged 77 commits into openvinotoolkit:master on Mar 4, 2026
Conversation
Force-pushed from f2e03fb to 6077ca5
aobolensk reviewed Feb 17, 2026
src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/qdq_stripping.cpp
aobolensk reviewed Feb 17, 2026
src/common/transformations/include/transformations/utils/utils.hpp
aobolensk reviewed Feb 18, 2026
src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/qdq_stripping.cpp
Contributor (Author)
@isanghao could you please take a look at the GPU part? Thanks in advance!
Merged via the queue into openvinotoolkit:master with commit f8efd35 on Mar 4, 2026
225 of 229 checks passed
mlukasze pushed a commit to mlukasze/openvino that referenced this pull request on Mar 5, 2026
Nishant-ZFYII pushed a commit to Nishant-ZFYII/openvino that referenced this pull request on Mar 5, 2026
atamas19 pushed a commit to atamas19/openvino that referenced this pull request on Mar 6, 2026
Details:
Some INT16 models rely on U16/I16 FakeQuantize layers. Simply stripping these FakeQuantize operations may be insufficient when such models are executed in f16 precision, because the original (unquantized) activation values flowing through the stripped path may exceed the representable f16 range. This can lead to overflow and, consequently, incorrect inference results.
This PR introduces a new mechanism called `ScaleAdjuster`. The `ScaleAdjuster` detects activation paths that feed into scale-invariant nodes and safely reduces the magnitude of activation values to keep them within the f16 numeric range, without altering the model's semantic correctness (so the adjustment is possible only for activation paths that reach scale-invariant nodes).
The implementation is validated by:
- GPU functional tests, ensuring inference correctness, and
- LPT graph comparison tests, verifying structural consistency of transformations.

Tickets:
- *CVS-180573*
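The idea behind the pass can be illustrated with a small numeric sketch. This is not the PR's implementation; it is a hypothetical NumPy demo (the `l2_normalize` function and the `1/1024` scale factor are illustrative choices) showing why pre-scaling activations is safe in front of a scale-invariant operation, and why it is needed at all: f16 cannot represent values above ~65504.

```python
import numpy as np

def l2_normalize(x):
    # Example of a scale-invariant operation:
    # l2_normalize(s * x) == l2_normalize(x) for any s > 0.
    return x / np.linalg.norm(x)

# Hypothetical activation values that exceed the f16 range (max ~65504)
activations = np.array([70000.0, 140000.0], dtype=np.float32)

# Casting directly to f16 overflows to inf
print(np.isinf(activations.astype(np.float16)).any())  # True

# Pre-scaling the activations, as a ScaleAdjuster-like pass might,
# keeps them representable in f16
scale = 1.0 / 1024.0
scaled = (activations * scale).astype(np.float16)
print(np.isinf(scaled).any())  # False

# Because the downstream node is scale-invariant, its output is unchanged
# (up to f16 rounding)
assert np.allclose(l2_normalize(activations),
                   l2_normalize(scaled.astype(np.float32)),
                   atol=1e-3)
```

The same reasoning explains the restriction mentioned in the description: if the path did not end in a scale-invariant node, the injected scale would change the model's output, so the adjustment is only legal on paths that reach such nodes.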