[LPT] FQStripping transformation rework#33989

Merged
v-Golubev merged 77 commits into openvinotoolkit:master from v-Golubev:vg/lpt/qdq_stripping_rework
Mar 4, 2026

Conversation

@v-Golubev
Contributor

@v-Golubev v-Golubev commented Feb 5, 2026

Details:

Some INT16 models rely on U16/I16 FakeQuantize layers. Simply stripping these FakeQuantize operations may be insufficient when such models are executed in f16 precision, because the original (unquantized) activation values flowing through the stripped path may exceed the representable f16 range. This can lead to overflow and, consequently, incorrect inference results.
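The failure mode can be shown with a small numeric sketch (the quantization range below is hypothetical, chosen only to exceed the largest finite f16 value, 65504):

```python
import numpy as np

# Hypothetical INT16 quantization range whose upper bound exceeds f16.
F16_MAX = 65504.0
output_low, output_high = 0.0, 100_000.0
levels = 2**16

x = np.float32(90_000.0)  # an activation inside the quantized range

# With the FakeQuantize in place, the value is carried as a u16 level and
# dequantized back to f32, which represents it without overflow.
q = np.round((x - output_low) / (output_high - output_low) * (levels - 1))
dq = np.float32(q / (levels - 1) * (output_high - output_low) + output_low)
assert np.isfinite(dq)

# After stripping the FakeQuantize, the raw value flows onward; if the
# plugin then executes in f16, anything above F16_MAX becomes infinity.
assert np.isinf(np.float16(x))
```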

This PR introduces a new mechanism called `ScaleAdjuster`.
The `ScaleAdjuster` detects activation paths that feed into scale-invariant nodes and safely reduces the magnitude of the activation values to keep them within the f16 numeric range. Because the consuming nodes are scale-invariant, the adjustment does not change the model's semantics; for the same reason, it can be applied only to activation paths that reach scale-invariant nodes.
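The idea can be sketched as follows. This is an illustrative model of the approach, not the actual LPT implementation: MVN-style normalization stands in for the scale-invariant consumer, and the power-of-two factor heuristic is an assumption of this sketch.

```python
import numpy as np

F16_MAX = 65504.0

def mvn(x, eps=1e-9):
    """Mean-variance normalization: invariant to a positive input scale."""
    return (x - x.mean()) / (x.std() + eps)

def adjust_scale(x, margin=0.5):
    """Pick a power-of-two factor that pulls |x| under the f16 limit.

    Hypothetical heuristic: scale down only when the peak magnitude
    exceeds ``margin * F16_MAX``.
    """
    peak = np.abs(x).max()
    if peak <= F16_MAX * margin:
        return x, 1.0
    factor = 2.0 ** -np.ceil(np.log2(peak / (F16_MAX * margin)))
    return x * factor, factor

x = np.linspace(-200_000, 200_000, 8, dtype=np.float64)
scaled, factor = adjust_scale(x)

# The down-scaled path survives an f16 round-trip...
assert np.isfinite(np.float16(scaled)).all()
# ...and the scale-invariant consumer produces the same result.
assert np.allclose(mvn(scaled), mvn(x))
```

Power-of-two factors are used here because multiplying by them is exact in binary floating point, so the rescaling itself introduces no rounding error.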

The implementation is validated by:

  • GPU functional tests, ensuring inference correctness, and
  • LPT graph comparison tests, verifying structural consistency of transformations.

Tickets:

  • CVS-180573

@github-actions github-actions bot added category: GPU OpenVINO GPU plugin category: CPU OpenVINO CPU plugin category: LP transformations OpenVINO Low Precision transformations labels Feb 5, 2026
@v-Golubev v-Golubev added the WIP work in progress label Feb 10, 2026
@v-Golubev v-Golubev marked this pull request as ready for review February 10, 2026 15:31
@v-Golubev v-Golubev requested review from a team as code owners February 10, 2026 15:31
@v-Golubev v-Golubev requested a review from a team as a code owner February 12, 2026 12:49
@github-actions github-actions bot added the category: transformations OpenVINO Runtime library - Transformations label Feb 12, 2026
@v-Golubev v-Golubev requested review from a team as code owners February 12, 2026 13:58
@github-actions github-actions bot added the category: IE Tests OpenVINO Test: plugins and common label Feb 12, 2026
@v-Golubev v-Golubev force-pushed the vg/lpt/qdq_stripping_rework branch from f2e03fb to 6077ca5 on February 12, 2026 16:41
@v-Golubev v-Golubev removed the WIP work in progress label Feb 17, 2026
@v-Golubev v-Golubev requested a review from aobolensk February 18, 2026 11:20
Contributor

@aobolensk aobolensk left a comment

LGTM

@v-Golubev
Contributor Author

@isanghao could you please take a look at GPU part? Thanks in advance

Contributor

@isanghao isanghao left a comment


LGTM for GPU part

@v-Golubev v-Golubev added this pull request to the merge queue Mar 4, 2026
Merged via the queue into openvinotoolkit:master with commit f8efd35 Mar 4, 2026
225 of 229 checks passed
@v-Golubev v-Golubev deleted the vg/lpt/qdq_stripping_rework branch March 4, 2026 12:30
mlukasze pushed a commit to mlukasze/openvino that referenced this pull request Mar 5, 2026
Nishant-ZFYII pushed a commit to Nishant-ZFYII/openvino that referenced this pull request Mar 5, 2026
atamas19 pushed a commit to atamas19/openvino that referenced this pull request Mar 6, 2026