Updated RQ/CPU extension signatures #3441


Open · wants to merge 11 commits into develop

Conversation

@nikita-malininn (Collaborator) commented Apr 17, 2025

Changes

  • Updated the ReferenceQuantize.backward signature to align with the CUDA/CPU extension signatures.
  • Updated the C++ part of the CPU extension to align results in the different cases (sym/asym).

Reason for changes

  • Bugfix

Related tickets

  • 166245

Tests

  • Added tests to tests/torch/quantization/test_functions.py

@nikita-malininn nikita-malininn requested a review from a team as a code owner April 17, 2025 10:10
@github-actions bot added the NNCF PT label (Pull requests that update NNCF PyTorch) on Apr 17, 2025
@ljaljushkin (Contributor) left a comment

There's no test that catches the bug. Please add one.
E.g. you can mock the call of Quantize_backward and check the argument types via call_args. Such a test would fail before this PR and pass with it.
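
For illustration, a hedged sketch of the kind of test being suggested. The stub below and its signature are placeholders rather than the actual NNCF extension entry point; a real test would wrap QuantizedFunctionsCPU.get("Quantize_backward") and drive it through a forward/backward pass instead of calling the spy directly.

from unittest import mock

import torch


def quantize_backward_stub(grad_output, input_, input_low, input_range, levels, level_low, level_high):
    # Placeholder with an assumed signature, standing in for the compiled extension.
    return grad_output, torch.zeros_like(input_low), torch.zeros_like(input_range)


def test_backward_called_with_expected_arguments():
    spy = mock.Mock(wraps=quantize_backward_stub)

    # In a real test this call would happen inside loss.backward(), after the
    # extension entry point has been replaced by the spy.
    spy(torch.ones(4), torch.randn(4), torch.zeros(1), torch.ones(1), 256, 0, 255)

    args, _kwargs = spy.call_args
    # A signature mismatch (wrong arity or wrong argument types) is caught here.
    assert len(args) == 7
    assert all(isinstance(a, (torch.Tensor, int)) for a in args)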

@ljaljushkin (Contributor) commented Apr 17, 2025

> There's no test that catches the bug. Please add one. E.g. you can mock the call of Quantize_backward and check the argument types via call_args. Such a test would fail before this PR and pass with it.

call_args probably won't help with the test, but there's still no test checking that the extension initialized by the reference code is functional.
Before this PR the extension wasn't functional in that case, and nobody noticed for several years, because there was no test.

@nikita-malininn changed the title from "Updated RQ signature" to "Updated RQ/CPU extension signatures" on Apr 17, 2025
@nikita-malininn (Collaborator, Author)

> There's no test that catches the bug. Please add one. E.g. you can mock the call of Quantize_backward and check the argument types via call_args. Such a test would fail before this PR and pass with it.

> call_args probably won't help with the test, but there's still no test checking that the extension initialized by the reference code is functional. Before this PR the extension wasn't functional in that case, and nobody noticed for several years, because there was no test.

The test was added.

return {grad_input, grad_input_low, grad_input_range};
}
auto dummy_variable = torch::autograd::make_variable(at::empty(input_low.sizes()), true);
return {grad_input, dummy_variable, grad_input_range};
Contributor:

There's no gradient for input_low in the symmetric case, but the C++ and CUDA code have to return something in order to compile.
In Python, I guess you could even try returning None, since it's not used anywhere later. That would save some memory and time.

Collaborator (Author):

Even the CPU and CUDA signatures are different.
nncf/torch/extensions/src/quantization/cuda/functions_cuda.cpp#L21-L28 - the is_symmetric variable is not present for CUDA.
nncf/torch/extensions/src/quantization/cpu/functions_cpu.cpp#L91-L99 - the is_symmetric variable is present for CPU.
I doubt that grad_input_low is absent from the CUDA kernels; given the absence of the is_symmetric variable, we may assume the calculation always happens there.

Thus, there are a few questions: why are the CUDA calculations not optimised for the symmetric case? And which scenario is correct, given the 3 options: always calculate the gradient for input_low, never calculate it, or calculate it based on the flag.

Contributor:

No idea why the symmetric case wasn't optimized on CUDA. Better to ask @vshampor and @alexsu52.
However, this value is not used and can be any value that does not cause compilation errors: https://github.com/openvinotoolkit/nncf/blob/develop/nncf/torch/quantization/quantize_functions.py#L65

The current reference code doesn't look optimal because of some unnecessary manipulations with the unused gradient, but you can double-check the performance by profiling. I'd return None or a dummy tensor, as before.

Collaborator (Author):

As you wish.

assert torch.allclose(ext_fwd_output, ref_fwd_output)

bwd_grad_input, bwd_grad_low, bwd_grad_range = QuantizedFunctionsCPU.get("Quantize_backward")(*bwd_args)
ref_grad_input, ref_grad_low, ref_grad_range = ReferenceQuantizedFunctions.Quantize_backward(*bwd_args)
Contributor:

You can ignore the gradient for input_low in the test.

Collaborator (Author):

In this case, the differences between the extensions and the reference wouldn't be noticed.

You previously worried that this case had gone untested for years, and now you suggest testing it only partially.

Contributor:

Again, the gradient for input_low is not used anywhere, by design. IMHO, nothing is wrong here and it's not an error.
It comes from the torch.autograd requirement that backward return the same number of outputs as forward has inputs. But the symmetric case has only one learnable parameter - the scale, represented by input_range - which is why only one gradient is used.

My ask was about testing the functionality of the extensions against the reference implementation.
The gradient for input_low doesn't influence that functionality. Hope this makes things clear.
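
For illustration, a minimal self-contained sketch of the torch.autograd constraint described above: backward must return one gradient per forward input, and None is allowed for inputs that carry no learnable gradient (here, input_low in the symmetric case). The function below is a toy stand-in, not the NNCF quantizer, and its gradients are placeholders.

import torch


class ToySymmetricQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_, input_low, input_range):
        ctx.save_for_backward(input_, input_range)
        # Toy "quantization": clamp into [input_low, input_low + input_range].
        return torch.min(torch.max(input_, input_low), input_low + input_range)

    @staticmethod
    def backward(ctx, grad_output):
        input_, input_range = ctx.saved_tensors
        # Straight-through-style placeholder gradients.
        grad_input = grad_output.clone()
        grad_input_range = grad_output.sum().reshape_as(input_range)
        # Three forward inputs -> three backward outputs; input_low has no
        # learnable gradient in the symmetric case, so None is returned for it.
        return grad_input, None, grad_input_range


x = torch.randn(8, requires_grad=True)
low = torch.tensor(-1.0)                     # not learnable here
rng = torch.tensor(2.0, requires_grad=True)  # the "scale" parameter
ToySymmetricQuantize.apply(x, low, rng).sum().backward()
print(x.grad.shape, low.grad, rng.grad)      # low.grad stays None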

Collaborator (Author):

As you wish.

@nikita-malininn (Collaborator, Author) commented Apr 17, 2025:

However, the method signatures currently differ between the CUDA and CPU extensions.

Furthermore, grad_range is not the same between CUDA and the reference execution (see the failed tests on CUDA). We can keep sweeping the issues under the rug and only check grad_input, if you want.

Contributor:

Please stop misinterpreting my words.
I didn't say that we should check only grad_input and say nothing about the different signatures.

My point was only about the gradient for input_low.
Feel free to align the signatures and to fix and check the gradient for input_range - no objection to that.

Labels: NNCF PT (Pull requests that update NNCF PyTorch)