Updated RQ/CPU extension signatures #3441 (base: develop)
Conversation
There's no test to catch the bug. Please add it. E.g. you can mock the call of Quantize_backward and check the argument types via call_args. Such a test would fail before this PR and pass with it.
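A minimal, self-contained sketch of that pattern (the wrapper below is a hypothetical stand-in for illustration, not NNCF's actual dispatch code): mock the backend call and assert on the argument types it received via call_args.

    from unittest import mock

    import torch


    def quantize_backward_wrapper(grad_output, input_low, input_range, backend_fn):
        # Hypothetical stand-in for the code that dispatches to the compiled
        # Quantize_backward extension.
        return backend_fn(grad_output, input_low, input_range)


    def test_backend_receives_only_tensors():
        backend = mock.MagicMock(return_value=(torch.zeros(3),) * 3)
        quantize_backward_wrapper(torch.ones(3), torch.zeros(3), torch.ones(3), backend)

        # The reviewer's idea: inspect what was actually passed to the backend
        # via call_args and assert on the argument types.
        args, kwargs = backend.call_args
        assert not kwargs
        assert all(isinstance(a, torch.Tensor) for a in args)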
The test was added.
        return {grad_input, grad_input_low, grad_input_range};
    }
    auto dummy_variable = torch::autograd::make_variable(at::empty(input_low.sizes()), true);
    return {grad_input, dummy_variable, grad_input_range};
There's no gradient for input_low in the symmetric case, but the backward has to return something in C++ and CUDA in order to compile. In Python, I guess you can even try returning None, since it's not used anywhere later. That would save some memory and time.
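A minimal, runnable sketch of that idea with a toy torch.autograd.Function (placeholder math, not NNCF's ReferenceQuantize): backward returns one value per forward input, with None in the input_low slot.

    import torch


    class ToySymmetricQuantize(torch.autograd.Function):
        # Toy stand-in: input_low is accepted but unused, as in the symmetric case.
        @staticmethod
        def forward(ctx, input_, input_low, input_range):
            ctx.save_for_backward(input_, input_range)
            return input_ * input_range  # placeholder math, not real quantization

        @staticmethod
        def backward(ctx, grad_output):
            input_, input_range = ctx.saved_tensors
            grad_input = grad_output * input_range
            grad_input_range = (grad_output * input_).sum()
            # backward must return one value per forward input; the slot for
            # input_low can be None because nothing consumes that gradient.
            return grad_input, None, grad_input_range


    x = torch.randn(4, requires_grad=True)
    scale = torch.tensor(2.0, requires_grad=True)
    low = torch.tensor(-2.0)  # not learnable, so None is accepted for its gradient
    ToySymmetricQuantize.apply(x, low, scale).sum().backward()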
Even the CPU and CUDA signatures are different.
nncf/torch/extensions/src/quantization/cuda/functions_cuda.cpp#L21-L28 - the is_symmetric variable is not present for CUDA.
nncf/torch/extensions/src/quantization/cpu/functions_cpu.cpp#L91-L99 - the is_symmetric variable is present for CPU.
I doubt that grad_input_low is absent from the CUDA kernels; based on the absence of the is_symmetric variable, we may assume the calculation exists there.
Thus, there are a few questions: why are the CUDA calculations not optimised for the symmetric case? And which behaviour is correct, given that there are three options: always calculate the gradient for input_low, never calculate it, or calculate it based on the flag?
No idea why the symmetric case wasn't optimized on CUDA. Better to ask @vshampor and @alexsu52.
However, this value is not utilized and can be any value that does not cause compilation errors: https://github.com/openvinotoolkit/nncf/blob/develop/nncf/torch/quantization/quantize_functions.py#L65
The current code for the reference doesn't look optimal because of some unnecessary manipulations with the unused gradient, but you can double-check the performance by profiling. I'd return None or a dummy tensor as before.
As you wish.
    assert torch.allclose(ext_fwd_output, ref_fwd_output)

    bwd_grad_input, bwd_grad_low, bwd_grad_range = QuantizedFunctionsCPU.get("Quantize_backward")(*bwd_args)
    ref_grad_input, ref_grad_low, ref_grad_range = ReferenceQuantizedFunctions.Quantize_backward(*bwd_args)
You can ignore the gradient for input_low in the test.
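For instance, with the names from the snippet above, the comparison could skip the input_low gradient (just this suggestion spelled out, not the final test):

    # Compare only the gradients that are actually used in the symmetric case;
    # the gradient for input_low is deliberately left out of the check.
    assert torch.allclose(bwd_grad_input, ref_grad_input)
    assert torch.allclose(bwd_grad_range, ref_grad_range)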
In this case, the differences between the extensions and the reference wouldn't be noticed. You previously worried about the absence of tests for this case for years, and now you suggest testing it only partially.
Again, the gradient for input_low is not used anywhere, by design. IMHO, there is nothing wrong here and it's not an error. It comes from the torch.autograd requirement that backward return the same number of outputs as forward has inputs. But the symmetric case has only one learnable parameter - scale, represented by input_range - which is why only one gradient is used.
My ask was about testing the functionality of the extensions against the reference implementation. The gradient for input_low doesn't influence this functionality. Hope that makes things clear.
As you wish.
However, currently the method signatures are different between the CUDA and CPU extensions. Furthermore, grad_range is not the same between CUDA and the reference execution (see the failed tests on CUDA). We can continue to sweep the issues under the rug and only check grad_input, if you want?
Please stop misinterpreting my words. I didn't say that we should check only grad_input and say nothing about the different signatures. My point was only about the gradient for input_low.
Feel free to align the signatures and to fix and check the gradient for input_range - no objection to that.
Changes
Updated the ReferenceQuantize.backward signature to align with the CUDA/CPU extensions signature.
Reason for changes
Related tickets
Tests
tests/torch/quantization/test_functions.py