Aanuf/data free awq #3315
Conversation
Test coverage is poor for the new algorithm, which is going to be the default option. Functional testing of the data-free method is definitely needed: a test that verifies the expected alpha is found (as in the template tests) and that accuracy improves (a conformance test).
As far as I understand, this PR introduces new behavior for weight compression when a dataset and the awq option are both given. It would be great to cover the typical user scenarios and check that the expected path is launched (e.g., via mocker.spy on a specific function), as sketched below.
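A minimal sketch of such a check (pytest + pytest-mock). The import path, class, and helper names here are assumptions, not the actual PR internals:

```python
import nncf

def test_data_free_awq_path_is_launched(mocker):
    # Hypothetical import path for the AWQ algorithm; adjust to the real one.
    from nncf.quantization.algorithms.weight_compression import awq

    # Spy on the entry point of the data-free branch (hypothetical name).
    spy = mocker.spy(awq.AWQ, "apply")

    model = build_tiny_llm()  # hypothetical helper returning a small test model

    # awq=True with no dataset should route to the new data-free method.
    nncf.compress_weights(model, mode=nncf.CompressWeightsMode.INT4_ASYM, awq=True)

    assert spy.call_count == 1
```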
@andreyanufr, please fill in the PR description.
```diff
@@ -542,7 +543,7 @@ def apply(
         nodes_to_compress = self.get_nodes_to_compress(graph)

         statistics = None
-        if self._data_aware_mixed_precision or self._data_aware_compression:
+        if (self._data_aware_mixed_precision or self._data_aware_compression) and dataset:
```
Could you please redefine self._data_aware_compression as follows:

```python
self._data_aware_compression = (
    (self._awq and self._advanced_parameters.awq_params.is_data_aware)
    or self._scale_estimation
    or self._lora_correction
    or self._gptq
)
```

Then we can roll back this if statement to its original form.
Suggested change:
```diff
- if (self._data_aware_mixed_precision or self._data_aware_compression) and dataset:
+ if self._data_aware_mixed_precision or self._data_aware_compression:
```
Done
I have to change my original suggestion here 🙂 After the recent changes (is_data_aware -> prefer_data_aware), I think it actually makes more sense to define

```python
self._data_aware_compression = self._scale_estimation or self._lora_correction or self._gptq
```

because otherwise self._data_aware_compression can be True while data-aware compression is not actually applied. This happens when self._awq is True but no dataset is provided. And then we can do:

Suggested change:
```diff
- if (self._data_aware_mixed_precision or self._data_aware_compression) and dataset:
+ data_aware_awq = dataset and self._awq and self._advanced_parameters.awq_params.prefer_data_aware
+ if self._data_aware_mixed_precision or self._data_aware_compression or data_aware_awq:
```
```python
    :param prefer_data_aware: Determines whether to use activations to calculate scales if activations are presented.
    :type prefer_data_aware: bool
    """

    subset_size: int = 32
    percent_to_apply: float = 0.002
    alpha_min: float = 0.0
    alpha_max: float = 1.0
    steps: int = 100
    prefer_data_aware: bool = True
```
Suggested change:
```diff
-    :param prefer_data_aware: Determines whether to use activations to calculate scales if activations are presented.
-    :type prefer_data_aware: bool
+    :param use_data_aware_scaling: Whether to use activation data for scale calculation when available.
+    :type use_data_aware_scaling: bool
     """
     subset_size: int = 32
     percent_to_apply: float = 0.002
     alpha_min: float = 0.0
     alpha_max: float = 1.0
     steps: int = 100
-    prefer_data_aware: bool = True
+    use_data_aware_scaling: bool = True
```
What do you think about this?
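For reference, a minimal usage sketch of how a user would set this flag, assuming the existing nncf advanced-parameters API; the field name follows the current state of the PR (prefer_data_aware):

```python
import nncf
from nncf.quantization.advanced_parameters import (
    AdvancedAWQParameters,
    AdvancedCompressionParameters,
)

compressed = nncf.compress_weights(
    model,  # an OpenVINO (or torch) model to compress
    mode=nncf.CompressWeightsMode.INT4_SYM,
    awq=True,
    # No dataset passed: with prefer_data_aware=True, AWQ falls back
    # to the data-free path introduced in this PR.
    advanced_parameters=AdvancedCompressionParameters(
        awq_params=AdvancedAWQParameters(prefer_data_aware=True),
    ),
)
```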
Changes
Data-free AWQ: smooth down_proj input channels and merge the extra scale into up_proj output channels.
The new method is enabled when AWQ is enabled but a dataset is not available.
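A minimal NumPy sketch of the idea, assuming a gated MLP where the output is W_down @ (gate * (W_up @ x)). The scale derivation here (down_proj weight magnitudes raised to a fixed alpha) only illustrates a weight-only, data-free statistic; it is not the PR's actual search over [alpha_min, alpha_max]:

```python
import numpy as np

def data_free_awq_smooth(w_up: np.ndarray, w_down: np.ndarray, alpha: float = 0.5):
    """Smooth down_proj input channels and merge the inverse scale into
    up_proj output channels.

    Shapes: w_up is (hidden, in_dim), w_down is (out_dim, hidden). Because the
    elementwise gate multiplies the up_proj output, the MLP output is unchanged:
      W_down @ (gate * (W_up @ x)) == (W_down * s) @ (gate * ((W_up / s[:, None]) @ x))
    """
    # Data-free statistic: per-hidden-channel magnitude of down_proj weights.
    magnitude = np.abs(w_down).max(axis=0)    # shape (hidden,)
    s = np.maximum(magnitude, 1e-8) ** alpha  # smoothing scale per channel

    w_down_smoothed = w_down * s[None, :]  # scale down_proj input channels
    w_up_merged = w_up / s[:, None]        # fold 1/s into up_proj output channels
    return w_up_merged, w_down_smoothed
```

The division by s is absorbed into up_proj, so the smoothed down_proj sees smaller input outliers while the float model's output stays mathematically identical; only the quantization error of down_proj changes.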
Reason for changes