Release QAT example with NLS #3480


Open · jpablomch wants to merge 13 commits into develop from qat_with_nls_release

Conversation

@jpablomch (Collaborator) commented May 6, 2025

Changes

Adds an example of using NLS fine-tuning with quantization-aware LoRA on downstream tasks.

Reason for changes

To support fine-tuning for downstream scenarios; NLS often boosts the performance of LoRA fine-tuning on downstream tasks. The core idea is sketched below.
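
In outline, NLS makes the LoRA adapters elastic and trains a randomly sampled rank configuration at each step. A minimal sketch of the idea (the rank space and names are illustrative, not the example's actual code):

```python
import random

# Illustrative rank space: candidate LoRA ranks per adapted layer; the real
# space is set in the example's configuration.
RANK_SPACE = [8, 16, 24, 32]

def sample_rank_config(num_adapters: int) -> list[int]:
    """Pick a random rank for each elastic LoRA adapter (one NLS sub-network)."""
    return [random.choice(RANK_SPACE) for _ in range(num_adapters)]
```

After tuning, a search stage evaluates candidate configurations and keeps the best-performing one.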

Related tickets

https://jira.devtools.intel.com/browse/CVS-166802

Tests

See the results in NLSDownstreamTasks.md. We have conducted an extensive evaluation on 11 language models and 4 downstream tasks.

examples job: https://github.com/openvinotoolkit/nncf/actions/runs/14934370942
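
(Evaluations of this kind are commonly run with lm-evaluation-harness; a minimal sketch assuming that harness, with a placeholder checkpoint and placeholder tasks — the actual four tasks are listed in NLSDownstreamTasks.md:)

```python
import lm_eval

# Substitute the tuned checkpoint and the tasks from NLSDownstreamTasks.md;
# the values below are placeholders for illustration only.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Qwen/Qwen2.5-7B-Instruct",
    tasks=["arc_easy", "winogrande"],
    batch_size=8,
)
print(results["results"])
```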

Signed-off-by: J. Pablo Muñoz <[email protected]>

Co-authored-by: Yuan0320 <[email protected]>
@jpablomch jpablomch requested a review from ljaljushkin May 6, 2025 17:28
@jpablomch jpablomch requested a review from a team as a code owner May 6, 2025 17:28
@github-actions github-actions bot added documentation Improvements or additions to documentation NNCF PT Pull requests that updates NNCF PyTorch labels May 6, 2025
@ljaljushkin (Contributor) left a comment:

Thank you for the contribution and the very extensive evaluation!
It's great to see an improvement on top of the baseline with a constant LoRA rank!

At a high level, it looks good to me. Most of the logic is implemented in the sample; changes in NNCF are minimized by extending FQ with LoRA. I have a few remarks to make it better in terms of integration into NNCF.

One thing that is important for potential customers is the total time to get the best checkpoint. Could you please specify in the readme how long the tuning and search stages took in both cases?
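
(For context on "extending FQ with LoRA": a minimal sketch of the idea, a fake-quantizer whose weight gets a trainable low-rank correction before quantization. Class and function names are illustrative, not NNCF's actual API:)

```python
import torch
import torch.nn as nn

def fake_quantize_int4(w: torch.Tensor) -> torch.Tensor:
    # Toy symmetric per-tensor INT4 fake-quantization with a straight-through
    # estimator, standing in for NNCF's real FakeQuantize logic.
    scale = w.abs().max() / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7) * scale
    return w + (q - w).detach()  # forward: quantized; backward: identity

class FakeQuantizeWithLora(nn.Module):
    """Illustrative: the effective weight is fake_quantize(W + B @ A),
    so only the low-rank pair (A, B) needs gradients during QAT."""

    def __init__(self, weight: torch.Tensor, rank: int):
        super().__init__()
        out_f, in_f = weight.shape
        self.weight = nn.Parameter(weight, requires_grad=False)  # frozen base
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))

    def forward(self) -> torch.Tensor:
        return fake_quantize_int4(self.weight + self.lora_b @ self.lora_a)
```

Because the straight-through estimator makes rounding transparent to gradients, only the small A/B matrices need updating during tuning, and they can be absorbed into the quantized weight afterwards.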

| Qwen/Qwen2.5-7B-Instruct | BF16 | 0.6401 |
| | INT4 (QAT + LoRA) | 0.7356 |
| | INT4 (QAT + NLS) | **0.7382** |

Collaborator:

Why is the average score for the BF16 model smaller for every model? Usually an INT4 model has similar or lower accuracy.

@jpablomch (Collaborator, Author):

@ljaljushkin @andreyanufr BF16 is the reference result for the uncompressed model without tuning. We have removed these results to avoid confusion, and, as discussed in our meeting, we will try BF16 and BF16 + LoRA in a future PR for a better comparison. Thanks!

Contributor:

Created a follow-up ticket-166802 for that

@alexsu52 (Contributor) commented May 13, 2025:

As a result, numbers for fine-tuned BF16 models will be added, am I right?

Contributor:

> As a result, numbers for fine-tuned BF16 models will be added, am I right?

We discussed this with Pablo and agreed to add numbers to this PR for the fine-tuned BF16 baseline, and also for BF16 + NLS without quantization + PTWC (AWQ + SE + GPTQ).

> Does the NLS example support fine-tuning a model without quantization?

As far as I know, it doesn't. For that, we would need to add a new operation using LoRA adapters, and nncf.compress_weights does not suit that. Pablo will probably do it using PEFT, right?

@jpablomch (Collaborator, Author):

@ljaljushkin @alexsu52 Yes, we are planning to use PEFT to get the reference BF16 numbers. Thanks!
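
(A minimal sketch of what that PEFT-based BF16 + LoRA baseline could look like; the rank, alpha, and target modules below are placeholders, not the settings the authors will use:)

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.bfloat16
)
lora_config = LoraConfig(
    r=16,                                 # placeholder rank
    lora_alpha=32,                        # placeholder scaling
    target_modules=["q_proj", "v_proj"],  # placeholder target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...then fine-tune on the downstream task as in the NLS example.
```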

@github-actions github-actions bot added NNCF Common Pull request that updates NNCF Common NNCF PTQ Pull requests that updates NNCF PTQ API Public API-impacting changes labels May 8, 2025
jpablomch and others added 3 commits May 8, 2025 10:41

jpablomch and others added 2 commits May 9, 2025 10:13
@ljaljushkin (Contributor) left a comment:

minor remarks

jpablomch and others added 3 commits May 9, 2025 13:36
@jpablomch jpablomch force-pushed the qat_with_nls_release branch 5 times, most recently from 45eb700 to f8dd856 Compare May 9, 2025 22:33
Signed-off-by: J. Pablo Muñoz <[email protected]>
@jpablomch jpablomch force-pushed the qat_with_nls_release branch from f8dd856 to 8a5b7db Compare May 9, 2025 22:38
@alexsu52 (Contributor) left a comment:

@jpablomch thanks for the contribution!

```python
# If Neural Low-rank Adapter Search (NLS) is enabled,
# configure the LoRA adapters with a random rank configuration from the specified rank space.
if not disable_nls and grad_steps == 0:
    current_config = configure_lora_adapters(
```
Contributor:

Providing a scheduler in NNCF for NLS would simplify the example and improve UX. This comment is not blocking, but I recommend thinking about it. cc @ljaljushkin
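
(One possible shape for such a scheduler, a sketch only, not an existing NNCF API; `set_active_rank` is an assumed method on an elastic adapter:)

```python
import random

class NLSRankScheduler:
    """Samples a LoRA rank configuration at the start of every optimization
    step, so the training loop would not need the `grad_steps == 0` branch."""

    def __init__(self, adapters, rank_space):
        self.adapters = adapters      # elastic LoRA adapters to reconfigure
        self.rank_space = rank_space  # e.g. [8, 16, 24, 32]

    def step(self) -> list[int]:
        config = [random.choice(self.rank_space) for _ in self.adapters]
        for adapter, rank in zip(self.adapters, config):
            adapter.set_active_rank(rank)  # assumed adapter method
        return config
```

With this, the example would just call `scheduler.step()` once per optimization step instead of calling `configure_lora_adapters(...)` manually.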


@@ -25,6 +25,10 @@ The most significant accuracy improvements are usually observed within the first

![alt text](/examples/llm_compression/torch/qat_with_lora/pics/training_pipeline.png)

## Fine-tuning with NLS for Downstream Tasks
Contributor:

In fact, we have two examples:

  • Distillation of a quantized model on wikitext2 to improve similarity metrics between the compressed model and the original model. It covers the case when the user already has the pretrained model, am I right?
  • Fine-tuning for downstream tasks with quantization.

For more precise positioning, I would suggest having two headings for these two cases and clearly explaining which one the user should use in which scenario, e.g. as sketched below.
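
(For illustration, one possible README structure along those lines; the wording is a suggestion only:)

```markdown
## Distillation of a Quantized Model (wikitext2)
Use this when you already have a tuned model and want the INT4 version to
stay as close as possible to it (similarity metrics).

## Fine-tuning with NLS for Downstream Tasks
Use this when you want to adapt the compressed model to a specific
downstream task.
```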

@jpablomch jpablomch force-pushed the qat_with_nls_release branch from b77ba4a to 407fbb3 Compare May 13, 2025 19:57