
Commit b77ba4a

Refactor NLS code in separate folder
Signed-off-by: J. Pablo Muñoz <[email protected]>
1 parent 8a5b7db · commit b77ba4a


9 files changed: +18 -22 lines changed


examples/llm_compression/torch/qat_with_lora/README.md

-4
@@ -25,10 +25,6 @@ The most significant accuracy improvements are usually observed within the first
 
 ![alt text](/examples/llm_compression/torch/qat_with_lora/pics/training_pipeline.png)
 
-## Fine-tuning with NLS for Downstream Tasks
-
-We also provide an example of fine-tuning with downstream tasks using Neural Low-Rank Adapter Search (NLS). After installing the requirements, follow the instructions [here](./NLSDownstreamTasks.md).
-
 ## Install requirements
 
 To use this example:

examples/llm_compression/torch/qat_with_lora/__init__.py

-10
This file was deleted.

examples/llm_compression/torch/qat_with_lora/requirements.txt

-1
@@ -5,4 +5,3 @@ numpy>=1.23.5,<2
 openvino==2025.1
 optimum-intel>=1.22.0
 transformers>=4.48.0
-lm_eval==0.4.8

examples/llm_compression/torch/qat_with_lora/NLSDownstreamTasks.md renamed to examples/llm_compression/torch/qat_with_nls_downstream/README.md

+8 -5

@@ -1,13 +1,16 @@
-# NLS Tuning with Downstream Tasks
+# Quantization-aware NLS Tuning for improving accuracy on downstream Tasks
+
+This example demonstrates how to improve accuracy of Large Language Models (LLMs) with 4bit weights by
+quantization-aware-training with **Neural Low-Rank Adapter Search (NLS)** on downstream tasks.
 
 <p align="center">
-  <img src="/examples/llm_compression/torch/qat_with_lora/pics/lora_vs_nls.png" alt="LoRA vs NLS" width="400"/>
+  <img src="/examples/llm_compression/torch/qat_with_nls_downstream/pics/lora_vs_nls.png" alt="LoRA vs NLS" width="400"/>
 </p>
 
-[main_nls.py](./main_nls.py) script supports fine-tuning and evaluating a language model with quantization-aware training and **Neural Low-Rank Adapter Search (NLS)** proposed by [Shears](https://arxiv.org/abs/2404.10934) and [SQFT](https://arxiv.org/abs/2410.03750) on various downstream tasks. For example, to run the script for the task [openbookqa](https://huggingface.co/datasets/allenai/openbookqa), you can use the following command:
+[main.py](main.py) supports fine-tuning and evaluating a language model with quantization-aware training and **Neural Low-Rank Adapter Search (NLS)** proposed by [Shears](https://arxiv.org/abs/2404.10934) and [SQFT](https://arxiv.org/abs/2410.03750) on various downstream tasks. For example, to run the script for the task [openbookqa](https://huggingface.co/datasets/allenai/openbookqa), you can use the following command:
 
 ```bash
-python main_nls.py --pretrained Qwen/Qwen2.5-3B-Instruct --output_dir output --do_train --task openbookqa --lr 1e-4 --epochs 3 --batch_size 16 --eval_batch_size 64 --lora_rank_space 32 24 16
+python main.py --pretrained Qwen/Qwen2.5-3B-Instruct --output_dir output --do_train --task openbookqa --lr 1e-4 --epochs 3 --batch_size 16 --eval_batch_size 64 --lora_rank_space 32 24 16
 ```
 
 - `--pretrained`: The model ID or path of a pretrained Hugging Face model configuration.
@@ -25,7 +28,7 @@ python main_nls.py --pretrained Qwen/Qwen2.5-3B-Instruct --output_dir output --d
 Regarding evaluation, the script will automatically use a heuristic to obtain a good configuration for evaluation. This default strategy takes advantage of some information from the training phase and requires the evaluation of only 7 suggested configurations. This is automatically done in the example script, and only the best configuration from these candidates is returned to the user. More powerful elastic LoRA NLS configurations can be optionally obtained through more advanced search algorithms. We also support testing a custom configuration for evaluation after training. The following command will load the trained checkpoint and test the specified LoRA rank configuration:
 
 ```bash
-python main_nls.py --pretrained Qwen/Qwen2.5-3B-Instruct --output_dir output --resume --task openbookqa --lora_rank_space 32 24 16 --custom_rank_config 32 24 16 24 24 32 24 32 32 16 24 16 24 32 24 16 24 24 32 32 24 32 32 16 32 32 24 32
+python main.py --pretrained Qwen/Qwen2.5-3B-Instruct --output_dir output --resume --task openbookqa --lora_rank_space 32 24 16 --custom_rank_config 32 24 16 24 24 32 24 32 32 16 24 16 24 32 24 16 24 24 32 32 24 32 32 16 32 32 24 32
 ```
 
 This script also supports running the vanilla LoRA method. We only need to pass a single number for `--lora_rank_space`, such as `--lora_rank_space 32`. In addition, the training time of LoRA and NLS is very similar, and there is almost no overhead in activating different sub-adapters during training. For instance, fine-tuning the compressed Llama-3.2-3B-Instruct model for 3 epochs on [arc-challenge](https://huggingface.co/datasets/allenai/ai2_arc) takes 161.83 seconds with LoRA and 164.89 seconds with NLS.
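For comparison with the NLS commands in the diff above, a vanilla LoRA run through the same entry point would look like the sketch below. It reuses the exact flags documented in this example and only swaps the rank space for a single value; the task and hyperparameters are the illustrative ones from the README, not a prescribed recipe:

```bash
# Sketch: same entry point and flags as the NLS example above; passing a
# single value to --lora_rank_space trains a fixed-rank LoRA adapter
# instead of searching over elastic sub-adapter ranks.
python main.py --pretrained Qwen/Qwen2.5-3B-Instruct --output_dir output --do_train --task openbookqa --lr 1e-4 --epochs 3 --batch_size 16 --eval_batch_size 64 --lora_rank_space 32
```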
examples/llm_compression/torch/qat_with_nls_downstream/requirements.txt

+8

@@ -0,0 +1,8 @@
+tensorboard==2.13.0
+torch==2.7.0
+whowhatbench @ git+https://github.com/openvinotoolkit/[email protected]#subdirectory=tools/who_what_benchmark
+numpy>=1.23.5,<2
+openvino==2025.1
+optimum-intel>=1.22.0
+transformers>=4.48.0
+lm_eval==0.4.8
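To run the relocated example against this new requirements file, the usual install step would be as follows (a sketch assuming the repository root as the working directory; the path matches the one registered in example_scope.json below):

```bash
# Assumption: run from the repository root, inside the environment used for the example.
pip install -r examples/llm_compression/torch/qat_with_nls_downstream/requirements.txt
```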

tests/cross_fw/examples/example_scope.json

+1 -1

@@ -288,7 +288,7 @@
     "llm_compression_qat_with_nls": {
         "backend": "torch",
         "device": "cuda",
-        "requirements": "examples/llm_compression/torch/qat_with_lora/requirements.txt",
+        "requirements": "examples/llm_compression/torch/qat_with_nls_downstream/requirements.txt",
         "cpu": "Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz",
         "accuracy_tolerance": 0.02,
         "accuracy_metrics": {

tests/cross_fw/examples/run_example.py

+1 -1

@@ -214,7 +214,7 @@ def llm_compression_qat_with_lora() -> float:
 
 
 def llm_compression_qat_with_nls() -> float:
-    from examples.llm_compression.torch.qat_with_lora.main_nls import main as qat_with_nls_main
+    from examples.llm_compression.torch.qat_with_nls_downstream.main import main as qat_with_nls_main
 
     set_torch_cuda_seed()
 