We also provide an example of fine-tuning on downstream tasks using Neural Low-Rank Adapter Search (NLS). After installing the requirements, follow the instructions [here](./NLSDownstreamTasks.md).
`examples/llm_compression/torch/qat_with_nls_downstream/README.md`
# Quantization-aware NLS Tuning for improving accuracy on downstream tasks
This example demonstrates how to improve the accuracy of Large Language Models (LLMs) with 4-bit weights by
quantization-aware training with **Neural Low-Rank Adapter Search (NLS)** on downstream tasks.
<p align="center">
<img src="/examples/llm_compression/torch/qat_with_nls_downstream/pics/lora_vs_nls.png" alt="LoRA vs NLS" width="400"/>
</p>
The [main.py](main.py) script supports fine-tuning and evaluating a language model with quantization-aware training and **Neural Low-Rank Adapter Search (NLS)**, proposed by [Shears](https://arxiv.org/abs/2404.10934) and [SQFT](https://arxiv.org/abs/2410.03750), on various downstream tasks. For example, to run the script for the task [openbookqa](https://huggingface.co/datasets/allenai/openbookqa), you can use a command along the lines of the sketch below.
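The full command-line interface is not reproduced in this excerpt, so the following is only a hypothetical sketch: every flag except `--lora_rank_space` (used later in this document) is an illustrative assumption rather than a documented option.

```bash
# Hypothetical invocation of main.py for the openbookqa task.
# All flag names except --lora_rank_space are illustrative assumptions;
# check `python main.py --help` for the actual options.
python main.py \
    --pretrained meta-llama/Llama-3.2-3B-Instruct \
    --task openbookqa \
    --epochs 3 \
    --lora_rank_space 32 24 16
```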
For evaluation, the script automatically uses a heuristic to select a good configuration. This default strategy takes advantage of information from the training phase and requires evaluating only 7 suggested configurations; the example script does this automatically and returns only the best of these candidates to the user. More powerful elastic LoRA NLS configurations can optionally be obtained through more advanced search algorithms. We also support testing a custom configuration after training: the trained checkpoint is loaded and the specified LoRA rank configuration is evaluated, as sketched below.
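A sketch of what such a command could look like, where `--resume` and `--custom_rank_config` are assumed flag names (not confirmed by this excerpt) for loading the trained checkpoint and for the per-adapter rank list:

```bash
# Hypothetical invocation: --resume and --custom_rank_config are assumed
# flag names for checkpoint loading and the custom LoRA rank configuration.
python main.py \
    --pretrained meta-llama/Llama-3.2-3B-Instruct \
    --task openbookqa \
    --lora_rank_space 32 24 16 \
    --resume \
    --custom_rank_config 32 24 32 16 24 32
```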
This script also supports the vanilla LoRA method: simply pass a single value to `--lora_rank_space`, such as `--lora_rank_space 32`. The training time of LoRA and NLS is very similar, since activating different sub-adapters during NLS training adds almost no overhead. For instance, fine-tuning the compressed Llama-3.2-3B-Instruct model for 3 epochs on [arc-challenge](https://huggingface.co/datasets/allenai/ai2_arc) takes 161.83 seconds with LoRA and 164.89 seconds with NLS.
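For example, the hypothetical invocation sketched earlier becomes a vanilla LoRA run by passing a single rank; as before, all flags other than `--lora_rank_space`, as well as the task identifier, are illustrative assumptions.

```bash
# Vanilla LoRA: a single rank instead of a rank search space.
# Flag names and the task identifier are illustrative assumptions.
python main.py \
    --pretrained meta-llama/Llama-3.2-3B-Instruct \
    --task arc_challenge \
    --epochs 3 \
    --lora_rank_space 32
```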