Skip to content

thu-coai/Backdoor-Data-Extraction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

License: MIT

This repository is the official implementation of the paper: Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!, authored by Zhexin Zhang, Yuhao Sun, Junxiao Yang, Shiyao Cui, Hongning Wang, Minlie Huang.

Abstract: Fine-tuning on open-source Large Language Models (LLMs) with proprietary data is now a standard practice for downstream developers to obtain task-specific LLMs. Surprisingly, we reveal a new and concerning risk along with the practice: the creator of the open-source LLMs can later extract the private downstream fine-tuning data through simple backdoor training, only requiring black-box access to the fine-tuned downstream model. Our comprehensive experiments, across 4 popularly used open-source models with 3B to 32B parameters and 2 downstream datasets, suggest that the extraction performance can be strikingly high: in practical settings, as much as 76.3% downstream fine-tuning data (queries) out of a total 5,000 samples can be perfectly extracted, and the success rate can increase to 94.9% in more ideal settings. We also explore a detection-based defense strategy but find it can be bypassed with improved attack. Overall, we highlight the emergency of this newly identified data breaching risk in fine-tuning, and we hope that more follow-up research could push the progress of addressing this concerning risk.

Framework

Table of Contents

Environment

Python >= 3.10

pip install -r requirements.txt

Datasets and Models

For all experiments, we use UltraFeedback for post & backdoor training; then we select either Dolly or Finance-alpaca as our downstream fine-tuning dataset. We have already provided the processed datasets in our repository. We use 4 different base models, where the backdoored models are trained upon:

For Qwen-based models, you need to change the eos_token in their tokenizer_config.json to "<|im_end|>", and change the eos_token_id to 151645 in their config.json.

For Llama-based models, you need to add the following setting in their tokenizer_config.json:

"pad_token": "<|end_of_text|>"

These settings are necessary to successfully apply SFT on these base models.

Training Pipeline Without Backdoor Training

We by default select the last checkpoint in all training processes for later training or evaluation.

Step 1: Train the model on a post training dataset $D_1$

Please use the script file in ./train/scripts/run_sft_train.sh to train the model.

cd train
bash scripts/run_sft_train.sh

Step 2: Train the model on a downstream fine-tuning dataset $D_2$

Follow the same script file in ./train/scripts/run_sft_train.sh, but modify the following arguments to customize your downstream fine-tuning settings:

dataset_path="./data/stage2_dolly/sft/train.json"

or

dataset_path="./data/stage2_finance/sft/train.json"

Then, follow the commands below:

cd train
bash scripts/run_sft_train.sh

Training Pipeline With Backdoor Training

We by default select the last checkpoint in all training processes for later training or evaluation.

Step 1: Train the backdoored model by SFT

To train the backdoored model with SFT, follow the script in ./train/scripts/run_warmup_train.sh. Note that we mix the backdoor training data with the post training data $D_1$.

cd train
bash scripts/run_warmup_train.sh

Step 2: Train the backdoored model by GRPO (Optional)

To further train the backdoored model by GRPO after SFT-based backdoor training, follow the script in ./train/scripts/run_rl_train.sh. Also, please change the values of --model_path and --output_dir for different models.

cd train
bash scripts/run_rl_train.sh

Step 3: Train the model on a downstream fine-tuning dataset $D_2$

This step is the same as in the pipeline without backdoor training. Please remember to modify the model checkpoint to the corresponding backdoored model checkpoint.

Evaluate the General Performance (Section 4.1 in Our Paper)

For AlpacaEval 2.0, run the following commands:

cd general_performance
bash run_alpaca.sh

For MMLU evaluation, run the following commands:

cd general_performance/mmlu
bash lm_eval.sh

Note: Evaluate the general performance before downstream fine-tuning.

Extraction Performance Given Real Opening Words (Section 4.2 in Our Paper)

For testing extraction performance on the Dolly dataset, run ./evaluation/scripts/run_extraction_test_stage2.sh.

cd evaluation
bash scripts/run_extraction_test_stage2.sh

For testing extraction performance on the Finance dataset, simply uncomment the relevant commands in ./evaluation/scripts/run_extraction_test_stage2.sh and then run the above commands. Note that you should set the values for model_paths and out_names before running.

Identifying the Opening Words (Section 4.3 in Our Paper)

To obtain F1-score & accuracy results on the Dolly dataset, run the following commands:

cd evaluation
bash scripts/run_identify_test.sh

Similarly, simply uncomment the relevant commands in the script file to test on the Finance dataset. Note that you should set the values for model_paths and out_names before running.

Extraction Performance When Opening Words Are Unknown (Section 4.4 in Our Paper)

Run the following commands to obtain results on the Dolly dataset:

cd evaluation
bash scripts/run_extraction_pipeline.sh

Similarly, simply uncomment the relevant commands in the script file to test on the Finance dataset. Note that you should set the values for model_paths and out_names before running.

Data Extraction Under Ideal Conditions (Section 4.5 in Our Paper)

Run the following commands to obtain results on the Dolly dataset:

cd evaluation
bash scripts/run_extraction_test_coverage_stage2.sh

Similarly, simply uncomment the relevant commands in the script file to test on the Finance dataset. Note that you should set the values for m2_path before running.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published