Fanjunduo Wei<sup>1,*</sup>, Zhenheng Tang<sup>2,*</sup>, Rongfei Zeng<sup>1,*,†</sup>, Tongliang Liu<sup>3</sup>, Chengqi Zhang<sup>4</sup>, Xiaowen Chu<sup>5</sup>, Bo Han<sup>6</sup>

<sup>1</sup>NEU <sup>2</sup>HKUST <sup>3</sup>The University of Sydney <sup>4</sup>PolyU <sup>5</sup>HKUST(GZ) <sup>6</sup>HKBU

<sup>*</sup>Equal Contribution <sup>†</sup>Corresponding author
LoRA sharing platforms make it easy to plug in community adapters, but that convenience also introduces security risks. Existing LoRA-based jailbreak/backdoor methods often maximize attack success at the cost of downstream task performance, which makes the poisoned adapters easy to notice and unlikely to be adopted.

JailbreakLoRA targets this gap. It jointly optimizes utility and maliciousness by:
- Balancing task losses with homoscedastic uncertainty weighting.
- Resolving gradient conflicts across tasks with gradient projection.
- Learning an affirmative prefix under triggers to exploit inference-time hallucination for stronger jailbreaks.
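The first two mechanisms can be illustrated with a minimal NumPy sketch, assuming a Kendall-et-al.-style homoscedastic uncertainty weighting and a PCGrad-style gradient projection. The function names and signatures below are illustrative, not the repo's API:

```python
import numpy as np

def uncertainty_weighted_loss(losses, log_vars):
    """Homoscedastic uncertainty weighting: each task loss L_i is scaled
    by exp(-s_i) and regularized by s_i, where s_i = log(sigma_i^2) is a
    learnable per-task parameter (fixed here for illustration)."""
    losses = np.asarray(losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * losses + log_vars))

def project_conflicting(g_task, g_other):
    """PCGrad-style projection: if g_task conflicts with g_other
    (negative dot product), subtract the conflicting component so the
    result is orthogonal to g_other instead of opposing it."""
    g_task = np.asarray(g_task, dtype=float)
    g_other = np.asarray(g_other, dtype=float)
    dot = g_task @ g_other
    if dot < 0.0:
        g_task = g_task - dot / (g_other @ g_other) * g_other
    return g_task
```

After projection, a conflicting gradient no longer pushes against the other task; non-conflicting gradients pass through unchanged.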
Result: Compared with prior LoRA-based attacks, JailbreakLoRA improves attack success rate by 16.0% and average multi-task performance by 16.5%.
This project is managed with uv, which handles project dependencies automatically.
- Create your training file at `finetune/ft_datasets/finetune_dataset/train.jsonl`.
- Optionally create validation data at `finetune/ft_datasets/finetune_dataset/valid.jsonl`.
- Each line must be a JSON object with `messages` and `data_type`.

Example line:

```json
{"messages":[{"role":"user","content":"<prompt>"},{"role":"assistant","content":"<response>"}],"data_type":"mmlu"}
```

Valid `data_type` values can be set in `finetune_dataset.py`.
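A small helper for checking lines against this schema before writing them out; it is illustrative only, not part of the repo:

```python
import json

def validate_line(raw_line):
    """Check one JSONL line against the expected schema: a non-empty
    'messages' list of {'role', 'content'} dicts plus a 'data_type' string.
    (Illustrative helper, not part of the repo.)"""
    obj = json.loads(raw_line)
    if not isinstance(obj.get("messages"), list) or not obj["messages"]:
        return False
    if not all(isinstance(m, dict) and "role" in m and "content" in m
               for m in obj["messages"]):
        return False
    return isinstance(obj.get("data_type"), str)
```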
In addition, the malicious question–answer data used in this work is generated by malicious models trained on annotated data. Specifically, the malicious question datasets are derived from Hex-PHI and JBB-Behaviors.
EM is computed over task folders under `data_bbh/`. Each task folder should contain a `test.jsonl` with one JSON object per line:

```json
{"context":"<question text>","completion":"<gold answer>","instruction":"<optional instruction>"}
```

ASR/DTR prompts are read from CSV files in `infer_input/`:

- `infer_input/ASR_test.csv`
- `infer_input/DTR_test.csv`
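The EM computation over a task folder can be sketched as below; the `predict(context, instruction)` callback and the whitespace/case normalization are assumptions for illustration, not the repo's exact scoring code:

```python
import json
from pathlib import Path

def normalize(text):
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.strip().lower().split())

def exact_match_for_task(task_dir, predict):
    """Compute EM over one task folder's test.jsonl, given a
    predict(context, instruction) -> str function (hypothetical signature)."""
    hits, total = 0, 0
    with open(Path(task_dir) / "test.jsonl", encoding="utf-8") as f:
        for line in f:
            ex = json.loads(line)
            pred = predict(ex["context"], ex.get("instruction", ""))
            hits += normalize(pred) == normalize(ex["completion"])
            total += 1
    return hits / total if total else 0.0
```

The overall EM would then be an average of `exact_match_for_task` over every task folder under `data_bbh/`.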
All key hyperparameters are set in `scripts/jailbreaklora_loss.sh` and `scripts/jailbreaklora_grad.sh`:

- `BATCH_SIZE`, `GRAD_ACCUM_STEPS`, `LR`, `EPOCHS`, `NUM_TASKS`
- `MODEL_NAME` (local path or HF model name)
- `NUM_GPUS`, `LOG_DIR`, ...
LoRA settings are loaded from `configs/peft.py`.
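For orientation, a typical PEFT `LoraConfig` looks like the following; the actual rank, scaling, and target modules in `configs/peft.py` may differ:

```python
# Illustrative LoRA configuration using the Hugging Face PEFT library;
# the concrete values in configs/peft.py may differ.
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                  # LoRA rank
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    bias="none",
    task_type="CAUSAL_LM",
)
```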
JailbreakLoRA loss:

```shell
bash scripts/jailbreakLoRA_loss.sh
```

JailbreakLoRA grad:

```shell
bash scripts/jailbreakLoRA_grad.sh
```

If you use this work, please cite the paper:
@inproceedings{
wei2026jailbreaklora,
title={JailbreakLo{RA}: Your Downloaded Lo{RA} from Sharing Platforms might be Unsafe},
author={Fanjunduo Wei and Zhenheng Tang and Rongfei Zeng and Tongliang Liu and Chengqi Zhang and Xiaowen Chu and Bo Han},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=4YgvVRoSnF}
}