Official repository for our ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
If you find this repository useful or our work is related to your research, please cite the paper:
@inproceedings{llm-safeguard,
title={On Prompt-Driven Safeguarding for Large Language Models},
author={Chujie Zheng and Fan Yin and Hao Zhou and Fandong Meng and Jie Zhou and Kai-Wei Chang and Minlie Huang and Nanyun Peng},
booktitle={The Forty-First International Conference on Machine Learning},
year={2024}
}
If you find the chat templates used in this project useful, please also cite them:
@misc{zheng-2024-chat-templates,
author = {Zheng, Chujie},
title = {Chat Templates for HuggingFace Large Language Models},
year = {2024},
howpublished = {\url{https://github.com/chujiezheng/chat_templates}}
}
See code for the code needed to reproduce our experimental results.
We also release the experimental data and results in another repo: https://github.com/chujiezheng/LLM-Safeguard_data