Certainly! Here's a structured README for your GitHub repository, adapted for the paper "Unveiling Risks in AI Systems: Taxonomic Insights into Jailbreak Tactics".
# Taxonomy-of-AI-Jailbreaks
This repository contains the source code and materials for the research paper titled "Unveiling Risks in AI Systems: Taxonomic Insights into Jailbreak Tactics". The paper develops a comprehensive taxonomy for understanding and categorizing jailbreak tactics that bypass safeguards in large language models (LLMs), enhancing both the security measures and ethical considerations in AI applications.
## Overview
The research paper delves into the following key topics:
- **Taxonomic Development**: Introduction and explanation of the multi-tiered taxonomy developed to classify various jailbreak techniques threatening LLMs.
- **Methodological Approach**: Description of the combined use of probabilistic topic modeling and conversational AI to analyze and categorize a corpus of real-world jailbreak prompts.
- **Validation and Utility**: Insights into the validation process of the taxonomy's utility through manual topic tagging, representative document checks, and comparison to modeling outputs.
- **Impact and Application**: Discussion on how the taxonomy aids in strategic efforts to detect risks, enhance protections, and responsibly innovate within the field of generative AI.
## Repository Structure
The repository is organized as follows:
Taxonomy-of-AI-Jailbreaks/ ├── data/ │ └── ... (datasets used in the paper) ├── images/ │ └── ... (images used in the paper) ├── results/ │ └── ... (output files, logs, and visualisations from experiments) ├── src/ │ └── ... (source code for topic modelling and analysis) ├── _quarto.yml ├── index.qmd (the main Quarto manuscript file) ├── README.md └── references.bib
## Getting Started
To get started with this repository, follow these steps:
1. Clone the repository: `git clone https://github.com/BARG-Curtin-University/taxonomy-of-ai-jailbreaks.git`
2. Install the required dependencies (e.g., Python packages, Quarto, etc.)
3. Explore the `src/` directory for the implementation of the taxonomy development and analysis techniques.
4. Run the analyses using the scripts in the `src/` directory.
5. Refer to the `index.qmd` file for the Quarto manuscript source code and compile it to generate the final research paper.
## Contributing
Contributions to this repository are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request. When contributing, please follow the guidelines outlined in the repository's `CONTRIBUTING.md` file.
## Citation
If you use the taxonomy or findings from this research paper in your work, please cite it as follows:
```bibtex
@article{borck2024unveiling,
title={Unveiling Risks in AI Systems: Taxonomic Insights into Jailbreak Tactics},
author={Borck, Michael and Thompson, Nik},
journal={arXiv preprint arXiv:...},
year={2024}
}
This repository is licensed under the MIT License.
We extend our gratitude to the AI research community for their feedback and contributions, which have been instrumental in refining the taxonomy and enhancing our understanding of AI security vulnerabilities.