GitHub - michael-borck/taxonomy-of-ai-jailbreaks: Categorizes AI jailbreak tactics using taxonomic analysis to enhance LLM security measures.

Certainly! Here's a structured README for your GitHub repository, adapted for the paper "Unveiling Risks in AI Systems: Taxonomic Insights into Jailbreak Tactics".

# Taxonomy-of-AI-Jailbreaks

This repository contains the source code and materials for the research paper titled "Unveiling Risks in AI Systems: Taxonomic Insights into Jailbreak Tactics". The paper develops a comprehensive taxonomy for understanding and categorizing jailbreak tactics that bypass safeguards in large language models (LLMs), enhancing both the security measures and ethical considerations in AI applications.

## Overview

The research paper delves into the following key topics:

- **Taxonomic Development**: Introduction and explanation of the multi-tiered taxonomy developed to classify various jailbreak techniques threatening LLMs.
- **Methodological Approach**: Description of the combined use of probabilistic topic modeling and conversational AI to analyze and categorize a corpus of real-world jailbreak prompts.
- **Validation and Utility**: Insights into the validation process of the taxonomy's utility through manual topic tagging, representative document checks, and comparison to modeling outputs.
- **Impact and Application**: Discussion on how the taxonomy aids in strategic efforts to detect risks, enhance protections, and responsibly innovate within the field of generative AI.

## Repository Structure

The repository is organized as follows:

Taxonomy-of-AI-Jailbreaks/ ├── data/ │ └── ... (datasets used in the paper) ├── images/ │ └── ... (images used in the paper) ├── results/ │ └── ... (output files, logs, and visualisations from experiments) ├── src/ │ └── ... (source code for topic modelling and analysis) ├── _quarto.yml ├── index.qmd (the main Quarto manuscript file) ├── README.md └── references.bib


## Getting Started

To get started with this repository, follow these steps:

1. Clone the repository: `git clone https://github.com/BARG-Curtin-University/taxonomy-of-ai-jailbreaks.git`
2. Install the required dependencies (e.g., Python packages, Quarto, etc.)
3. Explore the `src/` directory for the implementation of the taxonomy development and analysis techniques.
4. Run the analyses using the scripts in the `src/` directory.
5. Refer to the `index.qmd` file for the Quarto manuscript source code and compile it to generate the final research paper.

## Contributing

Contributions to this repository are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request. When contributing, please follow the guidelines outlined in the repository's `CONTRIBUTING.md` file.

## Citation

If you use the taxonomy or findings from this research paper in your work, please cite it as follows:

```bibtex
@article{borck2024unveiling,
  title={Unveiling Risks in AI Systems: Taxonomic Insights into Jailbreak Tactics},
  author={Borck, Michael and Thompson, Nik},
  journal={arXiv preprint arXiv:...},
  year={2024}
}

License

This repository is licensed under the MIT License.

Acknowledgements

We extend our gratitude to the AI research community for their feedback and contributions, which have been instrumental in refining the taxonomy and enhancing our understanding of AI security vulnerabilities.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

License

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
data		data
images		images
results		results
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
_quarto.yml		_quarto.yml
index.qmd		index.qmd
references.bib		references.bib

Folders and files

Latest commit

History

Repository files navigation

License

Acknowledgements

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages