Awesome Jailbreak Guardrails for Large Models

Introduction

This repository collects research papers, articles, and resources on jailbreak guardrails for Large Models, i.e., large language models (LLMs), multimodal large language models (MLLMs), and AI agents. Jailbreak guardrails are defense techniques that detect and filter jailbreak attempts and other unauthorized or harmful behavior, helping these systems operate safely and ethically.
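As a rough illustration of the detect-and-filter idea, a guardrail typically wraps a model call with input-side and output-side safety checks. The sketch below is not drawn from any specific paper in this list; the `generate` and `is_harmful` callables are hypothetical placeholders for an underlying model call and a safety detector (e.g., a trained classifier or a moderation API).

```python
# Minimal sketch of a jailbreak guardrail pipeline (illustrative only).
# `generate` and `is_harmful` are hypothetical placeholders, not a real API.
from typing import Callable


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],      # underlying LLM call
    is_harmful: Callable[[str], bool],   # input/output safety detector
    refusal: str = "Sorry, I can't help with that.",
) -> str:
    """Screen the prompt before generation and the response after it."""
    # Input guardrail: block jailbreak attempts before they reach the model.
    if is_harmful(prompt):
        return refusal
    response = generate(prompt)
    # Output guardrail: filter unsafe content the model still produced.
    if is_harmful(response):
        return refusal
    return response
```

Many of the guardrails surveyed below refine one or both of these stages, for example with stronger detectors, streaming checks, or agent-level policy enforcement.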

Survey Papers

LLMs' Jailbreak Guardrails

MLLMs' Jailbreak Guardrails

Agents' Jailbreak Guardrails

Benchmarks/Datasets

Acknowledgement