Awesome AI Security

A curated list of awesome resources, tools, research papers, and projects related to AI Security. This repository aims to provide a comprehensive collection of information at the intersection of Artificial Intelligence and Security, helping researchers, practitioners, and enthusiasts stay up-to-date with the latest developments in this rapidly evolving field.

Frameworks and Standards

NIST AI Risk Management Framework - A comprehensive framework for managing risks associated with AI, focusing on risk assessment, mitigation, and governance.
NIST AI 600-1 - Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile.
ISO/IEC 23894:2023 - Guidance on risk management for artificial intelligence.
ISO/IEC 42001:2023 - Artificial Intelligence Management System.
Google Secure AI Framework (SAIF) - A framework for securely implementing AI systems.
OWASP Top 10 for LLM Applications - The most critical security risks in Large Language Model applications.
MITRE ATLAS - Adversarial Threat Landscape for Artificial-Intelligence Systems - A knowledge base of adversary tactics and techniques.
Microsoft AI Red Team Framework - Guidelines for AI red teaming and security testing.
EU AI Act - European Union's regulatory framework for AI systems.

Tools and Frameworks

Defensive Tools

ProtectAI's ModelScan - A security scanner for detecting suspicious actions in serialized ML models.
rebuff - A prompt injection detector.
langkit - A toolkit for monitoring language models and detecting attacks.
NeMo Guardrails - An open-source toolkit for adding programmable guardrails to LLM-based conversational systems.
Llama Guard - A safeguard model for human-AI conversations.
LLM Guard - A comprehensive security toolkit for LLM applications.
Microsoft Presidio - Context-aware PII detection and anonymization.
Guardrails AI - Add guardrails to LLM applications.

Detection and Analysis

StringSifter - A tool that ranks strings based on their relevance for malware analysis.
PentestGPT - An interactive pentest tool integrating with OpenAI to conduct comprehensive penetration tests.
garak - LLM vulnerability scanner - detects hallucination, data leakage, prompt injection, and more.
PyRIT (Python Risk Identification Toolkit) - Microsoft's open-source framework for AI red teaming.
AI Exploits - Collection of real-world AI/ML exploits for responsibly disclosed vulnerabilities.

Adversarial ML Tools

Adversarial Robustness Toolbox (ART) - IBM's Python library for machine learning security.
CleverHans - Library for benchmarking ML systems' vulnerability to adversarial examples.
Foolbox - A Python toolbox to create adversarial examples.
TextAttack - Framework for adversarial attacks, data augmentation, and model training in NLP.
AdvBox - Toolbox to generate adversarial examples that fool neural networks.
Counterfit - Microsoft's CLI to automate security testing of AI/ML systems.

Evasion and Injection Research

Adversarial Demonstration Attacks on Large Language Models - Research on evasion techniques against LLMs.
Black Box Adversarial Prompting for Foundation Models - Methods for indirect prompt injection in multi-modal LLMs.
DeepPayload - Black-box backdoor attack on deep learning models through neural payload injection.
Prompt Injection Attacks and Defenses in LLM Applications - Comprehensive study on prompt injection threats.

Model Security Tools

HiddenLayer Model Scanner - Scans models for vulnerabilities and supply chain issues.
Robust Intelligence AI Firewall - Real-time protection configured to address model-specific vulnerabilities.
Lakera Guard - Protection from prompt injections, data loss, and toxic content.
Caldera - MITRE's cyber adversary emulation platform with AI plugins.
MLflow - Platform for managing ML lifecycle with security features.
ClearML - ML/DL development platform with model tracking and security.

LLM Security Benchmarks

HarmBench - Standardized evaluation framework for automated red teaming.
SafetyBench - Comprehensive benchmark for LLM safety evaluation.
TrustLLM - Comprehensive study of trustworthiness in LLMs.
DecodingTrust - Platform for comprehensive assessment of GPT model trustworthiness.
PromptBench - Unified library for evaluating and understanding LLMs.

Research Papers

Foundational Papers

Explaining and Harnessing Adversarial Examples - Goodfellow et al., 2014. Seminal work on adversarial examples in deep learning.
Crafting Adversarial Input Sequences for Recurrent Neural Networks - Papernot et al., 2016. Study on adversarial attacks against RNNs.
Robust Physical-World Attacks on Deep Learning Models - Kurakin et al., 2017. Analysis of physical-world attacks on deep learning visual classifiers.
Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks - Behzadan & Munir, 2017. Exploration of vulnerabilities in deep reinforcement learning systems.

LLM Security Papers

Universal and Transferable Adversarial Attacks on Aligned Language Models - Zou et al., 2023. Automated methods for generating adversarial attacks on LLMs.
Jailbroken: How Does LLM Safety Training Fail? - Wei et al., 2023. Analysis of LLM safety training failures.
Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications - Greshake et al., 2023. Indirect prompt injection attacks.
Poisoning Language Models During Instruction Tuning - Wan et al., 2023. Data poisoning attacks on instruction-tuned models.
Ignore Previous Prompt: Attack Techniques For Language Models - Perez & Ribeiro, 2022. Prompt injection attack taxonomy.
Red Teaming Language Models to Reduce Harms - Ganguli et al., 2022. Methods for LLM red teaming.

Model Security & Privacy

Model Inversion Attacks that Exploit Confidence Information - Fredrikson et al., 2015. Privacy attacks on ML models.
Membership Inference Attacks Against Machine Learning Models - Shokri et al., 2017. Privacy leakage through membership inference.
Stealing Machine Learning Models via Prediction APIs - Tramèr et al., 2016. Model extraction attacks.
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain - Gu et al., 2017. Backdoor attacks on neural networks.

Conferences and Events

NIST Trustworthy & Responsible AI Resource Center - Regularly updates on AI security events and workshops.
DEFCON AI Village - Annual event focusing on AI security at DEFCON.
NeurIPS Workshop on Security in Machine Learning - Academic workshop on machine learning security.
IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) - Leading academic conference on ML security.
CAMLIS (Conference on Applied Machine Learning for Information Security) - Applied ML security conference.
MLSecOps Community - Community-driven events and resources for ML security operations.
AI Security Summit - Industry conference on AI security.
BlackHat Arsenal - AI/ML Track - Tools and demonstrations in AI security.

Blogs and Articles

Google AI Security Blog
Microsoft AI Security and Ethical AI
OpenAI Blog - AI Safety
Anthropic's AI Safety Research Overview
DeepMind Safety & Ethics
Zscaler ThreatLabz AI Security Insights
Orca Security AI Security Blog
HiddenLayer AI Threat Research - Research on AI/ML security threats and vulnerabilities.
ProtectAI Blog - AI/ML security research and best practices.
Trail of Bits AI Security Blog - Security research for AI systems.
Lakera AI Security Blog - LLM security and prompt injection research.
Adversa AI Blog - Adversarial AI and security insights.
Robust Intelligence Blog - AI security and model validation.
Simon Willison's Weblog - AI Security - Practical insights on LLM security.

Community Resources

OWASP AI Security and Privacy Guide - Comprehensive guide to AI security and privacy.
OWASP Machine Learning Security Top 10 - Top 10 ML security risks.
ENISA Multilayer Framework for Good Cybersecurity Practices for AI
MLSecOps Top 10 by Institute for Ethical AI & Machine Learning
AI Security Community on Discord - Active community discussing AI security topics.
r/MLSecOps - Reddit community for ML security operations.
Awesome LLM Security - Curated list of LLM security resources.
Awesome AI Security - Another curated list of AI security resources.
Awesome ML for Cybersecurity - ML applications in cybersecurity.
AI Incident Database - Database of AI system failures and incidents.
ML Commons AI Safety Working Group - Collaborative effort for AI safety.

Learning Resources

Courses and Tutorials

Stanford CS329T: Trustworthy ML - Stanford course on ML trustworthiness and security.
MIT 6.S897: Machine Learning for Healthcare - Includes security and privacy considerations.
Coursera: AI For Everyone - Introduction to AI with ethical considerations.
DeepLearning.AI: AI Security Specialization - Courses on securing AI systems.
Adversarial Machine Learning Tutorial - Comprehensive tutorial on adversarial ML.
OWASP LLM Top 10 Educational Resources - Learning materials for LLM security.

Workshops and Labs

AI Village CTF - Capture the Flag competitions focused on AI security.
Microsoft Learn: Responsible AI - Training path for responsible AI.
Google's Secure AI Framework Training - Security training for AI systems.

Video Resources

Rob Miles - AI Safety - YouTube channel on AI safety.
Two Minute Papers - AI Security Topics - Academic paper summaries including security.
Yannic Kilcher - ML Security Papers - Deep dives into ML research papers.

Datasets and Benchmarks

Security Datasets

Adversarial Robustness Benchmark - Benchmarks for adversarial robustness.
ImageNet-A - Natural adversarial examples dataset.
MNIST-C - Corrupted MNIST for robustness testing.
TrojAI - NIST dataset for detecting trojaned AI models.
AI Security Datasets Collection - Curated list of ML security datasets.

LLM Benchmarks

HELM - Holistic Evaluation of Language Models including safety.
ToxiGen - Dataset for toxic language generation.
RealToxicityPrompts - Prompts for evaluating toxicity in LLMs.
AdvGLUE - Adversarial GLUE benchmark.
JailbreakBench - Benchmark for LLM jailbreak attacks.

Books

Technical Books

"Adversarial Robustness for Machine Learning" by Pin-Yu Chen et al. - Comprehensive guide to adversarial ML.
"Privacy-Preserving Machine Learning" by J. Morris Chang et al. - Techniques for privacy in ML.
"Hands-On Machine Learning with Scikit-Learn and TensorFlow" by Aurélien Géron - Includes security considerations (Chapter 17).
"Machine Learning Security" by Clarence Chio & David Freeman - Practical guide to ML security.
"AI Safety: Solving the Control Problem" by Roman Yampolskiy - Focus on AI safety and control.

General AI Security

"The Alignment Problem" by Brian Christian - Machine learning and human values.
"Human Compatible" by Stuart Russell - AI safety and beneficial AI.
"Artificial Intelligence Safety and Security" edited by Roman V. Yampolskiy - Collection of research on AI safety.
"Deep Learning" by Ian Goodfellow et al. - Includes adversarial examples (Chapter 20.13).

Organizations and Companies

Research Organizations

OpenAI Safety Team - Research on AI safety and alignment.
Anthropic - AI safety and research company.
DeepMind Safety Research - Safety and ethics in AI research.
AI Safety Institute (UK) - UK government AI safety research.
Center for AI Safety (CAIS) - Nonprofit focused on AI safety research.
Future of Life Institute - AI safety advocacy and research.
Machine Intelligence Research Institute (MIRI) - Mathematical research on AI safety.
Partnership on AI - Multi-stakeholder organization on AI best practices.

Security Companies

ProtectAI - AI/ML security platform and tools.
HiddenLayer - ML security and model protection.
Robust Intelligence - AI security and validation platform.
Lakera - LLM security and prompt injection protection.
Adversa AI - AI security and red teaming services.
Credo AI - AI governance and risk management.
Arthur AI - ML monitoring and security.
Fiddler AI - ML monitoring and explainability.

Industry Initiatives

AI Alliance - Open community for AI advancement and safety.
MLCommons - Engineering consortium for ML innovation.
Linux Foundation AI & Data - Open source AI projects.

Notable Incidents and Case Studies

High-Profile Incidents

Tay Chatbot (2016) - Microsoft's chatbot corrupted through adversarial inputs in 24 hours.
ImageNet Adversarial Attacks (2016-2018) - Demonstrations of fooling state-of-the-art image classifiers.
Tesla Autopilot Incidents (2016-present) - Various incidents highlighting ML safety concerns.
Clearview AI Privacy Breach (2020) - Unauthorized facial recognition data collection.
GPT-3 Misuse Cases (2020-2022) - Phishing, misinformation, and code generation exploits.
ChatGPT Jailbreaks (2023) - Various prompt injection and jailbreak techniques discovered.
Bing Chat Sydney Incident (2023) - Unintended behavior and prompt injection vulnerabilities.
Google Bard Data Leak (2023) - Privacy concerns with conversation data.
Samsung ChatGPT Data Leak (2023) - Employees accidentally leaked confidential data to ChatGPT.
MITRE AI Attack Demo (2023) - Successful adversarial attacks on production ML systems.

Research Demonstrations

Robust Physical Adversarial Attacks - Stop signs misclassified with stickers.
Model Extraction Attacks - Successfully extracted commercial ML models via API queries.
Data Poisoning in Federated Learning - Compromising distributed learning systems.
Backdoor Attacks in Image Recognition - Trojaned models behaving normally except on trigger inputs.
Membership Inference Success - Identifying training data from model behavior.

Resources

AI Incident Database - Comprehensive database of AI failures and incidents.
AVID (AI Vulnerability Database) - Database of AI/ML vulnerabilities.
NIST AI Risk Management Playbook - Case studies and lessons learned.

Key Topics

AI in Cybersecurity

Threat Detection and Response
Anomaly Detection
Automated Security Systems
Malware Classification
Network Intrusion Detection
Security Operations Center (SOC) Automation

AI-Driven Threats

AI-Generated Phishing
AI-Powered Malware and Ransomware
Deepfakes and Social Engineering
Automated Vulnerability Discovery
AI-Enhanced Social Engineering

LLM-Specific Security

Prompt Injection Attacks
Jailbreaking Techniques
LLM Data Poisoning
Indirect Prompt Injection
Plugin/Tool Security
Context Window Attacks
Token Smuggling

Adversarial Machine Learning

Evasion Attacks
Poisoning Attacks
Model Extraction
Model Inversion
Backdoor Attacks
Adversarial Examples Generation

Secure AI Transformation

Data Protection in AI Systems
AI Model Security
Encryption for AI
Secure Model Deployment
ML Supply Chain Security
Model Provenance and Integrity

Privacy in AI

Differential Privacy
Federated Learning Security
Privacy-Preserving ML
Data Anonymization
Membership Inference Protection

AI Ethics and Regulation

AI Governance Frameworks
Compliance and Auditing
Bias Detection and Mitigation
Explainable AI (XAI)
Responsible AI Development

Additional Topics

Zero Trust Security Models for AI
Visual Intelligence in Security
AI in Penetration Testing
Red Teaming for AI Systems
AI Safety and Alignment
MLOps Security (MLSecOps)

Contributing

Your contributions are always welcome! Please read the contribution guidelines first.

License

This project is licensed under the MIT License.

If you find this repository helpful, please consider giving it a star ⭐️ to show your support!

Last updated: 2025-11-04

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md

License

Anisha-2122/awesome-ai-security-2

Folders and files

Latest commit

History

Repository files navigation

Awesome AI Security

Contents

Frameworks and Standards

Tools and Frameworks

Defensive Tools

Detection and Analysis

Adversarial ML Tools

Evasion and Injection Research

Model Security Tools

LLM Security Benchmarks

Research Papers

Foundational Papers

LLM Security Papers

Model Security & Privacy

Conferences and Events

Blogs and Articles

Community Resources

Learning Resources

Courses and Tutorials

Workshops and Labs

Video Resources

Datasets and Benchmarks

Security Datasets

LLM Benchmarks

Books

Technical Books

General AI Security

Organizations and Companies

Research Organizations

Security Companies

Industry Initiatives

Notable Incidents and Case Studies

High-Profile Incidents

Research Demonstrations

Resources

Key Topics

AI in Cybersecurity

AI-Driven Threats

LLM-Specific Security

Adversarial Machine Learning

Secure AI Transformation

Privacy in AI

AI Ethics and Regulation

Additional Topics

Contributing

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages