A curated list of awesome resources, tools, research papers, and projects related to AI Security. This repository aims to provide a comprehensive collection of information at the intersection of Artificial Intelligence and Security, helping researchers, practitioners, and enthusiasts stay up-to-date with the latest developments in this rapidly evolving field.
- Frameworks and Standards
- Tools and Frameworks
- Research Papers
- Conferences and Events
- Blogs and Articles
- Community Resources
- Learning Resources
- Datasets and Benchmarks
- Books
- Organizations and Companies
- Notable Incidents and Case Studies
- Key Topics
- Contributing
- NIST AI Risk Management Framework - A comprehensive framework for managing risks associated with AI, focusing on risk assessment, mitigation, and governance.
- NIST AI 600-1 - Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile.
- ISO/IEC 23894:2023 - Guidance on risk management for artificial intelligence.
- ISO/IEC 42001:2023 - Artificial Intelligence Management System.
- Google Secure AI Framework (SAIF) - A framework for securely implementing AI systems.
- OWASP Top 10 for LLM Applications - The most critical security risks in Large Language Model applications.
- MITRE ATLAS - Adversarial Threat Landscape for Artificial-Intelligence Systems - A knowledge base of adversary tactics and techniques.
- Microsoft AI Red Team Framework - Guidelines for AI red teaming and security testing.
- EU AI Act - European Union's regulatory framework for AI systems.
- ProtectAI's ModelScan - A security scanner for detecting suspicious actions in serialized ML models.
- rebuff - A prompt injection detector.
- langkit - A toolkit for monitoring language models and detecting attacks.
- NeMo Guardrails - An open-source toolkit for adding programmable guardrails to LLM-based conversational systems.
- Llama Guard - A safeguard model for human-AI conversations.
- LLM Guard - A comprehensive security toolkit for LLM applications.
- Microsoft Presidio - Context-aware PII detection and anonymization.
- Guardrails AI - Add guardrails to LLM applications.
- StringSifter - A tool that ranks strings based on their relevance for malware analysis.
- PentestGPT - An interactive pentest tool integrating with OpenAI to conduct comprehensive penetration tests.
- garak - LLM vulnerability scanner - detects hallucination, data leakage, prompt injection, and more.
- PyRIT (Python Risk Identification Toolkit) - Microsoft's open-source framework for AI red teaming.
- AI Exploits - Collection of real-world AI/ML exploits for responsibly disclosed vulnerabilities.
- Adversarial Robustness Toolbox (ART) - IBM's Python library for machine learning security.
- CleverHans - Library for benchmarking ML systems' vulnerability to adversarial examples.
- Foolbox - A Python toolbox to create adversarial examples.
- TextAttack - Framework for adversarial attacks, data augmentation, and model training in NLP.
- AdvBox - Toolbox to generate adversarial examples that fool neural networks.
- Counterfit - Microsoft's CLI to automate security testing of AI/ML systems.
- Adversarial Demonstration Attacks on Large Language Models - Research on evasion techniques against LLMs.
- Black Box Adversarial Prompting for Foundation Models - Methods for indirect prompt injection in multi-modal LLMs.
- DeepPayload - Black-box backdoor attack on deep learning models through neural payload injection.
- Prompt Injection Attacks and Defenses in LLM Applications - Comprehensive study on prompt injection threats.
- HiddenLayer Model Scanner - Scans models for vulnerabilities and supply chain issues.
- Robust Intelligence AI Firewall - Real-time protection configured to address model-specific vulnerabilities.
- Lakera Guard - Protection from prompt injections, data loss, and toxic content.
- Caldera - MITRE's cyber adversary emulation platform with AI plugins.
- MLflow - Platform for managing ML lifecycle with security features.
- ClearML - ML/DL development platform with model tracking and security.
- HarmBench - Standardized evaluation framework for automated red teaming.
- SafetyBench - Comprehensive benchmark for LLM safety evaluation.
- TrustLLM - Comprehensive study of trustworthiness in LLMs.
- DecodingTrust - Platform for comprehensive assessment of GPT model trustworthiness.
- PromptBench - Unified library for evaluating and understanding LLMs.
- Explaining and Harnessing Adversarial Examples - Goodfellow et al., 2014. Seminal work on adversarial examples in deep learning.
- Crafting Adversarial Input Sequences for Recurrent Neural Networks - Papernot et al., 2016. Study on adversarial attacks against RNNs.
- Robust Physical-World Attacks on Deep Learning Models - Kurakin et al., 2017. Analysis of physical-world attacks on deep learning visual classifiers.
- Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks - Behzadan & Munir, 2017. Exploration of vulnerabilities in deep reinforcement learning systems.
- Universal and Transferable Adversarial Attacks on Aligned Language Models - Zou et al., 2023. Automated methods for generating adversarial attacks on LLMs.
- Jailbroken: How Does LLM Safety Training Fail? - Wei et al., 2023. Analysis of LLM safety training failures.
- Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications - Greshake et al., 2023. Indirect prompt injection attacks.
- Poisoning Language Models During Instruction Tuning - Wan et al., 2023. Data poisoning attacks on instruction-tuned models.
- Ignore Previous Prompt: Attack Techniques For Language Models - Perez & Ribeiro, 2022. Prompt injection attack taxonomy.
- Red Teaming Language Models to Reduce Harms - Ganguli et al., 2022. Methods for LLM red teaming.
- Model Inversion Attacks that Exploit Confidence Information - Fredrikson et al., 2015. Privacy attacks on ML models.
- Membership Inference Attacks Against Machine Learning Models - Shokri et al., 2017. Privacy leakage through membership inference.
- Stealing Machine Learning Models via Prediction APIs - Tramèr et al., 2016. Model extraction attacks.
- BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain - Gu et al., 2017. Backdoor attacks on neural networks.
- NIST Trustworthy & Responsible AI Resource Center - Regularly updates on AI security events and workshops.
- DEFCON AI Village - Annual event focusing on AI security at DEFCON.
- NeurIPS Workshop on Security in Machine Learning - Academic workshop on machine learning security.
- IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) - Leading academic conference on ML security.
- CAMLIS (Conference on Applied Machine Learning for Information Security) - Applied ML security conference.
- MLSecOps Community - Community-driven events and resources for ML security operations.
- AI Security Summit - Industry conference on AI security.
- BlackHat Arsenal - AI/ML Track - Tools and demonstrations in AI security.
- Google AI Security Blog
- Microsoft AI Security and Ethical AI
- OpenAI Blog - AI Safety
- Anthropic's AI Safety Research Overview
- DeepMind Safety & Ethics
- Zscaler ThreatLabz AI Security Insights
- Orca Security AI Security Blog
- HiddenLayer AI Threat Research - Research on AI/ML security threats and vulnerabilities.
- ProtectAI Blog - AI/ML security research and best practices.
- Trail of Bits AI Security Blog - Security research for AI systems.
- Lakera AI Security Blog - LLM security and prompt injection research.
- Adversa AI Blog - Adversarial AI and security insights.
- Robust Intelligence Blog - AI security and model validation.
- Simon Willison's Weblog - AI Security - Practical insights on LLM security.
- OWASP AI Security and Privacy Guide - Comprehensive guide to AI security and privacy.
- OWASP Machine Learning Security Top 10 - Top 10 ML security risks.
- ENISA Multilayer Framework for Good Cybersecurity Practices for AI
- MLSecOps Top 10 by Institute for Ethical AI & Machine Learning
- AI Security Community on Discord - Active community discussing AI security topics.
- r/MLSecOps - Reddit community for ML security operations.
- Awesome LLM Security - Curated list of LLM security resources.
- Awesome AI Security - Another curated list of AI security resources.
- Awesome ML for Cybersecurity - ML applications in cybersecurity.
- AI Incident Database - Database of AI system failures and incidents.
- ML Commons AI Safety Working Group - Collaborative effort for AI safety.
- Stanford CS329T: Trustworthy ML - Stanford course on ML trustworthiness and security.
- MIT 6.S897: Machine Learning for Healthcare - Includes security and privacy considerations.
- Coursera: AI For Everyone - Introduction to AI with ethical considerations.
- DeepLearning.AI: AI Security Specialization - Courses on securing AI systems.
- Adversarial Machine Learning Tutorial - Comprehensive tutorial on adversarial ML.
- OWASP LLM Top 10 Educational Resources - Learning materials for LLM security.
- AI Village CTF - Capture the Flag competitions focused on AI security.
- Microsoft Learn: Responsible AI - Training path for responsible AI.
- Google's Secure AI Framework Training - Security training for AI systems.
- Rob Miles - AI Safety - YouTube channel on AI safety.
- Two Minute Papers - AI Security Topics - Academic paper summaries including security.
- Yannic Kilcher - ML Security Papers - Deep dives into ML research papers.
- Adversarial Robustness Benchmark - Benchmarks for adversarial robustness.
- ImageNet-A - Natural adversarial examples dataset.
- MNIST-C - Corrupted MNIST for robustness testing.
- TrojAI - NIST dataset for detecting trojaned AI models.
- AI Security Datasets Collection - Curated list of ML security datasets.
- HELM - Holistic Evaluation of Language Models including safety.
- ToxiGen - Dataset for toxic language generation.
- RealToxicityPrompts - Prompts for evaluating toxicity in LLMs.
- AdvGLUE - Adversarial GLUE benchmark.
- JailbreakBench - Benchmark for LLM jailbreak attacks.
- "Adversarial Robustness for Machine Learning" by Pin-Yu Chen et al. - Comprehensive guide to adversarial ML.
- "Privacy-Preserving Machine Learning" by J. Morris Chang et al. - Techniques for privacy in ML.
- "Hands-On Machine Learning with Scikit-Learn and TensorFlow" by Aurélien Géron - Includes security considerations (Chapter 17).
- "Machine Learning Security" by Clarence Chio & David Freeman - Practical guide to ML security.
- "AI Safety: Solving the Control Problem" by Roman Yampolskiy - Focus on AI safety and control.
- "The Alignment Problem" by Brian Christian - Machine learning and human values.
- "Human Compatible" by Stuart Russell - AI safety and beneficial AI.
- "Artificial Intelligence Safety and Security" edited by Roman V. Yampolskiy - Collection of research on AI safety.
- "Deep Learning" by Ian Goodfellow et al. - Includes adversarial examples (Chapter 20.13).
- OpenAI Safety Team - Research on AI safety and alignment.
- Anthropic - AI safety and research company.
- DeepMind Safety Research - Safety and ethics in AI research.
- AI Safety Institute (UK) - UK government AI safety research.
- Center for AI Safety (CAIS) - Nonprofit focused on AI safety research.
- Future of Life Institute - AI safety advocacy and research.
- Machine Intelligence Research Institute (MIRI) - Mathematical research on AI safety.
- Partnership on AI - Multi-stakeholder organization on AI best practices.
- ProtectAI - AI/ML security platform and tools.
- HiddenLayer - ML security and model protection.
- Robust Intelligence - AI security and validation platform.
- Lakera - LLM security and prompt injection protection.
- Adversa AI - AI security and red teaming services.
- Credo AI - AI governance and risk management.
- Arthur AI - ML monitoring and security.
- Fiddler AI - ML monitoring and explainability.
- AI Alliance - Open community for AI advancement and safety.
- MLCommons - Engineering consortium for ML innovation.
- Linux Foundation AI & Data - Open source AI projects.
- Tay Chatbot (2016) - Microsoft's chatbot corrupted through adversarial inputs in 24 hours.
- ImageNet Adversarial Attacks (2016-2018) - Demonstrations of fooling state-of-the-art image classifiers.
- Tesla Autopilot Incidents (2016-present) - Various incidents highlighting ML safety concerns.
- Clearview AI Privacy Breach (2020) - Unauthorized facial recognition data collection.
- GPT-3 Misuse Cases (2020-2022) - Phishing, misinformation, and code generation exploits.
- ChatGPT Jailbreaks (2023) - Various prompt injection and jailbreak techniques discovered.
- Bing Chat Sydney Incident (2023) - Unintended behavior and prompt injection vulnerabilities.
- Google Bard Data Leak (2023) - Privacy concerns with conversation data.
- Samsung ChatGPT Data Leak (2023) - Employees accidentally leaked confidential data to ChatGPT.
- MITRE AI Attack Demo (2023) - Successful adversarial attacks on production ML systems.
- Robust Physical Adversarial Attacks - Stop signs misclassified with stickers.
- Model Extraction Attacks - Successfully extracted commercial ML models via API queries.
- Data Poisoning in Federated Learning - Compromising distributed learning systems.
- Backdoor Attacks in Image Recognition - Trojaned models behaving normally except on trigger inputs.
- Membership Inference Success - Identifying training data from model behavior.
- AI Incident Database - Comprehensive database of AI failures and incidents.
- AVID (AI Vulnerability Database) - Database of AI/ML vulnerabilities.
- NIST AI Risk Management Playbook - Case studies and lessons learned.
- Threat Detection and Response
- Anomaly Detection
- Automated Security Systems
- Malware Classification
- Network Intrusion Detection
- Security Operations Center (SOC) Automation
- AI-Generated Phishing
- AI-Powered Malware and Ransomware
- Deepfakes and Social Engineering
- Automated Vulnerability Discovery
- AI-Enhanced Social Engineering
- Prompt Injection Attacks
- Jailbreaking Techniques
- LLM Data Poisoning
- Indirect Prompt Injection
- Plugin/Tool Security
- Context Window Attacks
- Token Smuggling
- Evasion Attacks
- Poisoning Attacks
- Model Extraction
- Model Inversion
- Backdoor Attacks
- Adversarial Examples Generation
- Data Protection in AI Systems
- AI Model Security
- Encryption for AI
- Secure Model Deployment
- ML Supply Chain Security
- Model Provenance and Integrity
- Differential Privacy
- Federated Learning Security
- Privacy-Preserving ML
- Data Anonymization
- Membership Inference Protection
- AI Governance Frameworks
- Compliance and Auditing
- Bias Detection and Mitigation
- Explainable AI (XAI)
- Responsible AI Development
- Zero Trust Security Models for AI
- Visual Intelligence in Security
- AI in Penetration Testing
- Red Teaming for AI Systems
- AI Safety and Alignment
- MLOps Security (MLSecOps)
Your contributions are always welcome! Please read the contribution guidelines first.
This project is licensed under the MIT License.
If you find this repository helpful, please consider giving it a star ⭐️ to show your support!
Last updated: 2025-11-04