Skip to content

A curated list of awesome resources, tools, research papers, and projects related to AI Security.

License

Notifications You must be signed in to change notification settings

Anisha-2122/awesome-ai-security-2

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 

Repository files navigation

Awesome AI Security Awesome

A curated list of awesome resources, tools, research papers, and projects related to AI Security. This repository aims to provide a comprehensive collection of information at the intersection of Artificial Intelligence and Security, helping researchers, practitioners, and enthusiasts stay up-to-date with the latest developments in this rapidly evolving field.

Contents

Frameworks and Standards

Tools and Frameworks

Defensive Tools

  • ProtectAI's ModelScan - A security scanner for detecting suspicious actions in serialized ML models.
  • rebuff - A prompt injection detector.
  • langkit - A toolkit for monitoring language models and detecting attacks.
  • NeMo Guardrails - An open-source toolkit for adding programmable guardrails to LLM-based conversational systems.
  • Llama Guard - A safeguard model for human-AI conversations.
  • LLM Guard - A comprehensive security toolkit for LLM applications.
  • Microsoft Presidio - Context-aware PII detection and anonymization.
  • Guardrails AI - Add guardrails to LLM applications.

Detection and Analysis

  • StringSifter - A tool that ranks strings based on their relevance for malware analysis.
  • PentestGPT - An interactive pentest tool integrating with OpenAI to conduct comprehensive penetration tests.
  • garak - LLM vulnerability scanner - detects hallucination, data leakage, prompt injection, and more.
  • PyRIT (Python Risk Identification Toolkit) - Microsoft's open-source framework for AI red teaming.
  • AI Exploits - Collection of real-world AI/ML exploits for responsibly disclosed vulnerabilities.

Adversarial ML Tools

  • Adversarial Robustness Toolbox (ART) - IBM's Python library for machine learning security.
  • CleverHans - Library for benchmarking ML systems' vulnerability to adversarial examples.
  • Foolbox - A Python toolbox to create adversarial examples.
  • TextAttack - Framework for adversarial attacks, data augmentation, and model training in NLP.
  • AdvBox - Toolbox to generate adversarial examples that fool neural networks.
  • Counterfit - Microsoft's CLI to automate security testing of AI/ML systems.

Evasion and Injection Research

Model Security Tools

  • HiddenLayer Model Scanner - Scans models for vulnerabilities and supply chain issues.
  • Robust Intelligence AI Firewall - Real-time protection configured to address model-specific vulnerabilities.
  • Lakera Guard - Protection from prompt injections, data loss, and toxic content.
  • Caldera - MITRE's cyber adversary emulation platform with AI plugins.
  • MLflow - Platform for managing ML lifecycle with security features.
  • ClearML - ML/DL development platform with model tracking and security.

LLM Security Benchmarks

  • HarmBench - Standardized evaluation framework for automated red teaming.
  • SafetyBench - Comprehensive benchmark for LLM safety evaluation.
  • TrustLLM - Comprehensive study of trustworthiness in LLMs.
  • DecodingTrust - Platform for comprehensive assessment of GPT model trustworthiness.
  • PromptBench - Unified library for evaluating and understanding LLMs.

Research Papers

Foundational Papers

LLM Security Papers

Model Security & Privacy

Conferences and Events

Blogs and Articles

Community Resources

Learning Resources

Courses and Tutorials

Workshops and Labs

Video Resources

Datasets and Benchmarks

Security Datasets

LLM Benchmarks

  • HELM - Holistic Evaluation of Language Models including safety.
  • ToxiGen - Dataset for toxic language generation.
  • RealToxicityPrompts - Prompts for evaluating toxicity in LLMs.
  • AdvGLUE - Adversarial GLUE benchmark.
  • JailbreakBench - Benchmark for LLM jailbreak attacks.

Books

Technical Books

  • "Adversarial Robustness for Machine Learning" by Pin-Yu Chen et al. - Comprehensive guide to adversarial ML.
  • "Privacy-Preserving Machine Learning" by J. Morris Chang et al. - Techniques for privacy in ML.
  • "Hands-On Machine Learning with Scikit-Learn and TensorFlow" by Aurélien Géron - Includes security considerations (Chapter 17).
  • "Machine Learning Security" by Clarence Chio & David Freeman - Practical guide to ML security.
  • "AI Safety: Solving the Control Problem" by Roman Yampolskiy - Focus on AI safety and control.

General AI Security

  • "The Alignment Problem" by Brian Christian - Machine learning and human values.
  • "Human Compatible" by Stuart Russell - AI safety and beneficial AI.
  • "Artificial Intelligence Safety and Security" edited by Roman V. Yampolskiy - Collection of research on AI safety.
  • "Deep Learning" by Ian Goodfellow et al. - Includes adversarial examples (Chapter 20.13).

Organizations and Companies

Research Organizations

Security Companies

Industry Initiatives

Notable Incidents and Case Studies

High-Profile Incidents

  • Tay Chatbot (2016) - Microsoft's chatbot corrupted through adversarial inputs in 24 hours.
  • ImageNet Adversarial Attacks (2016-2018) - Demonstrations of fooling state-of-the-art image classifiers.
  • Tesla Autopilot Incidents (2016-present) - Various incidents highlighting ML safety concerns.
  • Clearview AI Privacy Breach (2020) - Unauthorized facial recognition data collection.
  • GPT-3 Misuse Cases (2020-2022) - Phishing, misinformation, and code generation exploits.
  • ChatGPT Jailbreaks (2023) - Various prompt injection and jailbreak techniques discovered.
  • Bing Chat Sydney Incident (2023) - Unintended behavior and prompt injection vulnerabilities.
  • Google Bard Data Leak (2023) - Privacy concerns with conversation data.
  • Samsung ChatGPT Data Leak (2023) - Employees accidentally leaked confidential data to ChatGPT.
  • MITRE AI Attack Demo (2023) - Successful adversarial attacks on production ML systems.

Research Demonstrations

  • Robust Physical Adversarial Attacks - Stop signs misclassified with stickers.
  • Model Extraction Attacks - Successfully extracted commercial ML models via API queries.
  • Data Poisoning in Federated Learning - Compromising distributed learning systems.
  • Backdoor Attacks in Image Recognition - Trojaned models behaving normally except on trigger inputs.
  • Membership Inference Success - Identifying training data from model behavior.

Resources

Key Topics

AI in Cybersecurity

  • Threat Detection and Response
  • Anomaly Detection
  • Automated Security Systems
  • Malware Classification
  • Network Intrusion Detection
  • Security Operations Center (SOC) Automation

AI-Driven Threats

  • AI-Generated Phishing
  • AI-Powered Malware and Ransomware
  • Deepfakes and Social Engineering
  • Automated Vulnerability Discovery
  • AI-Enhanced Social Engineering

LLM-Specific Security

  • Prompt Injection Attacks
  • Jailbreaking Techniques
  • LLM Data Poisoning
  • Indirect Prompt Injection
  • Plugin/Tool Security
  • Context Window Attacks
  • Token Smuggling

Adversarial Machine Learning

  • Evasion Attacks
  • Poisoning Attacks
  • Model Extraction
  • Model Inversion
  • Backdoor Attacks
  • Adversarial Examples Generation

Secure AI Transformation

  • Data Protection in AI Systems
  • AI Model Security
  • Encryption for AI
  • Secure Model Deployment
  • ML Supply Chain Security
  • Model Provenance and Integrity

Privacy in AI

  • Differential Privacy
  • Federated Learning Security
  • Privacy-Preserving ML
  • Data Anonymization
  • Membership Inference Protection

AI Ethics and Regulation

  • AI Governance Frameworks
  • Compliance and Auditing
  • Bias Detection and Mitigation
  • Explainable AI (XAI)
  • Responsible AI Development

Additional Topics

  • Zero Trust Security Models for AI
  • Visual Intelligence in Security
  • AI in Penetration Testing
  • Red Teaming for AI Systems
  • AI Safety and Alignment
  • MLOps Security (MLSecOps)

Contributing

Your contributions are always welcome! Please read the contribution guidelines first.

License

This project is licensed under the MIT License.


If you find this repository helpful, please consider giving it a star ⭐️ to show your support!

Last updated: 2025-11-04

About

A curated list of awesome resources, tools, research papers, and projects related to AI Security.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors