This PR adds 3 elite-tier open-source AI security projects to Section 10 (AI Safety, Alignment & Interpretability), all verified to meet the criteria of 1000+ GitHub stars, active development, and OSI-approved licenses.
| Project | Stars | License | Category | Description |
|---|---|---|---|---|
| Rebuff | 1,471+ | Apache-2.0 | Prompt Injection Detection | LLM prompt injection detector with canary word detection. Detects and prevents prompt leakage attacks by embedding invisible canary tokens in prompts and monitoring for their exposure in model outputs. |
| RedAmon | 1,836+ | MIT | Agentic Red Teaming | AI-powered agentic red team framework that automates offensive security operations from reconnaissance to exploitation to post-exploitation with zero human intervention. Integrates multiple security tools for comprehensive penetration testing. |
| CAI | 8,384+ | MIT | Cybersecurity AI Framework | Cybersecurity AI framework for semi- and fully-automating offensive and defensive security tasks. Purpose-built for cybersecurity use cases with agent-based architecture for vulnerability assessment and security operations. |
All projects verified:
- ✅ Stars: 1,471+ / 1,836+ / 8,384+ (all above 1,000 threshold)
- ✅ Activity: All actively maintained (last commits within 3 months)
- ✅ License: Apache-2.0 and MIT (OSI-approved)
- ✅ Quality: Production-ready with comprehensive documentation
- ✅ Unique: Not already in the repository
These additions fill important gaps in the AI security landscape:
- Rebuff adds specialized prompt injection detection via canary words - a unique defense mechanism not covered by existing tools
- RedAmon provides fully autonomous agentic red teaming - going beyond semi-automated tools to complete hands-off penetration testing
- CAI offers comprehensive cybersecurity automation with 8,000+ stars showing strong community adoption for both offensive and defensive security tasks
- Verified all projects have 1000+ stars
- Verified OSI-approved open source licenses
- Verified active development (commits within last 3 months)
- Added to appropriate section (Adversarial & Red-teaming Tools)
- Follows existing formatting and style
- Projects not already in repository