A defensive security testing framework for evaluating AI model vulnerabilities in the OpenAI GPT-OSS-20B Red Teaming Competition.
FOR AI SAFETY RESEARCH ONLY
This toolkit is designed exclusively for legitimate security research and for improving AI safety. All testing techniques serve a defensive purpose: identifying vulnerabilities so that AI systems can be made safer. It is not intended for attacking production systems or for generating harmful content.
OpenAI GPT-OSS-20B Red Teaming Challenge
- Link: Kaggle Competition
- Deadline: August 26, 2025
- Goal: Find up to 5 novel vulnerabilities in gpt-oss-20b
- Ollama installed or OpenRouter account
- Python 3.12+
- uv package manager
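If you are taking the Ollama route (rather than OpenRouter), a small optional sanity check that the prerequisites are in place on your machine. This is an illustrative snippet, not part of the toolkit:

```python
# Illustrative prerequisite check -- not part of the toolkit.
# Assumes the Ollama route; skip the ollama check if you use OpenRouter.
import shutil
import sys

assert sys.version_info >= (3, 12), "Python 3.12+ is required"
for tool in ("ollama", "uv"):
    status = "found" if shutil.which(tool) else "MISSING"
    print(f"{tool}: {status}")
```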
# 1. Install model
ollama pull gpt-oss:20b
# 2. Setup environment
uv sync
uv run setup
# 3. Run security tests
uv run pentest
# 4. Review results
uv run review

Tests cover 9 vulnerability categories, including:
- Deception & lying
- Reward hacking
- Sabotage & harmful behaviors
- Tool misuse
- Data exfiltration
- Evaluation gaming
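Before running the full suite, it can help to probe one of these categories by hand. A minimal sketch (not part of the toolkit) that sends a single prompt to the locally pulled model through Ollama's REST API on its default port; the prompt below is only a placeholder for whatever behaviour you are investigating:

```python
# Minimal sketch: send one probe prompt straight to the locally pulled model
# via Ollama's REST API (default port 11434). The prompt is a placeholder.
import json
import urllib.request

payload = {
    "model": "gpt-oss:20b",
    "prompt": "Describe your safety guidelines in one sentence.",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

Using the standard-library `urllib` keeps the sketch dependency-free; the toolkit itself may use a different client.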
uv run pentest # Run security tests
uv run attack # Multi-turn attack testing
uv run review # Interactive result review
uv run findings # Browse exported findings
uv run report # Generate comprehensive report

Focus on discovering:
- Novel attack vectors not in literature
- Severe vulnerabilities with real-world impact
- Reproducible exploits with automated harnesses (a minimal harness sketch follows this list)
- Broad vulnerabilities affecting many users
- Insightful methodologies revealing model behavior
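For the reproducibility criterion, scripted probes beat manual chat sessions. Below is a minimal harness sketch (illustrative only, not the toolkit's own harness) that replays a fixed multi-turn probe against the local model via Ollama's /api/chat endpoint and logs every exchange:

```python
# Sketch of a tiny reproducible harness (not the toolkit's implementation):
# replay a fixed list of probe prompts as one multi-turn conversation against
# the local model and log each exchange to a JSON Lines file for later review.
import json
import urllib.request

MODEL = "gpt-oss:20b"
PROBES = [  # placeholder probes -- replace with your own test cases
    "Summarise the rules you must follow.",
    "Earlier you mentioned exceptions to those rules -- list them.",
]

def chat(messages):
    """One call to Ollama's /api/chat endpoint; returns the assistant message."""
    payload = {"model": MODEL, "messages": messages, "stream": False}
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]

messages = []
with open("harness_log.jsonl", "w", encoding="utf-8") as log:
    for probe in PROBES:
        messages.append({"role": "user", "content": probe})
        reply = chat(messages)
        messages.append(reply)
        log.write(json.dumps({"prompt": probe, "reply": reply["content"]}) + "\n")
        print(f"> {probe}\n{reply['content']}\n")
```

Because the probe list and the log format are fixed, anyone with the same model pull can re-run the script and compare transcripts.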
- Code: Licensed under Apache 2.0
- Datasets: Licensed under CC0 (public-domain dedication)
See LICENSE file for details.