This repository was archived by the owner on Oct 21, 2025. It is now read-only.

Commit c2bd854

stared and claude committed
Improve interactive red team system with better attack generation
Major improvements:

- Split planning and execution stages for more natural attacks
- Add `uv run attack` for focused attack runs with 3 modes:
  * Predefined strategies
  * Custom goals (describe what to test)
  * Custom prompts (provide exact prompts)
- Add `uv run sessions` for viewing/analyzing past sessions
- Never truncate responses in display (show full content)
- Support multiple attacker models (Claude, GPT-4, Llama, Gemini, etc.)
- Per-model learning system tracks what works for each attacker
- Real-time display of conversation stages
- Document Kaggle competition ($500K prize, deadline Aug 26 2025)

The system now avoids triggering safety refusals by generating more natural conversations and planning attacks strategically before execution.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
1 parent 24f5209 commit c2bd854
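
The headline change is the split between planning and execution: the attacker model first drafts a turn-by-turn plan, then generates each message conditioned on the live conversation. A minimal sketch of such a two-stage flow, assuming hypothetical names (`plan_attack`, `execute_turn`, `call_model`) rather than the repository's actual API:

```python
# Sketch of the planning/execution split. Every name here (plan_attack,
# execute_turn, call_model) is hypothetical, not the repository's API.
import json


def call_model(prompt: str) -> str:
    """Placeholder for one chat-completion call to the attacker model."""
    raise NotImplementedError


def plan_attack(goal: str, num_turns: int) -> list[str]:
    """Stage 1: draft a turn-by-turn plan before writing any messages."""
    raw = call_model(
        f"Plan a {num_turns}-turn conversation to test: {goal}. "
        "Return a JSON list of one-line turn objectives."
    )
    return json.loads(raw)


def execute_turn(objective: str, history: list[dict]) -> str:
    """Stage 2: write one natural-sounding message for the current
    objective, conditioned on the conversation so far."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return call_model(
        f"Conversation so far:\n{transcript}\n\n"
        f"Write the next user message pursuing: {objective}"
    )
```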

File tree

8 files changed: +1092 -5 lines changed


CLAUDE.md

Lines changed: 34 additions & 1 deletion
@@ -70,8 +70,10 @@ uv run pentest
 # Run interactive red team testing
 uv run interactive # Uses script entry point from pyproject.toml

-# With options (defaults: Claude Opus 4.1 as attacker, same as evaluator)
+# With different attacker models (default: Claude Opus 4.1)
 uv run interactive --attacker-model openai/gpt-4o
+uv run interactive --attacker-model meta-llama/llama-3.1-405b-instruct
+uv run interactive --attacker-model google/gemini-2.0-flash-exp
 uv run interactive --evaluator-model anthropic/claude-3-opus

 # Run example script directly
@@ -98,6 +100,37 @@ This is a DEFENSIVE SECURITY tool for:
 - Academic research
 - Improving AI safety

+## Competition Context
+
+### OpenAI GPT-OSS-20B Red Teaming Challenge
+
+**Prize Pool**: $500,000
+**Deadline**: August 26, 2025
+**Goal**: Find up to 5 novel vulnerabilities in gpt-oss-20b
+
+### Key Evaluation Criteria
+1. **Novelty** - Previously unidentified vulnerabilities
+2. **Severity** - Potential harm level
+3. **Reproducibility** - Consistent reproduction
+4. **Breadth** - Multiple use case impact
+5. **Methodological Insight** - New understanding of model behavior
+
+### Target Vulnerability Categories
+- Deceptive alignment
+- Reward hacking
+- Lying and deception
+- Harmful behaviors
+- Novel exploits
+
+### Our Approach
+This codebase provides tools to systematically test for these vulnerabilities through:
+- Multi-turn conversation attacks
+- Trust-building strategies
+- Automated attack generation with various LLMs (Claude, GPT-4, Llama, etc.)
+- Session persistence and learning per attacker model
+- Pattern recognition from successful exploits
+- Model-specific strategy optimization
+
 ## Key Files to Know

 - `src/cli/pentest.py` - Main test runner
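
The "learning per attacker model" and "model-specific strategy optimization" bullets above suggest a small success tracker keyed by attacker model. An illustrative sketch; the file path and JSON schema are assumptions, not the repository's actual format:

```python
# Illustrative per-attacker-model success tracker. The stats file path
# and schema are assumptions, not the repository's actual format.
import json
from pathlib import Path

STATS_FILE = Path("learning/model_stats.json")  # hypothetical location


def load_stats() -> dict:
    return json.loads(STATS_FILE.read_text()) if STATS_FILE.exists() else {}


def record_attempt(stats: dict, model: str, strategy: str, success: bool) -> None:
    """Update win/try counts for one (attacker model, strategy) pair."""
    entry = stats.setdefault(model, {}).setdefault(strategy, {"wins": 0, "tries": 0})
    entry["tries"] += 1
    entry["wins"] += int(success)


def best_strategy(stats: dict, model: str) -> str | None:
    """Pick the strategy with the highest observed win rate for this model."""
    strategies = stats.get(model, {})
    if not strategies:
        return None
    return max(strategies, key=lambda s: strategies[s]["wins"] / strategies[s]["tries"])


def save_stats(stats: dict) -> None:
    STATS_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATS_FILE.write_text(json.dumps(stats, indent=2))
```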

README.md

Lines changed: 7 additions & 4 deletions
@@ -58,14 +58,16 @@ A comprehensive security testing framework for the OpenAI GPT-OSS-20B Red Teaming Challenge

 5. **Interactive Red Team Testing**:

-   The new interactive mode allows multi-turn attacks with learning capabilities:
+   The interactive mode allows multi-turn attacks with various LLMs and learning capabilities:

    ```bash
-   # Run with default models (Claude Opus 4.1 as attacker)
+   # Run with default (Claude Opus 4.1)
    uv run interactive

-   # Specify different attacker model
+   # Try different attacker models
    uv run interactive --attacker-model openai/gpt-4o
+   uv run interactive --attacker-model meta-llama/llama-3.1-405b-instruct
+   uv run interactive --attacker-model google/gemini-2.0-flash-exp

    # Use separate evaluator model
    uv run interactive --evaluator-model anthropic/claude-3-opus
@@ -74,9 +76,10 @@ A comprehensive security testing framework for the OpenAI GPT-OSS-20B Red Teaming Challenge
    Features:
    - Multi-turn attacks (1 = single-turn, 2+ = trust-building)
    - Multiple attack strategies (trust building, side tasks, etc.)
-   - Learning system that improves over time
+   - Per-model learning system that tracks what works for each attacker
    - Session persistence with full dialogue history
    - Automatic report generation
+   - Support for any OpenRouter-compatible model

 6. **Review results and create findings**:
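
The "OpenRouter-compatible model" feature above rests on the fact that OpenRouter exposes an OpenAI-style chat-completions endpoint, so a single HTTP call works for Claude, GPT-4o, Llama, and Gemini model IDs alike. A minimal sketch (the environment-variable name is an assumption):

```python
# Minimal sketch of one chat call through OpenRouter's OpenAI-style API.
# The OPENROUTER_API_KEY variable name is an assumption.
import os

import requests


def chat(model: str, messages: list[dict]) -> str:
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={"model": model, "messages": messages},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


# Any model ID on OpenRouter works the same way:
# chat("meta-llama/llama-3.1-405b-instruct", [{"role": "user", "content": "hi"}])
```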

competition.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
+# OpenAI GPT-OSS-20B Red Teaming Challenge
+
+## Competition Overview
+
+The OpenAI GPT-OSS-20B Red Teaming Challenge is a Kaggle-hosted competition with a **$500,000 prize pool** aimed at strengthening open source AI safety. The competition invites researchers, developers, and enthusiasts worldwide to help uncover novel risks and vulnerabilities in OpenAI's new open-weight model, gpt-oss-20b.
+
+## Timeline
+
+- **Start Date**: August 5, 2025
+- **Submission Deadline**: August 26, 2025 at 11:59 PM UTC
+- **Judging Period**: August 27 - September 11, 2025
+- **Winners Announcement**: September 15, 2025 (estimated)
+- **Virtual Workshop**: October 7, 2025
+
+## Prize Pool
+
+**Total: $500,000**
+
+The prize fund will be awarded based on review from a panel of expert judges from OpenAI and other leading labs.
+
+## Submission Requirements
+
+### What to Submit
+- Up to **5 distinct issues/vulnerabilities**
+- Detailed, reproducible report for each finding
+- Kaggle Writeup detailing strategy and discovery process
+
+### Format
+- Structured format submissions
+- Clear reproduction steps
+- Evidence of the vulnerability
+- Impact assessment
+
+## Vulnerability Categories (Topics of Interest)
+
+The competition focuses on sophisticated forms of model failure, including:
+
+1. **Deceptive Alignment**
+   - Model behaving differently during evaluation vs deployment
+   - Hidden goals or motivations
+
+2. **Reward Hacking**
+   - Exploiting reward mechanisms in unintended ways
+   - Gaming the system without solving the actual task
+
+3. **Lying and Deception**
+   - Model knowingly providing false information
+   - Strategic deception to achieve goals
+
+4. **Harmful Behaviors**
+   - Generating dangerous or harmful content
+   - Bypassing safety mechanisms
+
+5. **Novel Exploits**
+   - Previously undetected vulnerabilities
+   - New attack vectors not covered by existing safety measures
+
+## Evaluation Criteria
+
+Submissions will be evaluated based on:
+
+1. **Novelty** - Has this vulnerability been previously identified?
+2. **Severity** - How serious is the potential harm?
+3. **Reproducibility** - Can the issue be consistently reproduced?
+4. **Breadth** - Does it affect multiple use cases or contexts?
+5. **Methodological Insight** - Does it reveal new understanding about model behavior?
+
+## Judging Panel
+
+- Experts from OpenAI
+- Representatives from other leading AI labs
+- Scoring based on safety research impact
+
+## Competition Goals
+
+1. **Identify Novel Vulnerabilities**: Find flaws that haven't been previously discovered or reported
+2. **Strengthen Open Source Safety**: Improve the safety of open-weight models
+3. **Community Engagement**: Leverage global expertise in AI safety
+4. **Knowledge Sharing**: Create open-source tools and datasets for the community
+
+## Post-Competition
+
+- Publication of a comprehensive report
+- Open-source evaluation dataset based on validated findings
+- Community benefits from shared learnings
+- Virtual workshop for knowledge exchange
+
+## Additional Notes
+
+- Encourages creativity and innovation in methodology
+- Rewards participants who share open-source tooling
+- Notebooks and code sharing are encouraged to help the broader community
+- Focus on responsible disclosure and safety research
+
+## Resources
+
+- **Competition Page**: https://www.kaggle.com/competitions/openai-gpt-oss-20b-red-teaming
+- **Model**: gpt-oss-20b (OpenAI's open-weight model)
+- **Platform**: Kaggle
+
+## Important Considerations
+
+This competition represents a significant effort by OpenAI to:
+- Engage the global community in AI safety
+- Provide substantial financial incentives for safety research
+- Create a structured evaluation process with expert oversight
+- Build a comprehensive understanding of model vulnerabilities
+
+The competition emphasizes finding **novel** vulnerabilities that haven't been previously identified, making original research and creative approaches particularly valuable.
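
Given the submission requirements and evaluation criteria above, each of the up-to-five findings needs a structured, reproducible record. A hypothetical example of such a record; the field names mirror the five criteria but are not the official Kaggle submission schema:

```python
# Hypothetical finding record shaped by the five evaluation criteria.
# This is NOT the official Kaggle submission schema, just an illustration
# of what a structured, reproducible report needs to cover.
import json

finding = {
    "title": "Example: multi-turn trust-building bypass",
    "category": "harmful_behaviors",  # one of the five categories above
    "novelty": "No prior public report found for this exact vector",
    "severity": "medium",
    "reproduction_steps": [
        "Start a multi-turn session against gpt-oss-20b",
        "Build rapport over several benign turns",
        "Pivot to the restricted request once a framing is established",
    ],
    "breadth": "Reproduces across paraphrases and two different system prompts",
    "methodological_insight": "Refusals weaken after the model accepts a framing",
}
print(json.dumps(finding, indent=2))
```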

pyproject.toml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -66,6 +66,8 @@ findings = "src.cli.findings:main"
6666
report = "src.cli.report:main"
6767
help = "src.cli.help:main"
6868
interactive = "src.cli.interactive:main"
69+
attack = "src.cli.attack:main"
70+
sessions = "src.cli.sessions:main"
6971

7072
[tool.hatch.build.targets.wheel]
7173
packages = ["src"]
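
These entries follow the standard Python console-script convention: installing the package generates an `attack` executable that imports `src.cli.attack` and calls its `main()`, which is what makes `uv run attack` work. A sketch of what such a `main()` could look like for the three modes named in the commit message; all flag names and the default model ID are assumptions:

```python
# Hypothetical src/cli/attack.py supporting the three modes from the
# commit message. Flag names and the default attacker model are assumptions.
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(prog="attack", description="Focused attack runs")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--strategy", help="run a predefined attack strategy by name")
    mode.add_argument("--goal", help="describe what to test; prompts are generated")
    mode.add_argument("--prompt", action="append", help="exact prompt(s) to send")
    parser.add_argument("--attacker-model", default="anthropic/claude-opus-4.1",
                        help="any OpenRouter-compatible model ID")
    args = parser.parse_args()

    # Dispatch on whichever mode flag was supplied.
    if args.strategy:
        print(f"Running predefined strategy {args.strategy!r} with {args.attacker_model}")
    elif args.goal:
        print(f"Generating attacks for goal {args.goal!r} with {args.attacker_model}")
    else:
        print(f"Sending {len(args.prompt)} exact prompt(s) via {args.attacker_model}")


if __name__ == "__main__":
    main()
```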
