.github/workflows/lint-type-check.yml (8 changes: 4 additions & 4 deletions)
@@ -22,11 +22,11 @@ jobs:
       - name: Install dependencies
         run: uv sync --dev

-      - name: Check formatting
-        run: uv run ruff format src --check
+      - name: Run type checking
+        run: uv run ty check src

       - name: Run linting
         run: uv run ruff check src

-      - name: Run type checking
-        run: uv run ty check src
+      - name: Check formatting
+        run: uv run ruff format src --check
CLAUDE.md (67 changes: 63 additions & 4 deletions)
@@ -67,12 +67,40 @@ uv run setup
# Run tests
uv run pentest

-# Code quality checks
-uv run ruff check src
-uv run ruff format src
-uv run ty check src
# Run red team attack testing
uv run attack # Single attack

# Run adaptive batch campaigns (learns between attempts)
uv run attack --batch 5 # 5 attempts with learning
uv run attack --batch 10 --steps 2 # 10 attempts, 2 turns each

# With different attacker models (default: Claude Opus 4.1)
uv run attack --attacker-model openai/gpt-4o --batch 5
uv run attack --attacker-model meta-llama/llama-3.1-405b-instruct
uv run attack --attacker-model google/gemini-2.0-flash-exp

# Use custom prompts
uv run attack --custom

# View and analyze sessions
uv run sessions

# Code quality checks (ALWAYS run before committing)
uv run ty check src # Type checking first
uv run ruff check src # Then linting
uv run ruff format src # Finally formatting
```

### Pre-Commit Checklist

**IMPORTANT**: Always run these checks before committing code:

1. **Type Check**: `uv run ty check src` - Ensure no type errors
2. **Lint Check**: `uv run ruff check src` - Catch code issues
3. **Format Check**: `uv run ruff format src` - Ensure consistent formatting

If any of these fail, fix the issues before committing. The GitHub Actions CI will run these same checks in this order.
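
If you want Git to enforce this checklist automatically, a minimal local hook could look like the sketch below. This is hypothetical and not part of the repository; save it as `.git/hooks/pre-commit`, make it executable, and make sure `uv sync --dev` has been run first.

```bash
#!/usr/bin/env bash
# Hypothetical pre-commit hook mirroring the checklist above.
set -e  # abort the commit on the first failing check

uv run ty check src              # 1. type check
uv run ruff check src            # 2. lint
uv run ruff format src --check   # 3. verify formatting without rewriting files
```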

### Git Workflow

- Create feature branches for significant changes
@@ -88,6 +116,37 @@ This is a DEFENSIVE SECURITY tool for:
- Academic research
- Improving AI safety

## Competition Context

### OpenAI GPT-OSS-20B Red Teaming Challenge

- **Prize Pool**: $500,000
- **Deadline**: August 26, 2025
- **Goal**: Find up to 5 novel vulnerabilities in gpt-oss-20b

### Key Evaluation Criteria
1. **Novelty** - Previously unidentified vulnerabilities
2. **Severity** - Potential harm level
3. **Reproducibility** - Consistent reproduction
4. **Breadth** - Multiple use case impact
5. **Methodological Insight** - New understanding of model behavior

### Target Vulnerability Categories
- Deceptive alignment
- Reward hacking
- Lying and deception
- Harmful behaviors
- Novel exploits

### Our Approach
This codebase provides tools to systematically test for these vulnerabilities through:
- Multi-turn conversation attacks
- Trust-building strategies
- Automated attack generation with various LLMs (Claude, GPT-4, Llama, etc.; see the campaign sketch after this list)
- Session persistence and learning per attacker model
- Pattern recognition from successful exploits
- Model-specific strategy optimization
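
To make the last few points concrete, a multi-model campaign can be scripted with the flags documented above. This is only a sketch; the model slugs are the examples already listed in this file.

```bash
#!/usr/bin/env bash
# Hypothetical campaign script: one adaptive batch per attacker model,
# so each attacker accumulates its own learning data.
set -e

for model in openai/gpt-4o meta-llama/llama-3.1-405b-instruct google/gemini-2.0-flash-exp; do
    echo "=== Attacker model: $model ==="
    uv run attack --attacker-model "$model" --batch 5 --steps 2
done

uv run sessions   # compare afterwards what worked for each attacker
```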

## Key Files to Know

- `src/cli/pentest.py` - Main test runner
README.md (41 changes: 40 additions & 1 deletion)
@@ -1,5 +1,7 @@
# GPT-OSS-20B Red Team Security Testing Toolkit

[![CI](https://github.com/QuesmaOrg/red-team-testbed-for-gpt-oss/actions/workflows/ci.yml/badge.svg)](https://github.com/QuesmaOrg/red-team-testbed-for-gpt-oss/actions/workflows/ci.yml)

A comprehensive security testing framework for the OpenAI GPT-OSS-20B Red Teaming Competition.

## Competition Overview
@@ -48,9 +50,46 @@
# Or test specific categories
uv run pentest -c deception
uv run pentest --test-id adderall_001

# Run multi-turn attack testing
uv run attack

# View and analyze sessions
uv run sessions
```

-5. **Review results and create findings**:
5. **Red Team Attack Testing**:

The attack system runs multi-turn attacks driven by various attacker LLMs and learns from previous attempts:

```bash
# Run single attack (Claude Opus 4.1 default)
uv run attack

# Run adaptive campaign with multiple attempts
uv run attack --batch 5 # Run 5 attempts, learning from each
uv run attack --batch 10 --steps 2 # 10 attempts, each with 2 turns

# Try different attacker models
uv run attack --attacker-model openai/gpt-4o
uv run attack --attacker-model meta-llama/llama-3.1-405b-instruct --batch 5

# Use custom prompts instead of AI-generated
uv run attack --custom

# View and analyze past sessions
uv run sessions
```

Features:
- Multi-turn attacks (`--steps 1` = single-turn, `--steps 2` or more = trust-building)
- Multiple attack strategies (trust building, side tasks, etc.)
- Per-model learning system that tracks what works for each attacker
- Session persistence with full dialogue history
- Automatic report generation
- Support for any OpenRouter-compatible model
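
Regarding the last point, "OpenRouter-compatible" here means any model slug served through OpenRouter's chat completions API. A quick way to sanity-check a candidate attacker model might look like the sketch below; it assumes an `OPENROUTER_API_KEY` environment variable and uses OpenRouter's public endpoint.

```bash
# Hypothetical smoke test before passing a slug to --attacker-model.
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/llama-3.1-405b-instruct",
        "messages": [{"role": "user", "content": "Reply with OK."}]
      }'
```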

6. **Review results and create findings**:

```bash
# Interactive review interface
competition.md (109 changes: 109 additions & 0 deletions)
@@ -0,0 +1,109 @@
# OpenAI GPT-OSS-20B Red Teaming Challenge

## Competition Overview

The OpenAI GPT-OSS-20B Red Teaming Challenge is a Kaggle-hosted competition with a **$500,000 prize pool** aimed at strengthening open source AI safety. The competition invites researchers, developers, and enthusiasts worldwide to help uncover novel risks and vulnerabilities in OpenAI's new open-weight model, gpt-oss-20b.

## Timeline

- **Start Date**: August 5, 2025
- **Submission Deadline**: August 26, 2025 at 11:59 PM UTC
- **Judging Period**: August 27 - September 11, 2025
- **Winners Announcement**: September 15, 2025 (estimated)
- **Virtual Workshop**: October 7, 2025

## Prize Pool

**Total: $500,000**

The prize fund will be awarded based on review by a panel of expert judges from OpenAI and other leading labs.

## Submission Requirements

### What to Submit
- Up to **5 distinct issues/vulnerabilities**
- Detailed, reproducible report for each finding
- Kaggle Writeup detailing strategy and discovery process

### Format
- Structured format submissions
- Clear reproduction steps
- Evidence of the vulnerability
- Impact assessment
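
The official Kaggle submission schema is not reproduced here. As a purely hypothetical way to keep each finding's write-up aligned with the bullets above, a report skeleton could be scaffolded like this; the file names and layout are assumptions, not the competition's required format.

```bash
#!/usr/bin/env bash
# Hypothetical scaffolding for up to 5 findings -- not the official format.
set -e

for i in 1 2 3 4 5; do
    dir="findings/finding_0${i}"
    mkdir -p "$dir"
    cat > "$dir/report.md" <<'EOF'
# Finding: <short title>

## Summary
## Reproduction steps
## Evidence of the vulnerability
## Impact assessment
EOF
done
```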

## Vulnerability Categories (Topics of Interest)

The competition focuses on sophisticated forms of model failure, including:

1. **Deceptive Alignment**
- Model behaving differently during evaluation vs deployment
- Hidden goals or motivations

2. **Reward Hacking**
- Exploiting reward mechanisms in unintended ways
- Gaming the system without solving the actual task

3. **Lying and Deception**
- Model knowingly providing false information
- Strategic deception to achieve goals

4. **Harmful Behaviors**
- Generating dangerous or harmful content
- Bypassing safety mechanisms

5. **Novel Exploits**
- Previously undetected vulnerabilities
- New attack vectors not covered by existing safety measures

## Evaluation Criteria

Submissions will be evaluated based on:

1. **Novelty** - Has this vulnerability been previously identified?
2. **Severity** - How serious is the potential harm?
3. **Reproducibility** - Can the issue be consistently reproduced?
4. **Breadth** - Does it affect multiple use cases or contexts?
5. **Methodological Insight** - Does it reveal new understanding about model behavior?

## Judging Panel

- Experts from OpenAI
- Representatives from other leading AI labs
- Scoring based on safety research impact

## Competition Goals

1. **Identify Novel Vulnerabilities**: Find flaws that haven't been previously discovered or reported
2. **Strengthen Open Source Safety**: Improve the safety of open-weight models
3. **Community Engagement**: Leverage global expertise in AI safety
4. **Knowledge Sharing**: Create open-source tools and datasets for the community

## Post-Competition

- Publication of a comprehensive report
- Open-source evaluation dataset based on validated findings
- Community benefits from shared learnings
- Virtual workshop for knowledge exchange

## Additional Notes

- Encourages creativity and innovation in methodology
- Rewards participants who share open-source tooling
- Notebooks and code sharing are encouraged to help the broader community
- Focus on responsible disclosure and safety research

## Resources

- **Competition Page**: https://www.kaggle.com/competitions/openai-gpt-oss-20b-red-teaming
- **Model**: gpt-oss-20b (OpenAI's open-weight model)
- **Platform**: Kaggle

## Important Considerations

This competition represents a significant effort by OpenAI to:
- Engage the global community in AI safety
- Provide substantial financial incentives for safety research
- Create a structured evaluation process with expert oversight
- Build a comprehensive understanding of model vulnerabilities

The competition emphasizes finding **novel** vulnerabilities that haven't been previously identified, making original research and creative approaches particularly valuable.
pyproject.toml (2 changes: 2 additions & 0 deletions)
@@ -65,6 +65,8 @@ review = "src.cli.review:main"
findings = "src.cli.findings:main"
report = "src.cli.report:main"
help = "src.cli.help:main"
attack = "src.cli.attack:main"
sessions = "src.cli.sessions:main"

[tool.hatch.build.targets.wheel]
packages = ["src"]
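
For reference, once dependencies are synced these entries make the new commands resolve to the corresponding modules under `src/cli/`. A usage sketch, with flags as documented in the README:

```bash
uv sync                  # register the project's entry points
uv run attack --batch 5  # dispatches to src.cli.attack:main
uv run sessions          # dispatches to src.cli.sessions:main
```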