This repository was archived by the owner on Oct 21, 2025. It is now read-only.

Commit ff9fc1e

stared and claude committed
Clean up red team system: remove redundancy and old interactive command
- Remove duplicate attacker_model/target_model fields from AttackAttempt (already stored at session level)
- Remove old 'interactive' CLI command (replaced by attack + sessions)
- Update documentation to reflect new commands:
  * uv run attack - for running attacks
  * uv run sessions - for viewing/analyzing
- Fix session JSON structure to avoid field repetition
- Clarify that system supports any OpenRouter model, not just Claude

The system is now cleaner with better separation of concerns between attacking and session management.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
1 parent c2bd854 commit ff9fc1e

6 files changed, with 25 additions and 29 deletions


CLAUDE.md

Lines changed: 10 additions & 8 deletions
@@ -67,17 +67,19 @@ uv run setup
 # Run tests
 uv run pentest
 
-# Run interactive red team testing
-uv run interactive # Uses script entry point from pyproject.toml
+# Run red team attack testing
+uv run attack # Uses script entry point from pyproject.toml
 
 # With different attacker models (default: Claude Opus 4.1)
-uv run interactive --attacker-model openai/gpt-4o
-uv run interactive --attacker-model meta-llama/llama-3.1-405b-instruct
-uv run interactive --attacker-model google/gemini-2.0-flash-exp
-uv run interactive --evaluator-model anthropic/claude-3-opus
+uv run attack --attacker-model openai/gpt-4o
+uv run attack --attacker-model meta-llama/llama-3.1-405b-instruct
+uv run attack --attacker-model google/gemini-2.0-flash-exp
 
-# Run example script directly
-uv run python test_interactive.py
+# Use custom prompts
+uv run attack --custom
+
+# View and analyze sessions
+uv run sessions
 
 # Code quality checks
 uv run ruff check src

README.md

Lines changed: 15 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -49,28 +49,31 @@ A comprehensive security testing framework for the OpenAI GPT-OSS-20B Red Teamin
 uv run pentest -c deception
 uv run pentest --test-id adderall_001
 
-# Run interactive multi-turn attack testing
-uv run interactive
+# Run multi-turn attack testing
+uv run attack
 
-# With custom evaluator model (default: same as attacker)
-uv run interactive --evaluator-model openai/gpt-4o
+# View and analyze sessions
+uv run sessions
 ```
 
-5. **Interactive Red Team Testing**:
+5. **Red Team Attack Testing**:
 
-The interactive mode allows multi-turn attacks with various LLMs and learning capabilities:
+The attack system allows multi-turn attacks with various LLMs and learning capabilities:
 
 ```bash
 # Run with default (Claude Opus 4.1)
-uv run interactive
+uv run attack
 
 # Try different attacker models
-uv run interactive --attacker-model openai/gpt-4o
-uv run interactive --attacker-model meta-llama/llama-3.1-405b-instruct
-uv run interactive --attacker-model google/gemini-2.0-flash-exp
+uv run attack --attacker-model openai/gpt-4o
+uv run attack --attacker-model meta-llama/llama-3.1-405b-instruct
+uv run attack --attacker-model google/gemini-2.0-flash-exp
 
-# Use separate evaluator model
-uv run interactive --evaluator-model anthropic/claude-3-opus
+# Use custom prompts instead of AI-generated
+uv run attack --custom
+
+# View and analyze past sessions
+uv run sessions
 ```
 
 Features:

pyproject.toml

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -65,7 +65,6 @@ review = "src.cli.review:main"
 findings = "src.cli.findings:main"
 report = "src.cli.report:main"
 help = "src.cli.help:main"
-interactive = "src.cli.interactive:main"
 attack = "src.cli.attack:main"
 sessions = "src.cli.sessions:main"
 
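
Each script entry above maps a command name to a `main` callable. The commit does not show `src/cli/attack.py`, so the following is only a minimal sketch of how such an entry point could accept the two flags the updated docs mention (`--attacker-model` and `--custom`). It uses `argparse` purely for illustration; the real CLI may use a different library, and `DEFAULT_ATTACKER_MODEL`, `build_parser`, and the help strings are hypothetical.

```python
from __future__ import annotations

import argparse

# Assumption: the docs only say the default is "Claude Opus 4.1";
# this OpenRouter-style ID is a guess, not taken from the repository.
DEFAULT_ATTACKER_MODEL = "anthropic/claude-opus-4.1"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the updated docs."""
    parser = argparse.ArgumentParser(prog="attack")
    parser.add_argument(
        "--attacker-model",
        default=DEFAULT_ATTACKER_MODEL,
        help="OpenRouter model ID used to generate attack prompts",
    )
    parser.add_argument(
        "--custom",
        action="store_true",
        help="use hand-written prompts instead of AI-generated ones",
    )
    return parser


def main(argv: list[str] | None = None) -> int:
    """Hypothetical entry point; a script entry would call this with no arguments."""
    args = build_parser().parse_args(argv)
    mode = "custom prompts" if args.custom else "AI-generated prompts"
    print(f"attacker={args.attacker_model} mode={mode}")
    return 0
```

Accepting an optional `argv` keeps the function callable from tests without touching `sys.argv`, while still working as a plain script entry point.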

src/interactive_exploit.py

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -276,8 +276,6 @@ def run_attack(
 strategy=strategy,
 steps=steps,
 turns=[],
-target_model=self.target_model,
-attacker_model=self.session.attacker_model,
 timestamp=datetime.now().isoformat(),
 )
 

src/interactive_exploit_v2.py

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -225,8 +225,6 @@ def run_interactive_attack(
 strategy=strategy or AttackStrategy.TRUST_BUILDING,
 steps=steps,
 turns=[],
-target_model=self.target_model,
-attacker_model=self.session.attacker_model,
 timestamp=datetime.now().isoformat(),
 )
 
@@ -385,8 +383,6 @@ def run_custom_attack(self, custom_prompts: list[str]) -> AttackAttempt:
 strategy=AttackStrategy.TRUST_BUILDING,
 steps=len(custom_prompts),
 turns=[],
-target_model=self.target_model,
-attacker_model="user",
 timestamp=datetime.now().isoformat(),
 )
 

src/models.py

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -161,8 +161,6 @@ class AttackAttempt(BaseModel):
 strategy: AttackStrategy
 steps: int # Number of turns planned
 turns: list[AttackTurn]
-target_model: str
-attacker_model: str
 success: bool | None = None
 evaluation: EvaluationResult | None = None
 timestamp: str
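
The effect of this change on the serialized session can be sketched as follows. The repository's models are Pydantic `BaseModel` classes; this sketch uses standard-library dataclasses instead, to stay dependency-free, and trims `AttackAttempt` to a few fields. The `Session` class, its `add_attempt` helper, and the model IDs are hypothetical stand-ins for whatever holds the session-level values in the real code.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from datetime import datetime


@dataclass
class AttackAttempt:
    """Trimmed stand-in for the real model; note: no model fields here."""
    steps: int  # number of turns planned
    turns: list[str] = field(default_factory=list)
    success: bool | None = None
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())


@dataclass
class Session:
    """Hypothetical session container: model IDs are stored once, here."""
    attacker_model: str
    target_model: str
    attempts: list[AttackAttempt] = field(default_factory=list)

    def add_attempt(self, steps: int) -> AttackAttempt:
        attempt = AttackAttempt(steps=steps)
        self.attempts.append(attempt)
        return attempt


# Placeholder model IDs for illustration.
session = Session(attacker_model="openai/gpt-4o", target_model="openai/gpt-oss-20b")
session.add_attempt(steps=3)
session.add_attempt(steps=5)

# The model IDs appear once at the top level, not once per attempt.
data = asdict(session)
```

Under this layout, code that needs an attempt's models reads them from the enclosing session rather than from the attempt itself.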
