LLM-generated prompts #7

stared · 2025-08-17T21:36:47Z

- Implement 3-agent system (exploit generator, target, evaluator)
- Support configurable multi-turn attacks (1=single, 2+=trust-building)
- Add session persistence and learning capabilities
- Track success patterns per attacker model
- Create CLI entry point `uv run interactive`
- Default to Claude Opus 4.1 as attacker, evaluator defaults to same
- Include 8 attack strategies (trust building, side tasks, etc.)
- Export sessions to timestamped JSON with full dialogue history

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
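For readers unfamiliar with the export step in the commit above, here is a minimal sketch of writing a session to a timestamped JSON file with its full dialogue history. It is not the PR's code: the `Turn` and `Session` classes, the role labels, the `session_YYYYMMDD_HHMMSS.json` naming pattern, and the `export_session` function are all assumptions made for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class Turn:
    role: str      # hypothetical labels: "attacker", "target", "evaluator"
    content: str   # full message text, never truncated


@dataclass
class Session:
    # Model names live once at session level (assumed layout).
    attacker_model: str
    target_model: str
    turns: list[Turn] = field(default_factory=list)


def export_session(session: Session, out_dir: Path,
                   now: datetime | None = None) -> Path:
    """Write the session to out_dir/session_<UTC timestamp>.json."""
    now = now or datetime.now(timezone.utc)
    out_dir.mkdir(parents=True, exist_ok=True)
    # The file name pattern is an assumption, not taken from the PR.
    path = out_dir / f"session_{now:%Y%m%d_%H%M%S}.json"
    payload = json.dumps(asdict(session), indent=2, ensure_ascii=False)
    path.write_text(payload, encoding="utf-8")
    return path
```

Passing `now` explicitly keeps the function deterministic in tests; in normal use it can be left out so the current UTC time is used.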

Major improvements:

- Split planning and execution stages for more natural attacks
- Add `uv run attack` for focused attack runs with 3 modes:
  * Predefined strategies
  * Custom goals (describe what to test)
  * Custom prompts (provide exact prompts)
- Add `uv run sessions` for viewing/analyzing past sessions
- Never truncate responses in display (show full content)
- Support multiple attacker models (Claude, GPT-4, Llama, Gemini, etc.)
- Per-model learning system tracks what works for each attacker
- Real-time display of conversation stages
- Document Kaggle competition ($500K prize, deadline Aug 26 2025)

The system now avoids triggering safety refusals by generating more natural conversations and planning attacks strategically before execution.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Remove duplicate `attacker_model`/`target_model` fields from `AttackAttempt` (already stored at session level)
- Remove old `interactive` CLI command (replaced by `attack` + `sessions`)
- Update documentation to reflect new commands:
  * `uv run attack` - for running attacks
  * `uv run sessions` - for viewing/analyzing
- Fix session JSON structure to avoid field repetition
- Clarify that system supports any OpenRouter model, not just Claude

The system is now cleaner with better separation of concerns between attacking and session management.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

New features:

- `uv run attack --batch N` runs N attempts, learning from each
- Three campaign modes:
  * Automatic: Fully adaptive strategy selection
  * Semi-guided: Choose base strategy, system varies approach
  * Goal-focused: Specify target vulnerability to find

Learning system:

- Tracks successful/failed patterns within campaign
- Adapts strategy based on what works
- Varies approach: retry failures differently, exploit successes
- Shows campaign summary with strategy performance

Example usage:

- `uv run attack --batch 5` - Run 5 adaptive attempts
- `uv run attack --batch 10 --steps 2` - 10 attempts with 2 turns each

The system progressively improves its approach, trying different strategies and learning which patterns are most effective against the target model.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
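The commit above does not say how the campaign picks its next strategy, so the following is only one plausible sketch of "adapts strategy based on what works". It swaps in a plain epsilon-greedy rule: try each strategy once, then mostly pick the one with the best success rate and occasionally explore. `StrategyStats`, `CampaignTracker`, the `epsilon` parameter, and the strategy labels are hypothetical names, not identifiers from the PR.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StrategyStats:
    successes: int = 0
    attempts: int = 0

    @property
    def success_rate(self) -> float:
        # Untried strategies report 0.0 rather than dividing by zero.
        return self.successes / self.attempts if self.attempts else 0.0


class CampaignTracker:
    """Per-campaign bookkeeping with epsilon-greedy selection (an assumption)."""

    def __init__(self, strategies, epsilon=0.2, rng=None):
        self.strategies = list(strategies)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.stats = defaultdict(StrategyStats)

    def record(self, strategy: str, success: bool) -> None:
        entry = self.stats[strategy]
        entry.attempts += 1
        entry.successes += int(success)

    def choose(self) -> str:
        # First give every strategy one attempt, in listed order.
        untried = [s for s in self.strategies if self.stats[s].attempts == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise exploit the best rate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)
        return max(self.strategies, key=lambda s: self.stats[s].success_rate)

    def summary(self) -> dict:
        # Maps strategy -> (successes, attempts) for the campaign summary.
        return {s: (self.stats[s].successes, self.stats[s].attempts)
                for s in self.strategies}
```

A batch run would call `choose()` before each attempt and `record()` after the evaluator's verdict; `summary()` then gives the per-strategy performance table.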

- Set up GitHub Actions CI with ty, ruff check, and ruff format
- Add CI badge to README
- Update CLAUDE.md with pre-commit checklist
- Ensure proper check order: type check → lint → format

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Add prompt for number of repetitions (0 for infinite)
- Auto-save session after each attempt for resilience
- Show running summary of attempts and success rate
- Support Ctrl+C interruption with graceful handling
- Ask for confirmation between attempts

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
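The repetition loop described in the commit above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `run_attempts`, the `run_one` callback, and the `{"results": [...]}` file layout are hypothetical, and the confirmation prompt between attempts is left out to keep the example non-interactive.

```python
import itertools
import json
from pathlib import Path
from typing import Callable


def run_attempts(run_one: Callable[[int], bool], repetitions: int,
                 save_path: Path) -> list[bool]:
    """Run attempts, saving after each; repetitions == 0 means run until Ctrl+C."""
    results: list[bool] = []
    numbers = itertools.count(1) if repetitions == 0 else range(1, repetitions + 1)
    try:
        for i in numbers:
            results.append(run_one(i))
            # Auto-save after every attempt so an interruption loses nothing.
            save_path.write_text(json.dumps({"results": results}), encoding="utf-8")
            wins = sum(results)
            print(f"attempt {i}: {wins}/{len(results)} succeeded "
                  f"({wins / len(results):.0%})")
    except KeyboardInterrupt:
        # Ctrl+C lands here; everything completed so far is already on disk.
        print("Interrupted; results so far are already saved.")
    return results
```

Saving inside the loop, rather than once at the end, is what makes the Ctrl+C path safe: the handler only has to report, not persist.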

- Remove duplicate ci.yml workflow, keep lint-type-check.yml
- Fix CI order: type check → lint → format (as requested)
- Add `EXPLOIT` to `VulnerabilityCategory` enum
- Fix type annotations for temperature parameter
- Use `model_validate()` for Pydantic deserialization
- Fix `VulnerabilityCategory` literal types to use enum values
- Add missing type annotations for public functions
- Import and format fixes

All type checks and linting now pass successfully.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

stared and others added 7 commits August 17, 2025 18:56

jakozaur approved these changes Aug 18, 2025


stared merged commit 571c8ab into main Aug 18, 2025
1 of 2 checks passed

3 participants
