Bayesian attack planning and orchestration for LLM, agent, and ML systems.
Think of it as the strategic brain that sits above your attack tools: it decides what to try next and why, while garak, promptfoo, and PyRIT handle the execution.
Quick Start · Features · CLI Reference · Architecture · Techniques · Compliance · Citations
AdversaryPilot is an ATLAS-aligned attack planning engine that brings strategic intelligence to adversarial ML testing. While existing tools excel at running attacks, they don't answer the harder questions:
- "What should I try next?" Thompson Sampling with correlated arms explores the most promising attack surfaces while balancing exploitation and exploration.
- "Which layer is weakest?" Bayesian posterior analysis identifies the most vulnerable system layer with calibrated confidence intervals.
- "Are these results meaningful?" Z-score calibration against benchmark baselines (HarmBench, JailbreakBench) tells you whether a 40% ASR is alarming or expected.
- "Am I meeting compliance?" Automated mapping to OWASP LLM Top 10, NIST AI RMF, and EU AI Act shows exactly which controls you've tested and which gaps remain.
AdversaryPilot doesn't replace your attack tools; it orchestrates them. Import results from garak or promptfoo, get the next recommended technique with a rationale, and generate self-contained HTML reports that a CISO can actually read.
Bayesian Attack Planning: Thompson Sampling with correlated arms and benchmark-calibrated priors. The planner learns from every test result, updating Beta distribution posteriors to recommend increasingly targeted techniques. Two-phase campaigns (probe → exploit) mirror real-world red team methodology.
70 Attack Techniques Across 4 Surfaces: Comprehensive catalog covering LLM jailbreaks (DAN, PAIR, TAP, GCG, Crescendo, Skeleton Key), prompt injection, data extraction, agent exploitation (MCP tool poisoning, A2A impersonation, delegation abuse), and classical AML attacks (FGSM, PGD, poisoning, model extraction). Every technique maps to MITRE ATLAS with full metadata.
Compliance Framework Mapping: Every technique links to OWASP LLM Top 10 (LLM01–LLM10), NIST AI RMF (GOVERN, MAP, MEASURE, MANAGE), and EU AI Act articles. Reports show per-framework coverage gauges and identify untested controls, which is critical for procurement and audit readiness.
Z-Score Calibration: Raw attack success rates are uninterpretable without context. AdversaryPilot calibrates every result against published benchmarks from HarmBench and JailbreakBench, reporting findings as "X sigma from baseline" with statistical significance indicators.
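As a rough illustration of the "X sigma from baseline" idea, here is a minimal sketch that assumes the baseline is summarized by a mean and standard deviation of attack success rates. The function name `z_score_vs_baseline` and all numbers are hypothetical; they are not taken from AdversaryPilot, HarmBench, or JailbreakBench.

```python
def z_score_vs_baseline(observed_asr: float, baseline_mean: float, baseline_std: float) -> float:
    """Distance of an observed attack success rate from a baseline, in standard deviations."""
    if baseline_std <= 0:
        raise ValueError("baseline_std must be positive")
    return (observed_asr - baseline_mean) / baseline_std


# Illustrative numbers only (not real benchmark data).
z = z_score_vs_baseline(observed_asr=0.40, baseline_mean=0.25, baseline_std=0.10)
print(f"{z:+.1f} sigma from baseline")
```

A positive value means the target was easier to attack than the assumed baseline; how large a value counts as significant depends on the threshold you choose.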
Self-Contained HTML Reports: Zero-dependency, dark-themed HTML reports with 10 interactive tabs: Executive Summary, Attack Graph (force-directed Canvas visualization), Layer Analysis with confidence intervals, Risk Heatmap, Technique Details, ATLAS Mapping, Compliance Dashboard, Belief Evolution, Statistics, and Raw Data export.
Attack Path Analysis: Beam search over technique dependency graphs with joint success probabilities. Generates human-readable attack narratives: "Extract system prompt (72%) → Identify filtering gaps (58%) → Chain prompt injection (34%). Joint P(success): 14.2%"
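The joint probability in that narrative is consistent with multiplying the per-step probabilities. The sketch below reproduces the arithmetic under the assumption that steps are independent; `joint_success` is a hypothetical helper name, and the beam search itself is not shown.

```python
from math import prod


def joint_success(step_probs):
    """Probability that every step in a chain succeeds, assuming independent steps."""
    return prod(step_probs)


# Step probabilities taken from the example narrative above.
chain = [
    ("Extract system prompt", 0.72),
    ("Identify filtering gaps", 0.58),
    ("Chain prompt injection", 0.34),
]

joint = joint_success(prob for _, prob in chain)
narrative = " → ".join(f"{name} ({prob:.0%})" for name, prob in chain)
print(f"{narrative}. Joint P(success): {joint:.1%}")
```

Because each added step multiplies in a probability below 1, longer chains are penalized automatically, which is one reason a beam search can prune them early.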
Sensitivity Analysis: Perturbs each scoring weight ±20% and measures Kendall tau rank correlation to show which parameters most influence technique rankings. Ensures recommendations aren't artifacts of arbitrary weight choices.
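A minimal sketch of this kind of weight-perturbation check, using only the standard library. The technique names, dimension names, weights, and the choice to report the worst-case tau per weight are all assumptions made for illustration, not AdversaryPilot's actual configuration or implementation.

```python
from itertools import combinations


def total_score(features, weights):
    """Weighted sum of a technique's dimension scores."""
    return sum(weights[k] * features[k] for k in weights)


def ranking(techniques, weights):
    """Technique names ordered from highest to lowest score."""
    return sorted(techniques, key=lambda name: total_score(techniques[name], weights), reverse=True)


def kendall_tau(order_a, order_b):
    """Kendall tau between two orderings of the same items (no ties)."""
    pos_b = {name: i for i, name in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):  # x is ranked before y in order_a
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def sensitivity(techniques, weights, delta=0.20):
    """For each weight, the lowest tau seen when scaling it by (1 - delta) and (1 + delta)."""
    base = ranking(techniques, weights)
    result = {}
    for key in weights:
        taus = []
        for factor in (1 - delta, 1 + delta):
            perturbed = dict(weights, **{key: weights[key] * factor})
            taus.append(kendall_tau(base, ranking(techniques, perturbed)))
        result[key] = min(taus)
    return result


# Hypothetical techniques and weights (cost carries a negative weight as a penalty).
techniques = {
    "technique-a": {"fit": 0.9, "signal": 0.4, "cost": 0.2},
    "technique-b": {"fit": 0.6, "signal": 0.8, "cost": 0.5},
    "technique-c": {"fit": 0.7, "signal": 0.6, "cost": 0.9},
}
weights = {"fit": 1.0, "signal": 0.8, "cost": -0.5}

for name, tau in sensitivity(techniques, weights).items():
    print(f"{name}: min tau = {tau:+.2f}")
```

A tau of 1.0 means the ranking did not change under that perturbation; values well below 1.0 flag weights that the recommendations depend on heavily.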
Tool Integrations: Import results from garak (27 probe mappings) and promptfoo (11 test mappings). Execution hooks generate the exact shell commands to run recommended techniques in external tools.
Meta-Learning Across Campaigns: Posterior cache stores learned attack probabilities by target profile. New campaigns against similar targets warm-start from nearest-neighbor posteriors using weighted Jaccard distance over target attributes.
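One way to read "weighted Jaccard distance over target attributes" is to treat each attribute/value pair as a set element and weight it by how much that attribute matters. The sketch below follows that reading; the attribute weights, cache layout, and campaign names are hypothetical and are not taken from the project.

```python
def weighted_jaccard_distance(a: dict, b: dict, weights: dict) -> float:
    """1 - (weight of shared attribute/value pairs) / (weight of all pairs seen in either profile)."""
    pairs_a, pairs_b = set(a.items()), set(b.items())

    def weight_of(pairs):
        return sum(weights.get(key, 1.0) for key, _ in pairs)

    union = weight_of(pairs_a | pairs_b)
    if union == 0:
        return 0.0
    return 1.0 - weight_of(pairs_a & pairs_b) / union


def nearest_cached(profile: dict, cache: dict, weights: dict) -> str:
    """Name of the cached profile closest to the new one."""
    return min(cache, key=lambda name: weighted_jaccard_distance(profile, cache[name], weights))


# Hypothetical attribute weights and cached target profiles.
weights = {"target_type": 3.0, "access_level": 2.0, "has_moderation": 1.0}
cache = {
    "campaign-a": {"target_type": "chatbot", "access_level": "black_box", "has_moderation": False},
    "campaign-b": {"target_type": "rag", "access_level": "white_box", "has_moderation": True},
}
new_target = {"target_type": "chatbot", "access_level": "black_box", "has_moderation": True}

best = nearest_cached(new_target, cache, weights)
print(best, round(weighted_jaccard_distance(new_target, cache[best], weights), 3))
```

The posteriors stored for the nearest profile could then seed the new campaign's priors, which is the warm-start step described above.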
Trust & Reproducibility Package: Every report includes an audit trail (operator, tool version, config hashes), reproducibility token (SHA-256 of all inputs), and assessment quality score measuring evidence depth, coverage breadth, and statistical power.
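A reproducibility token of this kind can be sketched as a SHA-256 digest over a canonical serialization of the inputs. The field names and the choice of sorted-key JSON as the canonical form are assumptions for illustration; the project's actual input set and encoding may differ.

```python
import hashlib
import json


def reproducibility_token(inputs: dict) -> str:
    """SHA-256 hex digest of the inputs, serialized as compact JSON with sorted keys."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical inputs: field names are placeholders, not the real schema.
inputs = {
    "target": {"name": "Production Chatbot", "target_type": "chatbot"},
    "tool_version": "0.0.0",
    "config_hash": "placeholder",
}
token = reproducibility_token(inputs)
print(token)
```

Sorting keys before hashing makes the token independent of dictionary ordering, so two runs with the same inputs produce the same token.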
pip install -e ".[dev]"

Requires Python 3.11+. Only 4 dependencies: pydantic, typer, rich, pyyaml.
# target.yaml
schema_version: "1.0"
name: Production Chatbot
target_type: chatbot
access_level: black_box
constraints:
  max_queries: 500
  stealth_priority: moderate
defenses:
  has_moderation: true
  has_input_filtering: true
goals:
  - jailbreak
  - extraction

adversarypilot plan target.yaml

Attack Plan for: Production Chatbot
Target: chatbot (black_box)
Techniques: 12
#1 System Prompt Extraction (AP-TX-LLM-EXTRACT-SYSPROMPT)
Score: 3.14 | Prior ASR: 0.55 ± 0.20
strong fit for chatbot targets; low cost; high signal value
Run: garak --model_type openai --probes probes.leakreplay.LiteraryRecitation
#2 Crescendo Multi-Turn Jailbreak (AP-TX-LLM-CRESCENDO)
Score: 2.98 | Prior ASR: 0.45 ± 0.25
effective against moderation; moderate stealth profile
Run: garak --model_type openai --probes probes.crescendo.Crescendo
...
# Create an adaptive campaign
adversarypilot campaign new target.yaml --name "pentest-q1-2026"
# Import results from your attack tools
adversarypilot import garak garak_report.jsonl
# Get next recommendations (Bayesian-updated)
adversarypilot campaign next <campaign-id>
# Generate the defender report
adversarypilot report <campaign-id>

The report command generates a self-contained HTML file with interactive visualizations; no server is required. Open it in any browser.
| Command | Description |
|---|---|
| `adversarypilot plan <target.yaml>` | Generate ranked attack plan with execution hooks |
| `adversarypilot validate <target.yaml>` | Validate target profile schema |
| `adversarypilot techniques list` | List all techniques (filter: `--domain`, `--surface`, `--goal`, `--tool`) |
| `adversarypilot campaign new <target.yaml>` | Create adaptive campaign with Thompson Sampling |
| `adversarypilot campaign next <id>` | Get Bayesian-updated next recommendations |
| `adversarypilot report <id>` | Generate HTML defender report with compliance analysis |
| `adversarypilot import garak <file>` | Import garak JSONL results |
| `adversarypilot import promptfoo <file>` | Import promptfoo JSON results |
| `adversarypilot chains <target.yaml>` | Generate multi-stage attack chains |
| `adversarypilot replay <id>` | Replay campaign planning decisions |
| `adversarypilot version` | Show version |
from adversarypilot.models.target import TargetProfile
from adversarypilot.taxonomy.registry import TechniqueRegistry
from adversarypilot.prioritizer.engine import PrioritizerEngine
from adversarypilot.campaign.manager import CampaignManager
from adversarypilot.reporting.compliance import ComplianceAnalyzer
# Load target
target = TargetProfile(
    name="My RAG System",
    target_type="rag",
    access_level="black_box",
    goals=["extraction", "poisoning"],
    defenses={"has_moderation": True, "has_retrieval_filtering": False},
)
# Generate prioritized plan
registry = TechniqueRegistry()
registry.load_catalog()
engine = PrioritizerEngine()
plan = engine.plan(target, registry)
for entry in plan.entries[:5]:
    print(f"#{entry.rank} {entry.technique_name} (score={entry.score.total:.2f})")
    print(f" {entry.rationale}")
    for hook in entry.execution_hooks:
        print(f" $ {hook}")
# Run adaptive campaign
manager = CampaignManager()
campaign = manager.create(target, registry)
# After importing results...
next_techniques = manager.recommend_next(campaign)
# Analyze compliance coverage
analyzer = ComplianceAnalyzer()
summaries = analyzer.analyze(
    techniques_tried=campaign.state.techniques_tried,
    evaluations=campaign.state.evaluations,
)
for s in summaries:
    print(f"{s.framework}: {s.coverage_pct:.0%} controls tested")

adversarypilot/
├── models/ Pydantic domain models (Target, Technique, Campaign, Report)
├── taxonomy/ 70-technique ATLAS-aligned catalog with compliance refs
├── prioritizer/ 7-dimension weighted scoring + sensitivity analysis
├── planner/ Thompson Sampling, attack paths, meta-learning, priors
├── campaign/ Campaign lifecycle: create → recommend → update → report
├── reporting/ HTML reports, compliance analysis, Z-score calibration
├── importers/ garak + promptfoo result importers
├── hooks/ Execution command generation for external tools
├── replay/ Decision replay and verification
├── utils/ Hashing, logging, timestamps
└── cli/ Typer CLI interface
 Target Profile
       │
       ▼
┌──────────────┐     ┌───────────────┐     ┌──────────────────┐
│ Hard Filters │ ──▶ │ 7-Dim Scorer  │ ──▶ │ Thompson Sample  │
│ (access,     │     │ (compat, fit, │     │ (Beta posterior, │
│  domain,     │     │  defense,     │     │  correlated      │
│  target)     │     │  signal,      │     │  arms, priors    │
│              │     │  cost, risk)  │     │  from baselines) │
└──────────────┘     └───────────────┘     └──────────────────┘
                                                    │
                                                    ▼
                                           ┌──────────────────┐
                                           │ Ranked Plan with │
                                           │ rationale, hooks,│
                                           │ confidence, Z-   │
                                           │ scores, paths    │
                                           └──────────────────┘
Scoring Dimensions: compatibility (target type match), access fit, goal alignment, defense bypass likelihood, signal gain (information value), cost penalty, detection risk penalty. Each is configurable in config.yaml.
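To make the seven dimensions concrete, here is a minimal sketch of a weighted score in which the two penalty dimensions subtract from the total. The weight values, key names, and the linear form are assumptions for illustration; the real weights live in config.yaml and the engine's formula may differ.

```python
# Hypothetical weights; the real values are configured in config.yaml.
WEIGHTS = {
    "compatibility": 1.0,
    "access_fit": 1.0,
    "goal_alignment": 1.0,
    "defense_bypass": 1.0,
    "signal_gain": 1.0,
    "cost": 0.5,
    "detection_risk": 0.5,
}
PENALTIES = {"cost", "detection_risk"}  # dimensions that lower the score


def score(dims: dict, weights: dict = WEIGHTS) -> float:
    """Linear score: benefit dimensions add, penalty dimensions subtract."""
    total = 0.0
    for name, weight in weights.items():
        signed = -weight if name in PENALTIES else weight
        total += signed * dims[name]
    return total


# A made-up technique with each dimension scored in [0, 1].
example = {
    "compatibility": 1.0,
    "access_fit": 0.8,
    "goal_alignment": 0.9,
    "defense_bypass": 0.5,
    "signal_gain": 0.7,
    "cost": 0.2,
    "detection_risk": 0.4,
}
print(f"score = {score(example):.2f}")
```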
Thompson Sampling: Each technique maintains a Beta(α, β) posterior. Priors are calibrated from HarmBench/JailbreakBench published ASR data across 25+ attack families. Correlated arms ensure that success on one jailbreak technique boosts related techniques.
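The loop described here can be sketched with the standard library's `random.betavariate`. In this sketch the prior means 0.55 and 0.45 come from the sample plan output earlier in this README; the DAN prior, the prior strength, the `related` map, and the 0.25 spill-over for correlated arms are hypothetical choices, not the project's actual parameters.

```python
import random


class BetaArm:
    """Beta(alpha, beta) belief about one technique's success probability."""

    def __init__(self, alpha: float, beta: float):
        self.alpha, self.beta = alpha, beta

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(self.alpha, self.beta)

    def update(self, success: bool, weight: float = 1.0) -> None:
        if success:
            self.alpha += weight
        else:
            self.beta += weight

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def prior_from_asr(asr: float, strength: float = 4.0) -> BetaArm:
    """Prior whose mean equals a benchmark ASR; strength is the pseudo-observation count."""
    return BetaArm(asr * strength, (1.0 - asr) * strength)


def choose(arms: dict, rng: random.Random) -> str:
    """Thompson step: draw once from every posterior and pick the largest draw."""
    return max(arms, key=lambda name: arms[name].sample(rng))


def record(arms: dict, related: dict, name: str, success: bool, spill: float = 0.25) -> None:
    """Update the tried arm; on success, give related arms a fractional boost (assumed rule)."""
    arms[name].update(success)
    if success:
        for other in related.get(name, ()):
            arms[other].update(True, weight=spill)


rng = random.Random(0)
arms = {
    "sysprompt-extract": prior_from_asr(0.55),
    "crescendo": prior_from_asr(0.45),
    "dan": prior_from_asr(0.30),  # hypothetical prior
}
related = {"dan": ["crescendo"], "crescendo": ["dan"]}  # hypothetical jailbreak family

pick = choose(arms, rng)
record(arms, related, "dan", success=True)
print(pick, {name: round(arm.mean, 3) for name, arm in arms.items()})
```

Sampling from the posterior rather than ranking by its mean is what gives uncertain techniques a chance to be tried, while well-evidenced ones are exploited more often.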
70 techniques across 3 domains, mapped to MITRE ATLAS:
| Domain | Count | Key Techniques |
|---|---|---|
| LLM | 33 | DAN Jailbreak, PAIR Iterative, TAP Tree-of-Attacks, GCG Adversarial Suffix, AutoDAN Genetic, Crescendo Multi-Turn, Skeleton Key, Prompt Injection (direct/indirect), System Prompt Extraction, Encoding Bypass, Cross-Lingual Bypass, Context Window Overflow, Fine-Tuning Safety Removal, Poisoned RAG Retrieval |
| Agent | 25 | Goal Hijacking, Tool Misuse Induction, MCP Tool Poisoning, MCP Schema Injection, MCP Server Squatting, A2A Agent Impersonation, A2A Task Poisoning, Delegation Abuse, Memory Poisoning, Observation Manipulation, Chain Escape, Data Exfiltration, Privilege Escalation |
| AML | 12 | FGSM/PGD Evasion, Transfer Attacks, Backdoor Poisoning, Clean-Label Poisoning, Model Extraction, Embedding Inversion, Supply Chain Attacks |
Each technique includes: ATLAS cross-references, compliance mappings, execution cost, stealth profile, access requirements, goal tags, tool support flags, and benchmark ASR priors.
Includes coverage for emerging attack vectors:
- A2A Protocol: agent-to-agent impersonation, task poisoning, agent card manipulation, context leakage
- Extended MCP: tool poisoning (rug pull), schema injection, server squatting
- ATLAS October 2025: delegation abuse, memory poisoning, observation manipulation
AdversaryPilot is tool-agnostic: it plans attacks and imports results but doesn't own execution, so you keep using the tools you already know.
# AdversaryPilot recommends a technique and gives you the exact command:
# Run: garak --model_type openai --probes probes.dan.Dan_6_0
# You run it, then import results:
adversarypilot import garak garak_report.jsonl

27 probe-to-technique mappings covering DAN, encoding, hallucination, prompt injection, and more.

adversarypilot import promptfoo promptfoo_output.json

11 test-type mappings covering jailbreak, hijacking, PII, hallucination, contracts, overreliance, and more.
Every plan entry includes ready-to-run shell commands for supported tools. Copy, paste, and execute, then import the results back for Bayesian updates.
| Framework | Controls | Description |
|---|---|---|
| OWASP LLM Top 10 | LLM01–LLM10 | Injection, data leakage, supply chain, output handling, and more |
| NIST AI RMF | GOVERN, MAP, MEASURE, MANAGE | Risk identification, measurement, and management subcategories |
| EU AI Act | Art. 9, 10, 13, 15, 17, 52, 65, 71 | Risk management, data governance, transparency, accuracy, monitoring |
Reports show per-framework coverage gauges, per-control test status (pass/fail/untested), and prioritized recommendations for untested controls.
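The per-control status and coverage figure can be sketched as follows. The technique-to-control mapping is a placeholder, and the rule that a control is marked "fail" when any mapped attack succeeded is an assumption about how pass/fail is derived, not the documented behavior of `ComplianceAnalyzer`.

```python
def control_status(framework_controls, technique_controls, results):
    """Map each control to 'untested', 'pass', or 'fail' from per-technique attack outcomes."""
    status = {control: "untested" for control in framework_controls}
    for technique, attack_succeeded in results.items():
        for control in technique_controls.get(technique, ()):
            if control not in status:
                continue
            if attack_succeeded:
                status[control] = "fail"  # assumed rule: any successful attack fails the control
            elif status[control] == "untested":
                status[control] = "pass"
    return status


def coverage(status) -> float:
    """Fraction of controls with at least one mapped technique tried."""
    tested = sum(1 for value in status.values() if value != "untested")
    return tested / len(status)


owasp = [f"LLM{i:02d}" for i in range(1, 11)]  # LLM01 .. LLM10

# Placeholder mapping and outcomes, for illustration only.
technique_controls = {
    "technique-a": {"LLM01"},
    "technique-b": {"LLM01", "LLM02"},
    "technique-c": {"LLM06"},
}
results = {"technique-a": True, "technique-b": False, "technique-c": False}

status = control_status(owasp, technique_controls, results)
print(f"OWASP LLM Top 10: {coverage(status):.0%} controls tested")
print("untested:", [c for c, s in status.items() if s == "untested"])
```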
Self-contained, zero-dependency HTML reports with 10 interactive tabs. Open in any browser; no server is required.
- Executive Summary: Key findings, risk level, trust package, assessment quality score
- Attack Graph: Force-directed Canvas visualization of technique relationships
- Layer Analysis: Per-surface risk scores with Wilson confidence intervals and Z-score badges
- Risk Heatmap: Surface x Goal success rate matrix
- Technique Details: Full table with scores, results, execution hooks
- ATLAS Mapping: MITRE ATLAS technique cross-references
- Compliance Dashboard: Framework coverage gauges, control status, gap analysis
- Belief Evolution: Posterior parameter trajectories over campaign steps
- Statistics: Sensitivity analysis, evidence depth, statistical power
- Raw Data: Complete JSON export for programmatic analysis
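For readers unfamiliar with the Wilson confidence intervals mentioned for the per-surface risk scores, here is a minimal standard-library sketch of the Wilson score interval for a success rate. The function name, the 95% default (z = 1.96), and the 8-of-20 example are illustrative choices, not values from the tool.

```python
from math import sqrt


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials == 0:
        return (0.0, 1.0)  # no evidence yet: the whole range is plausible
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1.0 - p) / trials + z * z / (4 * trials * trials)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


# Illustrative: 8 successful attacks out of 20 attempts against one surface.
low, high = wilson_interval(8, 20)
print(f"ASR 40% with interval [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal approximation, this interval stays inside [0, 1] and behaves sensibly for the small sample sizes typical of a query-limited campaign.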

| | AdversaryPilot | garak | PyRIT | promptfoo | HarmBench |
|---|---|---|---|---|---|
| Focus | Attack strategy | Attack execution | Attack execution | Eval & red team | Benchmark |
| Planning | Bayesian Thompson Sampling | None | Orchestrator scoring | None | None |
| Adaptive | Yes (posterior updates) | No | Limited | No | No |
| Techniques | 70 (LLM + Agent + AML) | 100+ probes | 20+ strategies | 15+ tests | 18 methods |
| Compliance | OWASP + NIST + EU AI Act | None | None | OWASP partial | None |
| Z-Score Calibration | Yes (vs benchmarks) | Z-score (reference models) | No | No | Provides baselines |
| Attack Paths | Yes (joint probabilities) | No | Multi-turn chains | No | No |
| Sensitivity Analysis | Yes (weight perturbation) | No | No | No | No |
| Meta-Learning | Yes (cross-campaign) | No | No | No | No |
| Reports | Self-contained HTML (10 tabs) | JSONL logs | Notebooks | Web UI + CLI | Leaderboard |
| Import From | garak, promptfoo | - | - | - | - |
AdversaryPilot doesn't compete with these tools; it orchestrates them. Use garak or promptfoo to execute attacks, import results into AdversaryPilot for strategic analysis, and get the next recommendation.
AdversaryPilot builds on published research in adversarial ML and LLM security:
- MITRE ATLAS: Adversarial Threat Landscape for AI Systems. MITRE Corporation. atlas.mitre.org
- HarmBench: Mazeika et al. "HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal." arXiv:2402.04249, 2024.
- JailbreakBench: Chao et al. "JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models." arXiv:2404.01318, 2024.
- PAIR: Chao et al. "Jailbreaking Black-Box Large Language Models in Twenty Queries." arXiv:2310.08419, 2023.
- TAP: Mehrotra et al. "Tree of Attacks: Jailbreaking Black-Box LLMs with Auto-Generated Subtrees." arXiv:2312.02119, 2023.
- GCG: Zou et al. "Universal and Transferable Adversarial Attacks on Aligned Language Models." arXiv:2307.15043, 2023.
- AutoDAN: Liu et al. "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models." arXiv:2310.04451, 2023.
- Crescendo: Russinovich et al. "Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack." arXiv:2404.01833, 2024.
- Skeleton Key: Microsoft Threat Intelligence. "Skeleton Key Jailbreak Attack." Microsoft Security Blog, 2024.
- Thompson Sampling: Russo et al. "A Tutorial on Thompson Sampling." Foundations and Trends in Machine Learning, 2018.
- Wilson Score Interval: Wilson, E.B. "Probable Inference, the Law of Succession, and Statistical Inference." JASA, 1927.
- OWASP LLM Top 10: OWASP Foundation. owasp.org/www-project-top-10-for-large-language-model-applications
- NIST AI RMF: NIST AI 100-1. "Artificial Intelligence Risk Management Framework." nist.gov/itl/ai-risk-management-framework
- EU AI Act: Regulation (EU) 2024/1689. eur-lex.europa.eu
- garak: NVIDIA. Generative AI Red-teaming & Assessment Kit. github.com/NVIDIA/garak
- PyRIT: Microsoft. Python Risk Identification Toolkit for GenAI. github.com/Azure/PyRIT
- promptfoo: promptfoo. LLM evaluation and red teaming. github.com/promptfoo/promptfoo
- HarmBench (code): Center for AI Safety. github.com/centerforaisafety/HarmBench
- AgentDojo: ETH Zurich. Agent security benchmark. github.com/ethz-spylab/agentdojo
- Adversarial Robustness Toolbox: IBM. github.com/Trusted-AI/adversarial-robustness-toolbox
See CONTRIBUTING.md for development setup, testing, and contribution guidelines.
# Quick dev setup
git clone https://github.com/aviralsrivastava/AdversaryPilot.git
cd AdversaryPilot
pip install -e ".[dev]"
pytest tests/ -v

AdversaryPilot is designed exclusively for authorized security testing, research, and defensive evaluation. By using this tool, you agree to:
- Only test systems you own or have explicit written authorization to test
- Comply with all applicable laws and regulations
- Report discovered vulnerabilities through responsible disclosure
- Not use this tool to cause harm, disrupt services, or compromise systems without authorization
This tool generates attack plans and analysis; it does not execute attacks autonomously. You are responsible for how you use the recommended techniques.
Apache License 2.0. Copyright 2025 Aviral Srivastava