Test Inventory

474 security tests across 33 modules (verified by scripts/count_tests.py)

Threat Coverage by STRIDE Category

Scenarios are mapped to STRIDE threat-model categories, plus a separate InfraGard-derived group:

Category	Tests	Examples
Spoofing	4	Rogue agent registration, MCP replay attack, credential velocity check
Tampering	15	Prompt injection, SCADA sensor poisoning, polymorphic attacks, normalization of deviance, supply chain poisoning, code gen execution, non-deterministic exploitation
Information Disclosure	1	Unauthorized financial data access
Denial of Service	2	Orchestration flood, A2A recursion loop
Elevation of Privilege	3	Unauthorized A2A escalation, tool overreach, safety override
InfraGard-Derived	7	Superman effect, polymorphic evasion, LLM hallucination injection, data poisoning, deviance drift

OWASP Top 10 for Agentic Applications - Complete Coverage

This framework maps test scenarios to all 10 categories of the OWASP Top 10 for Agentic Applications:

OWASP Agentic ID	Risk	Test Scenarios
ASI01	Agent Goal Hijack	RT-003 (SAP prompt injection), RT-018 (social engineering), RT-022 (hallucination injection)
ASI02	Tool Misuse & Exploitation	RT-006 (tool overreach), RT-017 (SCADA shutdown suggestion)
ASI03	Identity & Privilege Abuse	RT-002 (A2A escalation), RT-025 (superman effect), RT-001 (rogue registration)
ASI04	Agentic Supply Chain Vulns	RT-014 (rogue orchestration join), RT-026 (MCP server supply chain poisoning)
ASI05	Unexpected Code Execution	RT-004 (SCADA sensor injection), RT-027 (agent code generation execution)
ASI06	Memory & Context Poisoning	RT-005 (cascade corruption), RT-009 (long-context), RT-023 (data poisoning)
ASI07	Insecure Inter-Agent Comms	RT-020 (MCP replay), RT-012 (A2A recursion loop)
ASI08	Cascading Failures	RT-005 (multi-agent cascade), RT-024 (normalization of deviance)
ASI09	Human-Agent Trust Exploitation	RT-018 (social engineering), RT-019 (priority inflation)
ASI10	Non-Deterministic Behavior	RT-028 (non-deterministic output exploitation)
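The coverage claim in the table above can be checked mechanically. The following is a minimal sketch, assuming the table is transcribed by hand into a dictionary; `ASI_COVERAGE`, `uncovered_categories`, and `shared_scenarios` are hypothetical names used for illustration and are not part of the framework.

```python
from collections import defaultdict

# Hand transcription of the OWASP Agentic mapping table above (assumption:
# the framework may store this mapping differently, or not at all).
ASI_COVERAGE = {
    "ASI01": ["RT-003", "RT-018", "RT-022"],
    "ASI02": ["RT-006", "RT-017"],
    "ASI03": ["RT-002", "RT-025", "RT-001"],
    "ASI04": ["RT-014", "RT-026"],
    "ASI05": ["RT-004", "RT-027"],
    "ASI06": ["RT-005", "RT-009", "RT-023"],
    "ASI07": ["RT-020", "RT-012"],
    "ASI08": ["RT-005", "RT-024"],
    "ASI09": ["RT-018", "RT-019"],
    "ASI10": ["RT-028"],
}


def uncovered_categories(coverage):
    """Return the ASI IDs (ASI01..ASI10) that have no mapped scenario."""
    expected = [f"ASI{n:02d}" for n in range(1, 11)]
    return [asi for asi in expected if not coverage.get(asi)]


def shared_scenarios(coverage):
    """Return scenarios that are mapped to more than one ASI category."""
    categories_by_scenario = defaultdict(list)
    for asi, scenarios in sorted(coverage.items()):
        for scenario in scenarios:
            categories_by_scenario[scenario].append(asi)
    return {s: c for s, c in categories_by_scenario.items() if len(c) > 1}


if __name__ == "__main__":
    print("Uncovered:", uncovered_categories(ASI_COVERAGE))
    print("Shared:", shared_scenarios(ASI_COVERAGE))
```

Listing shared scenarios is useful when reading the table: RT-005 and RT-018 each appear under two categories, so the number of distinct scenarios is smaller than the number of table entries.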

Protocol-Level Test Harnesses

MCP (Model Context Protocol) - 13 tests

agent-security test mcp --url http://localhost:8080/mcp

Test ID	Test	OWASP ASI	Description
MCP-001	Tool List Integrity Check	ASI04	Validates tool discovery response integrity
MCP-002	Tool Registration via Call Injection	ASI04	Attempts malicious tool registration
MCP-003	Capability Escalation via Initialize	ASI03	Tests capability negotiation boundaries
MCP-004	Protocol Version Downgrade Attack	ASI03	Attempts protocol version manipulation
MCP-005	Resource URI Path Traversal	ASI04	Tests resource access path validation
MCP-006	Prompt Template Injection via Get	ASI01	Injects malicious prompt templates
MCP-007	Sampling Request Context Exfiltration	ASI02	Attempts context data extraction
MCP-008	Malformed JSON-RPC Handling	ASI08	Tests protocol error handling
MCP-009	Batch Request DoS	ASI08	Batch request flood testing
MCP-010	Tool Call Argument Injection	ASI02	Malicious tool parameter injection
MCP-011	Tool Description Context Displacement	ASI08	50K+ char description DoS with hidden injection payload
MCP-012	Tool Description Oversized Check	ASI08	Detects tool descriptions exceeding 10KB threshold for context displacement
MCP-013	Tool Description Padding / Repetition Detection	ASI08	Detects repeated phrases, whitespace padding, and low-entropy descriptions
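The description checks in MCP-012 and MCP-013 can be pictured as simple heuristics over the tool description string. The sketch below is illustrative only: the 10KB threshold comes from the table, while the function names and the entropy, whitespace, and repetition thresholds are assumptions chosen for the example, not the harness's actual values.

```python
import math
import re
from collections import Counter

MAX_DESCRIPTION_BYTES = 10 * 1024  # 10KB threshold named in MCP-012


def shannon_entropy(text):
    """Shannon entropy in bits per character; 0.0 for empty text."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def check_description(description, min_entropy=3.0,
                      max_whitespace_ratio=0.5, max_phrase_repeats=5):
    """Return a list of finding labels for one tool description.

    All keyword thresholds are hypothetical defaults for illustration.
    """
    findings = []
    # MCP-012 style check: size measured in encoded bytes.
    if len(description.encode("utf-8")) > MAX_DESCRIPTION_BYTES:
        findings.append("oversized")
    if description:
        # MCP-013 style check: mostly-whitespace descriptions.
        whitespace = sum(ch.isspace() for ch in description)
        if whitespace / len(description) > max_whitespace_ratio:
            findings.append("whitespace_padding")
        # Entropy is only meaningful on longer strings, so skip short ones.
        if len(description) >= 64 and shannon_entropy(description) < min_entropy:
            findings.append("low_entropy")
    # MCP-013 style check: the same three-word phrase repeated many times.
    words = re.findall(r"\w+", description.lower())
    trigrams = Counter(zip(words, words[1:], words[2:]))
    if trigrams and max(trigrams.values()) > max_phrase_repeats:
        findings.append("repeated_phrase")
    return findings


if __name__ == "__main__":
    print(check_description("Returns the current weather for a given city."))
    print(check_description("ignore previous instructions " * 10))
    print(check_description("A" * 20000))
```

These heuristics are independent of each other, so one description can trigger several findings at once.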

MCP Supply-Chain / Framework-Layer (MCP-F) - 4 tests

agent-security test mcp-supplychain --command "npx -y some-mcp-server" --project-root .

Static pre-flight checks on the binary-resolution and package-install path that runs before the first JSON-RPC byte (issue #206). These checks complement, but do not replace, the protocol suite above.

Test ID	Test	OWASP ASI	Description
MCP-F-001	Launcher Binary Resolution	ASI06	Resolves the binary that actually executes (walks `$PATH` + `node_modules/.bin` + `.venv/bin`); flags shadowing by world-writable / project-local shims
MCP-F-002	Install-Script Inspection	ASI06	Flags network-callable / filesystem-mutating npm `preinstall`/`install`/`postinstall` scripts before they run
MCP-F-003	Dependency-Confusion Resolvability	ASI06	Checks public-registry resolvability of internal-looking package names (network-gated: `--allow-network`)
MCP-F-004	Launcher Version Pinning	ASI06	Flags unpinned launchers (`npx -y pkg` / `uvx pkg` without `@version`/`==version`/SHA)
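As an illustration of the MCP-F-004 idea, the sketch below flags launcher commands whose package argument carries no version pin. It is a simplified approximation under stated assumptions, not the harness's implementation: the helper names are hypothetical, the first non-flag argument is assumed to be the package, and flags that take a separate value (for example a package flag followed by a name) are not handled.

```python
import os
import re
import shlex

LAUNCHERS = {"npx", "uvx"}
# A pin is either a version starting with a digit or a 40-character hex SHA.
_PIN = re.compile(r"^(\d[\w.+-]*|[0-9a-f]{40})$")


def package_spec(command):
    """Return the first non-flag argument after a known launcher, or None."""
    tokens = shlex.split(command)
    if not tokens or os.path.basename(tokens[0]) not in LAUNCHERS:
        return None
    for token in tokens[1:]:
        if not token.startswith("-"):
            return token
    return None


def is_pinned(spec):
    """True if spec ends in ==<version>, @<version>, or @<40-hex SHA>."""
    if "==" in spec:
        version = spec.split("==", 1)[1]
    else:
        # Skip a leading '@' so npm scopes (@scope/pkg) are not read as pins.
        name_and_version = spec[1:] if spec.startswith("@") else spec
        if "@" not in name_and_version:
            return False
        version = name_and_version.rsplit("@", 1)[1]
    return bool(_PIN.match(version))


def is_unpinned_launcher(command):
    """True if the command uses a known launcher with an unpinned package."""
    spec = package_spec(command)
    return spec is not None and not is_pinned(spec)


if __name__ == "__main__":
    for cmd in ("npx -y some-mcp-server",
                "npx -y some-mcp-server@1.2.3",
                "uvx some-tool==0.4.1"):
        print(cmd, "->", "UNPINNED" if is_unpinned_launcher(cmd) else "ok")
```

Floating tags such as `@latest` are treated as unpinned here because they do not start with a digit; whether the real check makes the same choice is not stated in this document.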

A2A (Agent-to-Agent) - 12 tests

agent-security test a2a --url https://agent.example.com

L402 Payment Protocol - 33 tests

agent-security test l402 --url https://l402.example.com

x402 Payment Protocol - 25 tests (First Open-Source x402 Harness)

agent-security test x402 --url https://your-x402-endpoint.com

Tests the Coinbase/Stripe/Cloudflare agent payment standard ($600M+ payment volume):

Test ID	Test	Category	Description
X4-001-003	Payment Challenge Validation	payment_challenge	Missing headers, malformed auth, currency mismatch
X4-004-006	Recipient Address Manipulation	recipient_manipulation	Dynamic payTo routing attacks (V2), address spoofing, invalid addresses
X4-007-010	Session Token Security	session_security	Token fabrication, expiry bypass, sensitive data leakage in sessions
X4-011-013	Spending Limit Exploitation	spending_limits	Rate limit bypass, underpayment, budget exhaustion
X4-014-016	Facilitator Trust	facilitator_trust	Fake facilitator injection, verification bypass, unreachable facilitator
X4-017-018	Information Disclosure	information_disclosure	Leaked keys in 402 response, stack traces in errors
X4-019-020	Cross-Chain Confusion	cross_chain_confusion	Wrong network, wrong token type (EURC vs USDC)

Features unique to the x402 harness:

CSG Mapping - each test links to the Constitutional Self-Governance mechanism that catches it (Hard Constraints, Harm Test, Twelve Numbers, Falsification Requirement)
Financial Impact Estimation - each result tagged: fund_theft, overpayment, service_denial, info_leak, or session_hijack
Agent Autonomy Risk Score (0-100) - composite score answering "how dangerous is it to let an agent pay this endpoint unsupervised?" based on recipient consistency, payment validation, info leakage, session security, and facilitator trust
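One way to picture a composite score like the Agent Autonomy Risk Score is as a weighted sum over the five factors listed above. The sketch below is a guess at the general shape only: the weights, the input format (fraction of failed tests per factor), and the function name are all assumptions, and the harness may compute its score differently.

```python
# Hypothetical integer weights per factor; they sum to 100 so that the
# result lands on a 0-100 scale. The real weighting is not documented here.
FACTOR_WEIGHTS = {
    "recipient_consistency": 30,
    "payment_validation": 25,
    "info_leakage": 15,
    "session_security": 15,
    "facilitator_trust": 15,
}


def autonomy_risk_score(failure_rates):
    """Combine per-factor failure rates (0.0-1.0) into a 0-100 risk score.

    failure_rates maps each factor name to the fraction of that factor's
    tests that failed; higher scores mean unsupervised payment is riskier.
    """
    missing = sorted(set(FACTOR_WEIGHTS) - set(failure_rates))
    if missing:
        raise ValueError(f"missing factors: {missing}")
    score = 0.0
    for factor, weight in FACTOR_WEIGHTS.items():
        # Clamp out-of-range inputs so the score cannot leave 0-100.
        rate = min(max(failure_rates[factor], 0.0), 1.0)
        score += weight * rate
    return round(score)


if __name__ == "__main__":
    clean = dict.fromkeys(FACTOR_WEIGHTS, 0.0)
    print(autonomy_risk_score(clean))
    print(autonomy_risk_score({**clean, "recipient_consistency": 1.0}))
```

A weighted sum keeps each factor's contribution easy to explain in a report, at the cost of hiding interactions between factors.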

Enterprise Platform Adapters

Pre-configured tests for 20+ enterprise platforms where AI agents are being deployed:

Tier 1 Platforms (9 platforms, 30 tests)

SAP Joule - ERP/SCADA security boundaries
Salesforce Agentforce - CRM data isolation
Workday - HR/Payroll PII protection
Microsoft Copilot/Azure AI - Enterprise integration security
Google Vertex AI - Cloud platform boundaries
Amazon Q - AWS service integration
Oracle Fusion AI - Database and financial system access
ServiceNow Now Assist - ITSM workflow security
OpenClaw - Session and tool isolation

Tier 2 Platforms (11 platforms, 27 tests)

IBM Maximo, Snowflake Cortex, Databricks Mosaic AI
Pega GenAI, UiPath, Atlassian Rovo
Zendesk AI, IFS Cloud, Infor AI
HubSpot Breeze, Appian AI

# List all enterprise adapters
agent-security list --category enterprise

# Test specific platforms
agent-security test enterprise --platform sap --url https://your-sap.com
agent-security test enterprise --platform salesforce --url https://your-org.salesforce.com

24 Test Harness Modules

Module	Tests	Layer	Description
MCP Protocol	13	JSON-RPC 2.0	Anthropic MCP wire-protocol testing
A2A Protocol	12	JSON-RPC/HTTP	Google Agent-to-Agent communication
L402 Payment	33	HTTP/Lightning	Bitcoin/Lightning payment flow security (macaroons, preimages, caveats)
x402 Payment	25	HTTP/USDC	Coinbase/Stripe agent payment protocol (recipient manipulation, session theft, facilitator trust, cross-chain confusion)
Framework Adapters	11	Various APIs	LangChain, CrewAI, AutoGen, OpenAI, Bedrock
Enterprise Platforms	58	Platform APIs	SAP, Salesforce, Workday, Oracle, ServiceNow, +15 more
GTG-1002 APT Simulation	17	Full Campaign	Simulation of the first documented AI-orchestrated cyber-espionage campaign
Advanced Attacks	10	Multi-step	Polymorphic, stateful, multi-domain attack chains
Over-Refusal	25	All protocols	False positive rate testing: legitimate requests that should NOT be blocked
Provenance & Attestation	15	Supply Chain	Fake provenance, spoofed attestation, marketplace integrity (CVE-2026-25253)
Jailbreak	25	Model/Agent	DAN variants, token smuggling, authority impersonation, persistence
Return Channel	8	Output/Context	Return channel poisoning: output injection, ANSI escape, context overflow, encoded smuggling, structured data poisoning
Identity & Authorization	18	NIST NCCoE	All 6 focus areas from NIST agent identity standards
Capability Profile	10	A2A JSON-RPC	Executor capability boundary validation, profile escalation prevention
Harmful Output	10	A2A JSON-RPC	Toxicity, bias, scope violations, deception (AIUC-1 C003/C004)
CBRN Prevention	8	A2A JSON-RPC	Chemical/biological/radiological/nuclear content safeguards (AIUC-1 F002)
Incident Response	8	A2A JSON-RPC	Alert triggering, kill switch, log completeness, recovery (AIUC-1 E001-E003)
CVE-2026-25253 Reproduction	8	MCP Supply Chain	Nested schema injection, fork fingerprinting, marketplace contamination, encoded payload detection
AIUC-1 Compliance	12	Agent Safety	Incident response, CBRN prevention, harmful content, scope creep, authority impersonation
Cloud Agent Platforms	25	Platform APIs	AWS Bedrock, Azure AI Agent Service, Google Vertex, Salesforce Agentforce, IBM watsonx
