Test Inventory

474 security tests across 33 modules (verified by scripts/count_tests.py)


Threat Coverage by STRIDE Category

Scenarios are mapped to the STRIDE threat model, with InfraGard-derived scenarios listed as an additional row:

| Category | Tests | Examples |
|---|---|---|
| Spoofing | 4 | Rogue agent registration, MCP replay attack, credential velocity check |
| Tampering | 15 | Prompt injection, SCADA sensor poisoning, polymorphic attacks, normalization of deviance, supply chain poisoning, code generation execution, non-deterministic exploitation |
| Information Disclosure | 1 | Unauthorized financial data access |
| Denial of Service | 2 | Orchestration flood, A2A recursion loop |
| Elevation of Privilege | 3 | Unauthorized A2A escalation, tool overreach, safety override |
| InfraGard-Derived | 7 | Superman effect, polymorphic evasion, LLM hallucination injection, data poisoning, deviance drift |

OWASP Top 10 for Agentic Applications - Complete Coverage

This framework maps at least one test scenario to each of the 10 categories in the OWASP Top 10 for Agentic Applications:

| OWASP Agentic ID | Risk | Test Scenarios |
|---|---|---|
| ASI01 | Agent Goal Hijack | RT-003 (SAP prompt injection), RT-018 (social engineering), RT-022 (hallucination injection) |
| ASI02 | Tool Misuse & Exploitation | RT-006 (tool overreach), RT-017 (SCADA shutdown suggestion) |
| ASI03 | Identity & Privilege Abuse | RT-002 (A2A escalation), RT-025 (superman effect), RT-001 (rogue registration) |
| ASI04 | Agentic Supply Chain Vulnerabilities | RT-014 (rogue orchestration join), RT-026 (MCP server supply chain poisoning) |
| ASI05 | Unexpected Code Execution | RT-004 (SCADA sensor injection), RT-027 (agent code generation execution) |
| ASI06 | Memory & Context Poisoning | RT-005 (cascade corruption), RT-009 (long-context), RT-023 (data poisoning) |
| ASI07 | Insecure Inter-Agent Communication | RT-020 (MCP replay), RT-012 (A2A recursion loop) |
| ASI08 | Cascading Failures | RT-005 (multi-agent cascade), RT-024 (normalization of deviance) |
| ASI09 | Human-Agent Trust Exploitation | RT-018 (social engineering), RT-019 (priority inflation) |
| ASI10 | Non-Deterministic Behavior | RT-028 (non-deterministic output exploitation) |
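Because the mapping is many-to-many, the coverage claim is easy to check mechanically. The following is a minimal Python sketch with the table transcribed into a dictionary. The helper names are hypothetical and are not part of the agent-security API.

```python
from collections import defaultdict

# ASI category -> RT scenario IDs, transcribed from the table above.
OWASP_MAPPING: dict[str, list[str]] = {
    "ASI01": ["RT-003", "RT-018", "RT-022"],
    "ASI02": ["RT-006", "RT-017"],
    "ASI03": ["RT-002", "RT-025", "RT-001"],
    "ASI04": ["RT-014", "RT-026"],
    "ASI05": ["RT-004", "RT-027"],
    "ASI06": ["RT-005", "RT-009", "RT-023"],
    "ASI07": ["RT-020", "RT-012"],
    "ASI08": ["RT-005", "RT-024"],
    "ASI09": ["RT-018", "RT-019"],
    "ASI10": ["RT-028"],
}


def uncovered_categories(mapping: dict[str, list[str]]) -> list[str]:
    """Return the ASI01..ASI10 IDs that have no mapped scenario (hypothetical helper)."""
    expected = [f"ASI{n:02d}" for n in range(1, 11)]
    return [asi for asi in expected if not mapping.get(asi)]


def scenarios_in_multiple_categories(mapping: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the mapping and keep only scenarios listed under two or more categories."""
    reverse: dict[str, list[str]] = defaultdict(list)
    for asi, scenarios in mapping.items():
        for rt in scenarios:
            reverse[rt].append(asi)
    return {rt: asis for rt, asis in reverse.items() if len(asis) > 1}


if __name__ == "__main__":
    print("Uncovered:", uncovered_categories(OWASP_MAPPING))
    print("Shared scenarios:", scenarios_in_multiple_categories(OWASP_MAPPING))
```

Run against the table as written, the sketch reports no uncovered categories. It also flags RT-005 (ASI06, ASI08) and RT-018 (ASI01, ASI09) as scenarios counted under two categories.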

Protocol-Level Test Harnesses

MCP (Model Context Protocol) - 13 tests

```bash
agent-security test mcp --url http://localhost:8080/mcp
```

| Test ID | Test | OWASP ASI | Description |
|---|---|---|---|
| MCP-001 | Tool List Integrity Check | ASI04 | Validates tool discovery response integrity |
| MCP-002 | Tool Registration via Call Injection | ASI04 | Attempts malicious tool registration |
| MCP-003 | Capability Escalation via Initialize | ASI03 | Tests capability negotiation boundaries |
| MCP-004 | Protocol Version Downgrade Attack | ASI03 | Attempts protocol version manipulation |
| MCP-005 | Resource URI Path Traversal | ASI04 | Tests resource access path validation |
| MCP-006 | Prompt Template Injection via Get | ASI01 | Injects malicious prompt templates |
| MCP-007 | Sampling Request Context Exfiltration | ASI02 | Attempts context data extraction |
| MCP-008 | Malformed JSON-RPC Handling | ASI08 | Tests protocol error handling |
| MCP-009 | Batch Request DoS | ASI08 | Batch request flood testing |
| MCP-010 | Tool Call Argument Injection | ASI02 | Malicious tool parameter injection |
| MCP-011 | Tool Description Context Displacement | ASI08 | 50K+ character description DoS with a hidden injection payload |
| MCP-012 | Tool Description Oversized Check | ASI08 | Detects tool descriptions exceeding the 10KB context-displacement threshold |
| MCP-013 | Tool Description Padding / Repetition Detection | ASI08 | Detects repeated phrases, whitespace padding, and low-entropy descriptions |
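MCP-012 and MCP-013 are heuristic checks rather than protocol rules. The Python sketch below shows one way to implement them. Only the 10KB figure comes from the table above. The following are assumptions chosen for illustration:

- reading 10KB as 10 × 1024 bytes
- treating a whitespace run of 200 or more characters as padding
- using an 8-word phrase window
- setting a 3.0 bits/char entropy floor

`description_findings` is a hypothetical name.

```python
import math
import re
from collections import Counter

# Assumed thresholds. Only "10KB" comes from MCP-012; the rest are illustrative.
MAX_DESCRIPTION_BYTES = 10 * 1024
MIN_ENTROPY_BITS = 3.0
PHRASE_WORDS = 8
MAX_PHRASE_REPEATS = 3


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def description_findings(description: str) -> list[str]:
    """Return findings for a single MCP tool description (hypothetical helper)."""
    findings = []
    size = len(description.encode("utf-8"))
    if size > MAX_DESCRIPTION_BYTES:
        findings.append(f"oversized: {size} bytes")
    # Long whitespace runs can push an injected payload out of a reviewer's view.
    if re.search(r"\s{200,}", description):
        findings.append("whitespace padding")
    # Count every sliding window of PHRASE_WORDS consecutive words.
    words = description.split()
    phrases = Counter(
        " ".join(words[i : i + PHRASE_WORDS])
        for i in range(len(words) - PHRASE_WORDS + 1)
    )
    if phrases and max(phrases.values()) > MAX_PHRASE_REPEATS:
        findings.append("repeated phrase")
    # Entropy is only meaningful once the text is reasonably long.
    if len(description) > 200 and shannon_entropy(description) < MIN_ENTROPY_BITS:
        findings.append("low entropy")
    return findings
```

A short benign description returns no findings. A description that repeats an injected sentence hundreds of times is reported as both oversized and repetitive.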

MCP Supply-Chain / Framework-Layer (MCP-F) - 4 tests

```bash
agent-security test mcp-supplychain --command "npx -y some-mcp-server" --project-root .
```

Static pre-flight checks on the binary-resolution and package-install path, which executes before the first JSON-RPC byte is sent (issue #206). This suite complements, but does not replace, the protocol suite above.

| Test ID | Test | OWASP ASI | Description |
|---|---|---|---|
| MCP-F-001 | Launcher Binary Resolution | ASI06 | Resolves the binary that actually executes (walks $PATH + node_modules/.bin + .venv/bin); flags shadowing by world-writable / project-local shims |
| MCP-F-002 | Install-Script Inspection | ASI06 | Flags network-callable / filesystem-mutating npm preinstall/install/postinstall scripts before they run |
| MCP-F-003 | Dependency-Confusion Resolvability | ASI06 | Checks public-registry resolvability of internal-looking package names (network-gated: --allow-network) |
| MCP-F-004 | Launcher Version Pinning | ASI06 | Flags unpinned launchers (npx -y pkg / uvx pkg without @version/==version/SHA) |
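MCP-F-004 comes down to inspecting the package spec in the launch command. The following is a rough Python approximation built on these assumptions:

- Only npx and uvx launchers are supported.
- pkg@version, @scope/pkg@version, pkg==version, and a 40-hex-digit commit SHA count as pinned.
- The floating dist-tags latest and next count as unpinned.

`is_pinned_launcher` is hypothetical. The sketch does not handle flags that take a value (for example npx --package).

```python
import re
import shlex
from typing import Optional

SHA_RE = re.compile(r"[0-9a-fA-F]{40}")
# Optional @scope/, a package name, then @version or ==version.
PKG_VERSION_RE = re.compile(r"^(?:@[^/\s]+/)?[^@\s=]+(?:@|==)(?P<version>[^@\s]+)$")
FLOATING_TAGS = {"latest", "next"}  # assumed to be unpinned


def package_token(argv: list[str]) -> Optional[str]:
    """Return the first non-flag argument after the launcher (the package spec)."""
    for token in argv[1:]:
        if not token.startswith("-"):
            return token
    return None


def is_pinned_launcher(command: str) -> bool:
    """True if an npx/uvx command pins its package to a version or commit SHA."""
    argv = shlex.split(command)
    if not argv or argv[0] not in {"npx", "uvx"}:
        raise ValueError(f"unsupported launcher: {command!r}")
    pkg = package_token(argv)
    if pkg is None:
        return False
    if SHA_RE.search(pkg):
        return True
    match = PKG_VERSION_RE.match(pkg)
    return bool(match) and match.group("version") not in FLOATING_TAGS
```

Under these rules, the example command above (npx -y some-mcp-server) is reported as unpinned. npx -y some-mcp-server@1.4.2 passes.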

A2A (Agent-to-Agent) - 12 tests

```bash
agent-security test a2a --url https://agent.example.com
```

L402 Payment Protocol - 33 tests

```bash
agent-security test l402 --url https://l402.example.com
```
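The core L402 check is that the preimage presented by the client hashes (SHA-256) to the payment hash of the invoice the macaroon was minted for. The minimal Python sketch below parses an "Authorization: L402 <macaroon>:<preimage>" header and runs that comparison. It rests on two assumptions:

- The header carries a single base64 macaroon.
- The payment hash is passed in as an argument rather than decoded from the macaroon identifier.

The function names are hypothetical.

```python
import hashlib
import hmac
import re

# Authorization: L402 <base64 macaroon>:<64 hex chars = 32-byte preimage>
AUTH_RE = re.compile(r"^L402 (?P<macaroon>[A-Za-z0-9+/=_-]+):(?P<preimage>[0-9a-fA-F]{64})$")


def parse_l402_authorization(header: str) -> tuple[str, bytes]:
    """Split an L402 Authorization header value into (macaroon, preimage bytes)."""
    match = AUTH_RE.match(header.strip())
    if match is None:
        raise ValueError("malformed L402 Authorization header")
    return match.group("macaroon"), bytes.fromhex(match.group("preimage"))


def preimage_matches(preimage: bytes, payment_hash_hex: str) -> bool:
    """True if SHA-256(preimage) equals the invoice's payment hash."""
    digest = hashlib.sha256(preimage).digest()
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(digest, bytes.fromhex(payment_hash_hex))
```

A server that skips this comparison, or performs it against a hash the client controls, accepts fabricated preimages. Preimage and caveat tests probe exactly this class of flaw.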

x402 Payment Protocol - 25 tests (First Open-Source x402 Harness)

```bash
agent-security test x402 --url https://your-x402-endpoint.com
```

Tests the Coinbase/Stripe/Cloudflare agent payment standard ($600M+ payment volume):

| Test ID | Test | Category | Description |
|---|---|---|---|
| X4-001 to X4-003 | Payment Challenge Validation | payment_challenge | Missing headers, malformed auth, currency mismatch |
| X4-004 to X4-006 | Recipient Address Manipulation | recipient_manipulation | Dynamic payTo routing attacks (V2), address spoofing, invalid addresses |
| X4-007 to X4-010 | Session Token Security | session_security | Token fabrication, expiry bypass, sensitive data leakage in sessions |
| X4-011 to X4-013 | Spending Limit Exploitation | spending_limits | Rate limit bypass, underpayment, budget exhaustion |
| X4-014 to X4-016 | Facilitator Trust | facilitator_trust | Fake facilitator injection, verification bypass, unreachable facilitator |
| X4-017 to X4-018 | Information Disclosure | information_disclosure | Leaked keys in 402 response, stack traces in errors |
| X4-019 to X4-020 | Cross-Chain Confusion | cross_chain_confusion | Wrong network, wrong token type (EURC vs USDC) |
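Recipient manipulation and cross-chain confusion can both be caught client-side by pinning what the agent expects to pay. The following is a Python sketch of such a guard. The field names (payTo, network, asset, maxAmountRequired) follow the x402 v1 PaymentRequirements object as we understand it and should be checked against the spec. PinnedPayee and all addresses used with it are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedPayee:
    """What the agent has agreed to pay (hypothetical client-side policy)."""

    pay_to: str
    network: str
    asset: str
    max_amount_atomic: int  # in the token's smallest unit


def _addr(value: object) -> str:
    # Case-insensitive comparison: valid for EVM hex addresses only.
    return str(value or "").lower()


def requirement_violations(req: dict, pinned: PinnedPayee) -> list[str]:
    """Compare one entry of a 402 response's "accepts" list with the pinned policy."""
    problems = []
    if _addr(req.get("payTo")) != _addr(pinned.pay_to):
        problems.append("recipient changed")  # recipient manipulation
    if req.get("network") != pinned.network:
        problems.append("wrong network")  # cross-chain confusion
    if _addr(req.get("asset")) != _addr(pinned.asset):
        problems.append("wrong asset")  # wrong token type
    try:
        amount = int(req.get("maxAmountRequired"))
    except (TypeError, ValueError):
        problems.append("unparseable amount")
    else:
        if amount > pinned.max_amount_atomic:
            problems.append("amount over budget")
    return problems
```

The lowercase address comparison is appropriate only for EVM hex addresses. Base58 addresses (e.g. Solana) must be compared exactly.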

Features unique to the x402 harness:

  • CSG Mapping - each test links to the Constitutional Self-Governance mechanism that catches it (Hard Constraints, Harm Test, Twelve Numbers, Falsification Requirement)
  • Financial Impact Estimation - each result tagged: fund_theft, overpayment, service_denial, info_leak, or session_hijack
  • Agent Autonomy Risk Score (0-100) - composite score answering "how dangerous is it to let an agent pay this endpoint unsupervised?" based on recipient consistency, payment validation, info leakage, session security, and facilitator trust
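The list above names the five inputs to the Agent Autonomy Risk Score but does not say how they are combined. One plausible construction, offered purely as an assumption, rates each factor as a 0-100 risk and takes a weighted average. The weights and names below are invented for illustration and are not the harness's formula.

```python
# Hypothetical weights (sum to 1.0); the harness's real weighting is not documented here.
WEIGHTS = {
    "recipient_consistency": 0.30,
    "payment_validation": 0.25,
    "facilitator_trust": 0.20,
    "session_security": 0.15,
    "info_leakage": 0.10,
}


def autonomy_risk_score(factor_risk: dict[str, float]) -> int:
    """Weighted average of per-factor risk (each 0-100, higher is riskier), rounded."""
    missing = WEIGHTS.keys() - factor_risk.keys()
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    for name, value in factor_risk.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} out of range: {value}")
    total = sum(WEIGHTS[name] * factor_risk[name] for name in WEIGHTS)
    return round(total)
```

Under these weights, an endpoint whose recipient is fully inconsistent but which is clean on every other factor scores 30.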

Enterprise Platform Adapters

Pre-configured tests for 20 enterprise platforms where AI agents are being deployed:

Tier 1 Platforms (9 platforms, 30 tests)

  • SAP Joule - ERP/SCADA security boundaries
  • Salesforce Agentforce - CRM data isolation
  • Workday - HR/Payroll PII protection
  • Microsoft Copilot/Azure AI - Enterprise integration security
  • Google Vertex AI - Cloud platform boundaries
  • Amazon Q - AWS service integration
  • Oracle Fusion AI - Database and financial system access
  • ServiceNow Now Assist - ITSM workflow security
  • OpenClaw - Session and tool isolation

Tier 2 Platforms (11 platforms, 27 tests)

  • IBM Maximo, Snowflake Cortex, Databricks Mosaic AI
  • Pega GenAI, UiPath, Atlassian Rovo
  • Zendesk AI, IFS Cloud, Infor AI
  • HubSpot Breeze, Appian AI

```bash
# List all enterprise adapters
agent-security list --category enterprise

# Test specific platforms
agent-security test enterprise --platform sap --url https://your-sap.com
agent-security test enterprise --platform salesforce --url https://your-org.salesforce.com
```
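To sweep several platforms in one run, the same invocation can be scripted. The Python sketch below builds one command per platform using only the flags shown above. Platform slugs other than sap and salesforce are not documented here, so only those two are included, and the URLs are the placeholders from the examples. The meaning of the CLI's exit codes is also not documented here, so the sketch simply records them.

```python
import shutil
import subprocess

# Platform slug -> target URL. Slugs and URLs are taken from the examples above;
# replace the placeholder URLs with your own tenants.
TARGETS = {
    "sap": "https://your-sap.com",
    "salesforce": "https://your-org.salesforce.com",
}


def enterprise_commands(targets: dict[str, str]) -> list[list[str]]:
    """Build one `agent-security test enterprise` argv per platform."""
    return [
        ["agent-security", "test", "enterprise", "--platform", platform, "--url", url]
        for platform, url in targets.items()
    ]


def run_all(targets: dict[str, str]) -> dict[str, int]:
    """Run each invocation sequentially and record its exit code by platform."""
    if shutil.which("agent-security") is None:
        raise FileNotFoundError("agent-security CLI not found on PATH")
    results = {}
    for argv in enterprise_commands(targets):
        completed = subprocess.run(argv, check=False)
        results[argv[4]] = completed.returncode  # argv[4] is the platform slug
    return results
```

Call run_all(TARGETS) once the agent-security CLI is installed and the URLs point at your own deployments.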

24 Test Harness Modules

| Module | Tests | Layer | Description |
|---|---|---|---|
| MCP Protocol | 13 | JSON-RPC 2.0 | Anthropic MCP wire-protocol testing |
| A2A Protocol | 12 | JSON-RPC/HTTP | Google Agent-to-Agent communication |
| L402 Payment | 33 | HTTP/Lightning | Bitcoin/Lightning payment flow security (macaroons, preimages, caveats) |
| x402 Payment | 25 | HTTP/USDC | Coinbase/Stripe agent payment protocol (recipient manipulation, session theft, facilitator trust, cross-chain confusion) |
| Framework Adapters | 11 | Various APIs | LangChain, CrewAI, AutoGen, OpenAI, Bedrock |
| Enterprise Platforms | 58 | Platform APIs | SAP, Salesforce, Workday, Oracle, ServiceNow, +15 more |
| GTG-1002 APT Simulation | 17 | Full Campaign | Simulates the first documented AI-orchestrated cyber-espionage campaign |
| Advanced Attacks | 10 | Multi-step | Polymorphic, stateful, multi-domain attack chains |
| Over-Refusal | 25 | All protocols | False positive rate testing: legitimate requests that should NOT be blocked |
| Provenance & Attestation | 15 | Supply Chain | Fake provenance, spoofed attestation, marketplace integrity (CVE-2026-25253) |
| Jailbreak | 25 | Model/Agent | DAN variants, token smuggling, authority impersonation, persistence |
| Return Channel | 8 | Output/Context | Return channel poisoning: output injection, ANSI escape, context overflow, encoded smuggling, structured data poisoning |
| Identity & Authorization | 18 | NIST NCCoE | All 6 focus areas from NIST agent identity standards |
| Capability Profile | 10 | A2A JSON-RPC | Executor capability boundary validation, profile escalation prevention |
| Harmful Output | 10 | A2A JSON-RPC | Toxicity, bias, scope violations, deception (AIUC-1 C003/C004) |
| CBRN Prevention | 8 | A2A JSON-RPC | Chemical/biological/radiological/nuclear content safeguards (AIUC-1 F002) |
| Incident Response | 8 | A2A JSON-RPC | Alert triggering, kill switch, log completeness, recovery (AIUC-1 E001-E003) |
| CVE-2026-25253 Reproduction | 8 | MCP Supply Chain | Nested schema injection, fork fingerprinting, marketplace contamination, encoded payload detection |
| AIUC-1 Compliance | 12 | Agent Safety | Incident response, CBRN prevention, harmful content, scope creep, authority impersonation |
| Cloud Agent Platforms | 25 | Platform APIs | AWS Bedrock, Azure AI Agent Service, Google Vertex, Salesforce Agentforce, IBM watsonx |