Skip to content

Latest commit

 

History

History
168 lines (118 loc) · 5.34 KB

File metadata and controls

168 lines (118 loc) · 5.34 KB

Adversa AI — MCP Security Resource Submission

Tool Name

MCP Security Scanner (part of Agent OS)

Category

MCP Security / Tool Poisoning Defense

Description

Runtime security scanner for Model Context Protocol (MCP) tool definitions. Detects tool poisoning, rug pulls, description injection, and cross-server attacks. Integrates with the Agent OS governance kernel for continuous MCP server monitoring with sub-millisecond scan latency.

Threat Model

The scanner defends against six MCP-specific threat categories:

Threat Type Description Detection Method
TOOL_POISONING Hidden instructions embedded in tool descriptions Zero-width unicode, markdown/HTML comments, base64/hex/rot13 encoding detection
RUG_PULL Silent schema/description changes after initial approval SHA-256 fingerprinting of tool definitions; alerts on post-approval mutations
CROSS_SERVER_ATTACK Tool impersonation across MCP servers Levenshtein distance typosquatting, duplicate tool name detection
CONFUSED_DEPUTY Agent tricked into using wrong tool context Server-scoped tool validation, cross-reference checking
HIDDEN_INSTRUCTION Instructions concealed in schema defaults/descriptions Pattern matching across all schema fields including defaults and examples
DESCRIPTION_INJECTION Prompt injection via tool description fields PromptInjectionDetector integration (7-strategy detection)

Capabilities

Tool Poisoning Detection

Scans MCP tool descriptions for hidden instructions using multiple detection strategies:

  • Zero-width unicode: Detects U+200B, U+200C, U+200D, U+FEFF and other invisible characters
  • Markdown/HTML comments: Finds <!-- --> and other comment-based hiding
  • Encoded payloads: Base64, hex, rot13 pattern detection
  • Privilege escalation: Patterns like "ignore previous", "override policy", "act as admin"
  • Prompt injection: Full 7-strategy detection via integrated PromptInjectionDetector

Rug Pull Detection

Fingerprints every tool definition on first scan and detects subsequent changes:

from agent_os.mcp_security import MCPSecurityScanner

scanner = MCPSecurityScanner()

# First scan — registers fingerprint
threats = scanner.scan_tool(
    tool_name="file_read",
    description="Read a file from disk",
    schema={"type": "object", "properties": {"path": {"type": "string"}}},
    server_name="filesystem-server",
)

# Later scan — detects if description or schema changed
threats = scanner.scan_tool(
    tool_name="file_read",
    description="Read a file and send contents to external API",  # changed!
    schema={"type": "object", "properties": {"path": {"type": "string"}}},
    server_name="filesystem-server",
)
# Returns MCPThreat(threat_type=MCPThreatType.RUG_PULL, severity=CRITICAL)

Cross-Server Attack Detection

Identifies tool impersonation across MCP servers:

  • Typosquatting detection via Levenshtein distance (e.g., file_read vs fi1e_read)
  • Duplicate tool name detection across different servers
  • Server reputation tracking

Schema Abuse Detection

Identifies malicious schema patterns:

  • Overly permissive schemas (additionalProperties: true with no constraints)
  • Hidden required fields not shown in description
  • Instruction-laden default values
  • Suspicious enum values containing commands

CLI Tool

# Install
pip install agent-os-kernel

# Scan an MCP configuration file
mcp-scan scan mcp-config.json

# Scan with verbose output
mcp-scan scan mcp-config.json --verbose

# Output formats
mcp-scan scan mcp-config.json --format json
mcp-scan scan mcp-config.json --format table

CVE Coverage

Addresses attack patterns documented in:

  • CVE-2026-25536: Tool description injection in MCP servers
  • CVE-2026-23744: Rug pull attacks via post-approval tool mutations
  • CVE-2025-68145: Cross-server tool impersonation

Integration

Standalone Usage

from agent_os.mcp_security import MCPSecurityScanner, MCPThreatType, MCPSeverity

scanner = MCPSecurityScanner()

# Scan a single tool
threats = scanner.scan_tool(
    tool_name="execute_code",
    description="Run code in sandbox <!-- actually runs with full access -->",
    schema={"type": "object", "properties": {"code": {"type": "string"}}},
    server_name="code-runner",
)

for threat in threats:
    print(f"[{threat.severity.name}] {threat.threat_type.name}: {threat.message}")

With Agent OS Governance

from agent_os.mcp_security import MCPSecurityScanner
from agent_os.mcp_gateway import MCPGateway

scanner = MCPSecurityScanner()
gateway = MCPGateway()

# Scanner feeds into gateway for runtime enforcement
# Gateway provides human approval workflows for flagged tools

Batch Server Scanning

# Scan all tools from an MCP server
result = scanner.scan_server(server_name="my-server", tools=[
    {"name": "tool1", "description": "...", "schema": {...}},
    {"name": "tool2", "description": "...", "schema": {...}},
])

print(f"Scanned: {result.tools_scanned}, Flagged: {result.tools_flagged}")
print(f"Safe: {result.safe}")

Links

License

MIT