feat(plugins): Regex-based agent threat detection plugin (ATR) #4108

@eeee2345

Problem

MCP Gateway already has excellent security plugins (secrets_detection, encoded_exfil_detection), but none of them detects agent-specific threats such as prompt injection, tool poisoning, jailbreaks, or cross-agent manipulation in MCP payloads.

Proposed solution

An ATR (Agent Threat Rules) plugin that scans prompts, tool invocations, tool results, and resources using community-maintained regex rules.

Hooks: prompt_pre_fetch, tool_pre_invoke, tool_post_invoke, resource_post_fetch

# plugins/config.yaml
- name: "ATRThreatDetection"
  kind: "plugins.atr_threat_detection.atr_threat_detection.ATRThreatDetectionPlugin"
  hooks: ["prompt_pre_fetch", "tool_pre_invoke", "tool_post_invoke", "resource_post_fetch"]
  mode: "disabled"
  priority: 53
  config:
    block_on_detection: true
    min_severity: "medium"
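
To make the two config options concrete, here is a minimal sketch of how `min_severity` and `block_on_detection` might interact. It is not the proposed implementation: the `Rule` class, the severity ordering, and both demo patterns are assumptions made up for illustration and are not taken from the ATR rule set or the gateway's plugin framework.

```python
import re
from dataclasses import dataclass

# Assumed severity ordering; the real ATR rule schema may define this differently.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: an id, a severity label, and a compiled regex."""
    rule_id: str
    severity: str
    pattern: re.Pattern


# Illustrative patterns only; these are NOT real ATR rules.
RULES = [
    Rule("demo-ignore-instructions", "high",
         re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE)),
    Rule("demo-role-override", "low",
         re.compile(r"\byou\s+are\s+now\b", re.IGNORECASE)),
]


def scan(text, min_severity="medium"):
    """Return the rules at or above min_severity whose pattern matches text."""
    threshold = SEVERITY_ORDER[min_severity]
    return [
        rule for rule in RULES
        if SEVERITY_ORDER[rule.severity] >= threshold and rule.pattern.search(text)
    ]


def should_block(text, block_on_detection=True, min_severity="medium"):
    """Block only when blocking is enabled and at least one rule matched."""
    return block_on_detection and bool(scan(text, min_severity))
```

Under these assumptions, a low-severity match is ignored at the default threshold, and setting `block_on_detection: false` would turn the plugin into a detect-only scanner.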

Key characteristics:

  • 20 bundled ATR rules covering OWASP Agentic Top 10
  • Pure regex with no external API calls; under 5 ms per scan
  • Configurable blocking and severity threshold
  • Complements secrets_detection (which detects leaked secrets in output) by detecting adversarial input patterns
  • Rules maintained at agentthreatrule.org (MIT licensed)
  • Already adopted by Cisco AI Defense
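
Since tool arguments and results are usually nested structures rather than flat strings, a regex scanner needs some way to reach every text field. The sketch below shows one possible approach, assuming payloads are JSON-like dicts and lists. The helper names `iter_strings` and `payload_matches` and the single demo pattern are hypothetical and only illustrate the idea.

```python
import re
from typing import Any, Iterator

# Illustrative pattern only; not taken from the ATR rule set.
DEMO_PATTERN = re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE)


def iter_strings(payload: Any) -> Iterator[str]:
    """Yield every string (keys and values) found in a nested dict/list payload."""
    if isinstance(payload, str):
        yield payload
    elif isinstance(payload, dict):
        for key, value in payload.items():
            yield from iter_strings(key)
            yield from iter_strings(value)
    elif isinstance(payload, (list, tuple)):
        for item in payload:
            yield from iter_strings(item)
    # Numbers, booleans, and None carry no text, so they are skipped.


def payload_matches(payload: Any) -> bool:
    """Return True if any string inside the payload matches the demo pattern."""
    return any(DEMO_PATTERN.search(text) for text in iter_strings(payload))
```

Scanning keys as well as values is a design choice in this sketch; a real plugin might limit itself to values or to specific fields to keep scan time low.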

Willingness to contribute

Yes. I have a ready implementation with unit tests covering all four hooks, and I am happy to submit a PR once the direction is confirmed.
