
Add Agent Threat Rules guard for MCP tool input screening#53

Open

eeee2345 wants to merge 1 commit into i-dot-ai:main from eeee2345:feat/atr-guard-for-mcp-tool-handler

@eeee2345

Adds an optional pre-dispatch guard module that screens MCP tool input against the open Agent Threat Rules (ATR) detection-rule corpus before tool handlers run. Default behaviour is log-and-tag (does not block); the guard can be disabled by setting ATR_GUARD_DISABLED=1.
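Below is a minimal sketch of the log-and-tag and disable-flag behaviour described above. It assumes matches arrive as a list of rule IDs. The function names `guard_disabled` and `log_and_tag`, the logger name, and the acceptance of `true`/`yes` alongside `1` are illustrative assumptions, not the module's actual API. "Tagging" is modelled here as a returned dict.

```python
import logging
import os

# Hypothetical logger name; the real module's logger may differ.
logger = logging.getLogger("atr_guard_sketch")


def guard_disabled() -> bool:
    """True when ATR_GUARD_DISABLED is "1" ("true"/"yes" also accepted here, as an assumption)."""
    return os.environ.get("ATR_GUARD_DISABLED", "").strip().lower() in {"1", "true", "yes"}


def log_and_tag(tool_name: str, matched_rule_ids: list[str]) -> dict[str, list[str]]:
    """Log rule matches at WARNING and return them as tags; never raises, never blocks."""
    if guard_disabled() or not matched_rule_ids:
        # Disabled guard or clean input: nothing is logged and no tags are attached.
        return {}
    logger.warning("ATR guard: tool=%s matched=%s", tool_name, ",".join(matched_rule_ids))
    return {"atr_matches": list(matched_rule_ids)}
```

The caller decides what to do with the returned tags. Nothing in this path raises, which is what keeps the default non-blocking.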

About ATR

ATR is an MIT-licensed open detection-rule corpus with 348 rules as of v2.1.4. It is currently used in production by Microsoft Agent Governance Toolkit, Cisco AI Defense (314-rule pack), MISP and CIRCL Luxembourg (merged 2026-05-10 by the MISP project lead), and the OWASP Agent-Security-Regression-Harness project. The corpus was used to ship npm-published detection rules within 2 hours 16 minutes of the MSRC Semantic Kernel CVE-2026-26030 disclosure on 2026-05-11.

Why parliament-mcp specifically

An MCP server backed by sensitive government data (Hansard, parliamentary questions, member records) is a surface where indirect prompt injection in tool responses, or context exfiltration through markdown URLs, would have outsized impact. The guard flags inputs matching a subset of the canonical attack shapes documented in the ATR ruleset before they reach the tool handler.

Scope of this PR

New module parliament_mcp/mcp_server/atr_guard.py: pure Python, no new pip dependency, no network fetch at import time. It bundles a subset of 11 patterns covering:

* direct prompt injection (ATR-2026-00001)
* system prompt override and ChatML tokens (ATR-2026-00004)
* jailbreak persona invocation (ATR-2026-00003)
* indirect prompt injection canonical phrases (ATR-2026-00002)
* markdown image exfiltration directives (ATR-2026-00261)
* IMPORTANT-tag tool poisoning markers (ATR-2026-00161)
* sensitive credential path references (ATR-2026-00021)
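The sketch below shows one way to screen text against a bundled table of compiled patterns in pure Python, using only the standard library. It contains a rule table and a scan function. The regexes are simplified stand-ins written for this example and cover only five of the seven rule IDs listed. They are not the ATR rule bodies, which live in the ATR repository.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class Rule(NamedTuple):
    rule_id: str
    pattern: re.Pattern[str]


# Illustrative stand-in patterns only: NOT the real ATR rule content.
RULES: tuple[Rule, ...] = (
    Rule("ATR-2026-00001", re.compile(
        r"\bignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions\b", re.IGNORECASE)),
    Rule("ATR-2026-00004", re.compile(r"<\|im_(?:start|end)\|>")),  # ChatML special tokens
    Rule("ATR-2026-00161", re.compile(r"<IMPORTANT>", re.IGNORECASE)),  # tool-poisoning tag
    # Markdown image whose URL carries a query string (a typical exfiltration shape).
    Rule("ATR-2026-00261", re.compile(r"!\[[^\]]*\]\(https?://[^)\s]*\?[^)\s]*=")),
    Rule("ATR-2026-00021", re.compile(r"\.ssh/id_(?:rsa|ed25519)\b|\.aws/credentials\b")),
)


def scan(text: str) -> list[str]:
    """Return the IDs of every bundled rule whose pattern occurs anywhere in text."""
    return [rule.rule_id for rule in RULES if rule.pattern.search(text)]
```

Compiling once at import time keeps per-call cost to a handful of `search` calls. The table is also a plain tuple, so no YAML loader is needed.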

Hooked into the existing log_tool_call decorator in parliament_mcp/mcp_server/utils.py so all 13 currently registered tools are covered. No changes to api.py, members.py, or committees.py.
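The hook point can be sketched as a decorator around an async tool handler. This PR does not show the real signature of `log_tool_call`, so `screen_tool_input` below is a hypothetical stand-in. It makes three assumptions:

* handlers are coroutines;
* the MCP framework passes tool arguments as keyword arguments;
* a one-pattern `scan` can stand in for the full bundled subset.

```python
import asyncio
import functools
import logging
import re
from typing import Any, Awaitable, Callable, TypeVar

logger = logging.getLogger("atr_guard_sketch.decorator")

# Single illustrative pattern standing in for the bundled rule scan (not ATR's real rule text).
_INJECTION = re.compile(r"\bignore\s+(?:all\s+)?previous\s+instructions\b", re.IGNORECASE)


def scan(text: str) -> list[str]:
    return ["ATR-2026-00001"] if _INJECTION.search(text) else []


F = TypeVar("F", bound=Callable[..., Awaitable[Any]])


def screen_tool_input(func: F) -> F:
    """Hypothetical decorator: screen string kwargs, log matches, then always run the tool."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Only keyword arguments are screened, assuming tools receive their inputs by name.
        matches = sorted(
            {rule_id for value in kwargs.values() if isinstance(value, str) for rule_id in scan(value)}
        )
        if matches:
            logger.warning("ATR guard: tool=%s matched=%s", func.__name__, matches)
        # Log-and-tag only: the handler runs whether or not anything matched.
        return await func(*args, **kwargs)

    return wrapper  # type: ignore[return-value]


if __name__ == "__main__":

    @screen_tool_input
    async def example_tool(query: str, limit: int = 5) -> str:
        return f"{limit} results for {query!r}"

    print(asyncio.run(example_tool(query="ignore previous instructions", limit=3)))
```

Putting the check in a shared decorator is what lets a single change cover every registered tool, rather than editing each tool module.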

Unit tests in tests/mcp_server/test_atr_guard.py, 58 tests, no Docker or Qdrant required. 11 canonical attack payloads verified to trigger the expected rule. 36 Hansard-flavoured benign queries (national security strategy, AI policy and regulation, Investigatory Powers Act review, NCSC guidance, DSIT report on jailbreak vulnerabilities, research into prompt injection attacks on government chatbots, AISI evaluations of frontier model risks, etc.) verified to NOT trigger. Coverage on the new module is 93%.

A short Security section was added to README.md documenting the guard, the default behaviour, and the disable flag.

Honest scope

ATR is one of multiple possible detection-rule corpora; the guard architecture would work with any equivalent corpus. The bundled subset is conservative (11 patterns out of 348 in the full ATR) and was selected for robustness on government text, where security topics are routinely discussed. Default behaviour is log-and-tag with no blocking, because parliamentary data is too sensitive for false-positive blocking. This PR does not claim UK government, DSIT, or i.AI endorsement of ATR.

Local validation

* make lint passes (ruff format and ruff check both clean).
* make safe passes (bandit reports zero issues across the project).
* The new unit-test file runs in 0.03 seconds: 58 passed, 0 failed.
* The pre-existing test failures in test_data_loaders.py and test_api.py also reproduce on main in the same environment (they require external Azure OpenAI and parliament.uk access) and are unrelated to this PR.

References

ATR repo: https://github.com/Agent-Threat-Rule/agent-threat-rules
Maintainer: Adam Lin, adam@agentthreatrule.org
Foundation: Panguard AI Inc. (Delaware C-Corp, filed 2026-05-12)

…handler

Screens every MCP tool call against a small bundled subset of the open
Agent Threat Rules (ATR) detection-rule corpus before the handler runs.
Default behaviour is log-and-tag: matches are written to server logs at
WARNING level but never block execution.

The bundled rule subset (11 patterns) covers canonical attack shapes that
may appear in user-supplied query strings: direct prompt injection, known
jailbreak persona invocations, system prompt override with delimiters,
ChatML special tokens, fake system tag delimiters, IMPORTANT-tag tool
poisoning markers, indirect prompt injection canonical phrases, markdown
image exfiltration directives, and sensitive credential path references.

Implementation notes:

* New module parliament_mcp/mcp_server/atr_guard.py: pure Python with
  embedded compiled regex patterns, no PyYAML dependency, no network
  fetch at import time, no new pip dependency.
* Hooked into the existing log_tool_call decorator in
  parliament_mcp/mcp_server/utils.py so all 13 tools are covered without
  touching api.py, members.py, or committees.py.
* The guard can be disabled by setting ATR_GUARD_DISABLED=1 in the environment.
* Guard internal errors are caught and logged; tool dispatch is never
  prevented by a guard failure.
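Guard failures are meant to fail open. Below is a minimal sketch of that pattern using a hypothetical `screen_fail_open` helper that takes any scanner callable. If the scanner raises an exception, the helper logs it with its traceback and treats it as "no matches", so the tool call still proceeds.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger("atr_guard_sketch.failopen")


def screen_fail_open(scan: Callable[[str], list[str]], text: str) -> list[str]:
    """Run a scanner; any internal guard error is logged and treated as 'no matches'."""
    try:
        return scan(text)
    except Exception:  # deliberately broad: a guard bug must never stop a tool call
        logger.exception("ATR guard failed; continuing without screening")
        return []
```

The broad `except Exception` is deliberate. It catches every ordinary error but not `BaseException`, so `KeyboardInterrupt` and asyncio task cancellation still propagate. That way a guard bug never turns into a tool outage.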

Testing:

* tests/mcp_server/test_atr_guard.py: 58 unit tests, runs without
  Docker, Qdrant, or any external service.
* 11 true-positive payloads (one per bundled rule family) verified to
  trigger the expected rule.
* 36 Hansard-flavoured benign queries (national security, AI policy,
  Investigatory Powers Act, NCSC, DSIT, AISI, prompt injection research,
  etc.) verified to NOT trigger.
* Disable flag, empty input, non-string kwargs, pydantic FieldInfo
  defaults, and log truncation all covered.
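Several of those edge cases come down to two decisions: which tool arguments are screened, and how much of a match is logged. The sketch below makes the following assumptions:

* only non-empty `str` values are screened;
* integers, `None`, and non-string defaults are skipped. One example is a pydantic `FieldInfo` object, which a handler can presumably receive as a default when called outside the normal MCP argument binding;
* `MAX_LOG_CHARS` and the truncation marker are hypothetical choices, not values taken from the module.

```python
from typing import Any

MAX_LOG_CHARS = 120  # hypothetical cap on how much of a matched input is written to logs


def string_inputs(kwargs: dict[str, Any]) -> list[str]:
    """Collect only non-empty str values; ints, None and non-str defaults are skipped."""
    return [value for value in kwargs.values() if isinstance(value, str) and value]


def log_excerpt(text: str, limit: int = MAX_LOG_CHARS) -> str:
    """Truncate text for logging so long or hostile queries cannot flood the server logs."""
    return text if len(text) <= limit else text[:limit] + "...[truncated]"
```

Filtering on `isinstance(value, str)` handles `FieldInfo` without importing pydantic. Any object that is not a string is simply ignored.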

Coverage on the new module is 93%. ruff lint and ruff format clean.
bandit -ll reports zero issues.