Add Agent Threat Rules guard for MCP tool input screening #53
Open
eeee2345 wants to merge 1 commit into
…handler

Screens every MCP tool call against a small bundled subset of the open Agent Threat Rules (ATR) detection-rule corpus before the handler runs. Default behaviour is log-and-tag: matches are written to server logs at WARNING level but never block execution.

The bundled rule subset (11 patterns) covers canonical attack shapes that may appear in user-supplied query strings: direct prompt injection, known jailbreak persona invocations, system prompt override with delimiters, ChatML special tokens, fake system tag delimiters, IMPORTANT-tag tool poisoning markers, indirect prompt injection canonical phrases, markdown image exfiltration directives, and sensitive credential path references.

Implementation notes:
* New module parliament_mcp/mcp_server/atr_guard.py: pure Python with embedded compiled regex patterns, no PyYAML dependency, no network fetch at import time, no new pip dependency.
* Hooked into the existing log_tool_call decorator in parliament_mcp/mcp_server/utils.py so all 13 tools are covered without touching api.py, members.py, or committees.py.
* Settable disable flag via the ATR_GUARD_DISABLED environment variable.
* Guard internal errors are caught and logged; tool dispatch is never prevented by a guard failure.

Testing:
* tests/mcp_server/test_atr_guard.py: 58 unit tests, runs without Docker, Qdrant, or any external service.
* 11 true-positive payloads (one per bundled rule family) verified to trigger the expected rule.
* 36 Hansard-flavoured benign queries (national security, AI policy, Investigatory Powers Act, NCSC, DSIT, AISI, prompt injection research, etc.) verified to NOT trigger.
* Disable flag, empty input, non-string kwargs, pydantic FieldInfo defaults, and log truncation all covered.

Coverage on the new module is 93%. ruff lint and ruff format clean. bandit -ll reports zero issues.
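The edge cases listed under Testing (empty input, non-string kwargs, FieldInfo defaults, log truncation) suggest an input-normalisation step ahead of the regex pass. The following is a minimal sketch of what such a step might look like, not the code in atr_guard.py: the names `screenable_strings`, `truncate_for_log`, and `MAX_LOG_CHARS` are hypothetical, and the 200-character cap is an assumed value that the PR does not state.

```python
from typing import Any

# Assumed cap for logged excerpts; the PR does not state the real limit.
MAX_LOG_CHARS = 200


def screenable_strings(kwargs: dict[str, Any]) -> dict[str, str]:
    """Keep only non-empty string arguments.

    Anything that is not a str (ints, None, or a pydantic FieldInfo object
    standing in for an unset default) is skipped rather than coerced.
    """
    return {
        name: value
        for name, value in kwargs.items()
        if isinstance(value, str) and value.strip()
    }


def truncate_for_log(text: str, limit: int = MAX_LOG_CHARS) -> str:
    """Shorten a matched input so a long payload cannot flood the log."""
    if len(text) <= limit:
        return text
    return text[:limit] + "...[truncated]"
```

Skipping non-string values instead of calling `str()` on them keeps the guard from scanning object reprs, which is one plausible way to avoid spurious matches on default placeholders.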
Adds an optional pre-dispatch guard module that screens MCP tool input against the open Agent Threat Rules (ATR) detection-rule corpus before tool handlers run. Default behaviour is log-and-tag (does not block); the guard can be disabled by setting ATR_GUARD_DISABLED=1.
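As an illustration of the disable flag, a check along these lines would be enough. This is a sketch under assumptions: the helper name `guard_disabled` is hypothetical, and only the value `1` is documented in this PR, so the comparison is deliberately limited to that value.

```python
import os
from typing import Mapping, Optional


def guard_disabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when ATR_GUARD_DISABLED is set to "1".

    The environ parameter exists only to make the check easy to test;
    by default the real process environment is read on every call.
    """
    env = os.environ if environ is None else environ
    # Assumption: only "1" disables the guard, as documented in the PR.
    return env.get("ATR_GUARD_DISABLED", "").strip() == "1"
```

Reading the variable at call time rather than at import time means the flag can be toggled between calls in tests without reloading the module.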
About ATR
ATR is an MIT-licensed open detection-rule corpus, 348 rules at v2.1.4. It is currently in production at Microsoft Agent Governance Toolkit, Cisco AI Defense (314-rule pack), MISP and CIRCL Luxembourg (merged 2026-05-10 by the MISP project lead), and the OWASP Agent-Security-Regression-Harness project. The corpus was used to ship npm-published detection rules within 2 hours 16 minutes of the MSRC Semantic Kernel CVE-2026-26030 disclosure on 2026-05-11.
Why parliament-mcp specifically
An MCP server backed by sensitive government data (Hansard, parliamentary questions, member records) is a surface where indirect prompt injection in tool responses, or context exfiltration through markdown URLs, would have outsized impact. The guard screens tool inputs for the canonical attack shapes documented in the ATR ruleset and, by default, logs any match before the call reaches the tool handler.
Scope of this PR
New module parliament_mcp/mcp_server/atr_guard.py. Pure Python, no new pip dependency, no network fetch at import time. Bundled rule subset of 11 patterns covering: direct prompt injection (ATR-2026-00001), system prompt override and ChatML tokens (ATR-2026-00004), jailbreak persona invocation (ATR-2026-00003), indirect prompt injection canonical phrases (ATR-2026-00002), markdown image exfiltration directives (ATR-2026-00261), IMPORTANT-tag tool poisoning markers (ATR-2026-00161), and sensitive credential path references (ATR-2026-00021).
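For readers unfamiliar with the approach, the sketch below shows the general shape of a bundled, precompiled regex scan. The three patterns are illustrative stand-ins written for this example and are NOT the regexes shipped in atr_guard.py; only the rule IDs are taken from the list above. The names `RuleHit`, `_RULES`, and `scan` are hypothetical.

```python
import re
from typing import NamedTuple


class RuleHit(NamedTuple):
    rule_id: str
    excerpt: str


# Illustrative patterns only; the bundled rules in the PR may differ.
_RULES = [
    # Direct prompt injection
    ("ATR-2026-00001",
     re.compile(r"\bignore\s+(?:all\s+)?(?:previous|prior)\s+instructions\b", re.IGNORECASE)),
    # ChatML special tokens
    ("ATR-2026-00004", re.compile(r"<\|im_(?:start|end)\|>")),
    # Markdown image pointing at a remote URL
    ("ATR-2026-00261", re.compile(r"!\[[^\]]*\]\(https?://[^)\s]+\)", re.IGNORECASE)),
]


def scan(text: str) -> list[RuleHit]:
    """Return one hit per rule whose pattern is found in text."""
    hits = []
    for rule_id, pattern in _RULES:
        match = pattern.search(text)
        if match:
            hits.append(RuleHit(rule_id, match.group(0)))
    return hits
```

Because the patterns are compiled once at import and stored in a module-level list, the per-call cost is a handful of `search` calls with no file or network access.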
Hooked into the existing log_tool_call decorator in parliament_mcp/mcp_server/utils.py so all 13 currently registered tools are covered. No changes to api.py, members.py, or committees.py.
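The hook can be pictured as a decorator that screens string arguments, logs matches at WARNING level, and always dispatches to the handler. The sketch below is an assumption-laden illustration, not the actual log_tool_call code: `screened` and `scan_fn` are hypothetical names, and handlers are shown as synchronous functions for brevity even though the real tools may be async.

```python
import functools
import logging

logger = logging.getLogger("atr_guard_sketch")  # hypothetical logger name


def screened(scan_fn):
    """Decorator factory: log rule matches on string kwargs, never block."""

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            try:
                for name, value in kwargs.items():
                    if not isinstance(value, str) or not value:
                        continue  # non-string and empty inputs are not scanned
                    for rule_id in scan_fn(value):
                        logger.warning(
                            "ATR match rule=%s tool=%s arg=%s",
                            rule_id, handler.__name__, name,
                        )
            except Exception:
                # A guard failure is logged and swallowed so dispatch continues.
                logger.exception("ATR guard error in %s", handler.__name__)
            # Log-and-tag: the handler runs whether or not anything matched.
            return handler(*args, **kwargs)

        return wrapper

    return decorator
```

Placing the handler call outside the `try` block means only guard errors are swallowed; exceptions raised by the tool itself still propagate as before.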
Unit tests in tests/mcp_server/test_atr_guard.py, 58 tests, no Docker or Qdrant required. 11 canonical attack payloads verified to trigger the expected rule. 36 Hansard-flavoured benign queries (national security strategy, AI policy and regulation, Investigatory Powers Act review, NCSC guidance, DSIT report on jailbreak vulnerabilities, research into prompt injection attacks on government chatbots, AISI evaluations of frontier model risks, etc.) verified to NOT trigger. Coverage on the new module is 93%.
A short Security section was added to README.md documenting the guard, the default behaviour, and the disable flag.
Honest scope
ATR is one of multiple possible detection-rule corpora; the guard architecture would work with any equivalent corpus. The bundled subset is conservative (11 patterns out of 348 in the full ATR) and was selected for robustness on government text where security topics are routinely discussed. Default behaviour is log-and-tag with no blocking, because parliamentary data is too sensitive for false-positive blocking. No claim of UK government, DSIT, or i.AI endorsement of ATR.
Local validation
make lint passes (ruff format and ruff check both clean).
make safe passes (bandit reports zero issues across the project).
The new unit-test file runs in 0.03 seconds: 58 passed, 0 failed.
The pre-existing test failures in test_data_loaders.py and test_api.py reproduce on main against the same environment (they require external Azure OpenAI and parliament.uk access) and are unrelated to this PR.
References
ATR repo: https://github.com/Agent-Threat-Rule/agent-threat-rules
Maintainer: Adam Lin, adam@agentthreatrule.org
Foundation: Panguard AI Inc. (Delaware C-Corp, filed 2026-05-12)