Context
The Agent Governance Toolkit provides runtime governance for AI agents. A complementary layer would be pre-deployment system prompt auditing — checking whether agent prompts include adequate defenses before they're deployed.
The gap
We scanned 257 production system prompts from major AI tools (ChatGPT, Claude, Grok, Cursor, v0, Copilot) and found:
| Defense | Gap Rate |
| --- | --- |
| Indirect Injection | 94.9% |
| Role Boundary | 84.4% |
| Harmful Content | 72.4% |
| Social Engineering | 42.4% |
Average defense score: 49/100. Full data (n=1,646 prompts from 4 public datasets): research results
Proposal
A policy evaluator that validates agent system prompts against defense patterns before deployment:
```python
from agent_governance.evaluators import PromptDefenseEvaluator

evaluator = PromptDefenseEvaluator()
result = evaluator.evaluate(agent_config.system_prompt)
# result.score = 49
# result.missing = ['indirect-injection', 'role-boundary', ...]
# result.compliance = {'OWASP_LLM01': 'FAIL', 'OWASP_LLM06': 'PASS', ...}
```
This complements runtime governance by catching weak prompts before they enter production.
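As a sketch of how this could gate a deployment in CI (the threshold, exit-code convention, and `check_deployment` helper are illustrative assumptions, not part of the toolkit):

```python
def check_deployment(score: int, missing: list[str]) -> int:
    """Return a CI exit code: 0 to allow deployment, 1 to block it."""
    MIN_DEFENSE_SCORE = 70  # assumed threshold, not defined by the toolkit
    if score < MIN_DEFENSE_SCORE:
        print(f"BLOCKED: defense score {score} < {MIN_DEFENSE_SCORE}; "
              f"missing: {', '.join(missing)}")
        return 1
    return 0

# Under this threshold, the scan's average score of 49 would be blocked:
assert check_deployment(49, ["indirect-injection", "role-boundary"]) == 1
```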
Implementation
The scanner is open source and on npm: prompt-defense-audit (MIT, zero deps, <5ms). The regex patterns could be ported to Python or called via subprocess.
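A Python port could look something like the following minimal sketch. The two regexes and the scoring formula are illustrative placeholders, not the actual patterns shipped in `prompt-defense-audit`; only the category slugs (`indirect-injection`, `role-boundary`) are taken from the proposal above.

```python
import re

# Illustrative defense patterns -- a real port would carry over the
# actual regexes from the npm package, not these stand-ins.
DEFENSE_PATTERNS = {
    "indirect-injection": re.compile(
        r"treat (retrieved|external) (content|data) as untrusted",
        re.IGNORECASE,
    ),
    "role-boundary": re.compile(
        r"(do not|never) (reveal|change|deviate from).{0,30}"
        r"(role|persona|system prompt)",
        re.IGNORECASE,
    ),
}

def audit_prompt(prompt: str) -> dict:
    """Report which defense categories the prompt appears to cover."""
    missing = [name for name, pattern in DEFENSE_PATTERNS.items()
               if not pattern.search(prompt)]
    score = round(100 * (1 - len(missing) / len(DEFENSE_PATTERNS)))
    return {"score": score, "missing": missing}

# A bare prompt misses both categories and scores 0:
print(audit_prompt("You are a helpful assistant."))
```

Keeping the patterns as plain regexes preserves the zero-dependency, sub-millisecond-per-pattern character of the npm scanner.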
Happy to contribute a PR if this direction aligns with the toolkit's roadmap.