Skip to content

Proposal: Pre-deployment system prompt defense audit #821

@ppcvote

Description

@ppcvote

Context

The Agent Governance Toolkit provides runtime governance for AI agents. A complementary layer would be pre-deployment system prompt auditing — checking whether agent prompts include adequate defenses before they're deployed.

The gap

We scanned 257 production system prompts from major AI tools (ChatGPT, Claude, Grok, Cursor, v0, Copilot) and found:

Defense Gap Rate
Indirect Injection 94.9%
Role Boundary 84.4%
Harmful Content 72.4%
Social Engineering 42.4%

Average defense score: 49/100. Full data (n=1,646 prompts from 4 public datasets): research results

Proposal

A policy evaluator that validates agent system prompts against defense patterns before deployment:

from agent_governance.evaluators import PromptDefenseEvaluator

evaluator = PromptDefenseEvaluator()
result = evaluator.evaluate(agent_config.system_prompt)
# result.score = 49, result.missing = ['indirect-injection', 'role-boundary', ...]
# result.compliance = {'OWASP_LLM01': 'FAIL', 'OWASP_LLM06': 'PASS', ...}

This complements the runtime governance by catching weak prompts before they enter production.

Implementation

The scanner is open source and on npm: prompt-defense-audit (MIT, zero deps, <5ms). The regex patterns could be ported to Python or called via subprocess.

Happy to contribute a PR if this direction aligns with the toolkit's roadmap.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions