Proposal: Pre-deployment system prompt defense audit

## Context

The Agent Governance Toolkit provides runtime governance for AI agents. A complementary layer would be **pre-deployment system prompt auditing** — checking whether agent prompts include adequate defenses before they're deployed.

## The gap

We scanned 257 production system prompts from major AI tools (ChatGPT, Claude, Grok, Cursor, v0, Copilot) and found:

| Defense | Gap Rate |
|---------|:--------:|
| Indirect Injection | 94.9% |
| Role Boundary | 84.4% |
| Harmful Content | 72.4% |
| Social Engineering | 42.4% |

Average defense score: 49/100. Full data (n=1,646 prompts from 4 public datasets): [research results](https://github.com/ppcvote/prompt-defense-audit/tree/master/research)

## Proposal

A policy evaluator that validates agent system prompts against defense patterns before deployment:

```python
from agent_governance.evaluators import PromptDefenseEvaluator

evaluator = PromptDefenseEvaluator()
result = evaluator.evaluate(agent_config.system_prompt)
# result.score = 49, result.missing = ['indirect-injection', 'role-boundary', ...]
# result.compliance = {'OWASP_LLM01': 'FAIL', 'OWASP_LLM06': 'PASS', ...}
```

This complements the runtime governance by catching weak prompts before they enter production.

## Implementation

The scanner is open source and on npm: [prompt-defense-audit](https://github.com/ppcvote/prompt-defense-audit) (MIT, zero deps, <5ms). The regex patterns could be ported to Python or called via subprocess.

Happy to contribute a PR if this direction aligns with the toolkit's roadmap.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Pre-deployment system prompt defense audit #821

Context

The gap

Proposal

Implementation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Defense	Gap Rate
Indirect Injection	94.9%
Role Boundary	84.4%
Harmful Content	72.4%
Social Engineering	42.4%

Proposal: Pre-deployment system prompt defense audit #821

Description

Context

The gap

Proposal

Implementation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions