feat(agent-compliance): PromptDefenseEvaluator — 12-vector system prompt audit#854
Conversation
… prompt audit Pre-deployment compliance check that scans system prompts for missing defenses against 12 attack vectors mapped to OWASP LLM Top 10. Pure regex, deterministic, zero LLM cost, < 5ms per prompt. - PromptDefenseEvaluator: evaluate(), evaluate_file(), evaluate_batch() - PromptDefenseReport: grade (A-F), score (0-100), per-vector findings - PromptDefenseConfig: configurable vectors, severity map, min grade - MerkleAuditChain integration: to_audit_entry() — no raw prompt stored - ComplianceViolation integration: to_compliance_violation() - 58 tests: vectors, grading, config, determinism, serialization, audit entry, compliance violations, edge cases, performance - Code style: black (100), ruff clean, mypy --strict clean Closes microsoft#821 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🤖 AI Agent: contributor-guide — Welcome to the microsoft/agent-governance-toolkit project! 🎉Welcome to the microsoft/agent-governance-toolkit project! 🎉Hi there, and welcome to the community! Thank you so much for your contribution to the project. It's always exciting to see new contributors, and we truly appreciate the time and effort you've put into this pull request. Your work is helping us make this project even better! What You Did Well 🌟
Suggestions for Improvement 🛠️While your PR is excellent, here are a few suggestions to align with our project conventions and ensure consistency:
Helpful Resources 📚Here are some resources to help you navigate the project and make any necessary updates:
Next Steps 🚀
If you have any questions or need help with anything, don’t hesitate to ask. We're here to support you! Thank you again for your contribution — we’re thrilled to have you as part of the community! 😊 |
|
@microsoft-github-policy-service agree |
PR #854 Review FeedbackThank you for the excellent work on PromptDefenseEvaluator! The code quality is outstanding, and the integration design is well-thought-out. Here are my thoughts: Key Strengths1. SupplyChainGuard Pattern AlignmentThe implementation perfectly matches the existing pattern in
This consistency makes the code easy to understand and maintain. 2. MerkleAuditChain IntegrationThe integration with
3. ComplianceViolation IntegrationThe integration with
4. Exceptional Test CoverageThe test coverage is exemplary:
This gives me confidence in the implementation. 5. Zero-Dependency DesignUsing only stdlib (re, hashlib, json, dataclasses) is a great decision:
Points to DiscussDiscussion Point 1: Field Naming in
|
|
@lawcontinue — thank you for the thorough review. This is exactly the kind of feedback that makes the integration stronger. Let me address each point: Discussion Point 1: Field Naming in
|
|
Thanks for the kind words! I'd be happy to contribute to the integration examples. The GovernanceVerifier integration seems like a good starting point—it demonstrates the defensive value clearly. I'll open a follow-up PR in the coming days. Re: the semantic discussion on Looking forward to collaborating! |
imran-siddique
left a comment
There was a problem hiding this comment.
Excellent work @ppcvote — 58 tests, zero dependencies, and clean integration with existing toolkit schemas. A few items before merge:
Blocking:
- Please mark this PR as Ready for Review (exit draft) so CI security checks can run
Should fix:
- Add path validation or try/except in evaluate_file() to handle missing/unauthorized paths
- Add the inline comment for the confidence calculation (you mentioned this)
- Export the public API in agent_compliance/init.py
Nit:
- Consider a max input length guard before regex evaluation (ReDoS defense-in-depth)
This is a strong contribution — close to merge-ready.
Addresses @imran-siddique review: - evaluate_file(): validates path exists, raises FileNotFoundError/ValueError - Confidence calculation: inline comments explaining 0.5+0.2n/0.4/0.8 logic - MAX_PROMPT_LENGTH (100KB): defense-in-depth against ReDoS - Public API exported in agent_compliance/__init__.py - 5 new tests for file handling and input length guard (63 total) - black/ruff/mypy --strict all clean Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@imran-siddique — all feedback addressed:
63 tests passing (was 58), black/ruff/mypy --strict all clean. |
imran-siddique
left a comment
There was a problem hiding this comment.
All feedback addressed — draft exited, path validation added, confidence documented, init.py updated. Excellent contribution with 58 tests and zero dependencies. Merging.
…it (microsoft#854) * feat(agent-compliance): add PromptDefenseEvaluator — 12-vector system prompt audit Pre-deployment compliance check that scans system prompts for missing defenses against 12 attack vectors mapped to OWASP LLM Top 10. Pure regex, deterministic, zero LLM cost, < 5ms per prompt. - PromptDefenseEvaluator: evaluate(), evaluate_file(), evaluate_batch() - PromptDefenseReport: grade (A-F), score (0-100), per-vector findings - PromptDefenseConfig: configurable vectors, severity map, min grade - MerkleAuditChain integration: to_audit_entry() — no raw prompt stored - ComplianceViolation integration: to_compliance_violation() - 58 tests: vectors, grading, config, determinism, serialization, audit entry, compliance violations, edge cases, performance - Code style: black (100), ruff clean, mypy --strict clean Closes microsoft#821 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review feedback — path validation, ReDoS guard, API exports Addresses @imran-siddique review: - evaluate_file(): validates path exists, raises FileNotFoundError/ValueError - Confidence calculation: inline comments explaining 0.5+0.2n/0.4/0.8 logic - MAX_PROMPT_LENGTH (100KB): defense-in-depth against ReDoS - Public API exported in agent_compliance/__init__.py - 5 new tests for file handling and input length guard (63 total) - black/ruff/mypy --strict all clean Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Pre-deployment compliance check that scans agent system prompts for missing defenses against 12 attack vectors mapped to OWASP LLM Top 10.
Implements the proposal discussed in #821 with @imran-siddique and @jagmarques.
SupplyChainGuardpattern —Configdataclass +Evaluatorclass +FindingdataclassIntegration points
evaluate(prompt)PromptDefenseReportwith grade A-F, 0-100 score, per-vector findingsto_audit_entry(report, agent_did)AuditEntry-compatible dict (no raw prompt stored — SHA-256 hash only)to_compliance_violation(report)report.is_blocking(min_grade)PromotionCheckerintegrationFiles
agent_compliance/prompt_defense.pytests/test_prompt_defense.pyTest plan
pytest— 58 tests passing (0.36s)black --check --line-length 100— cleanruff check— cleanmypy --strict --ignore-missing-imports— cleanre,hashlib,json,dataclassesfrom stdlibDesign decisions
dataclassover Pydantic — avoids adding pydantic as a hard dependency for this module; matchesSupplyChainGuardpatternPromptInjectionDetectorpattern in agent-oscriticalvslowfor their risk profileevaluate_batch()— acceptsdict[str, str]for bulk pre-deployment scanning across an agent fleetRelationship to existing
PromptInjectionDetectoragent-oshasPromptInjectionDetectorwhich detects attacks in user input at runtime. This evaluator checks system prompts for missing defenses before deployment. They are complementary:Closes #821