Skip to content

test(policies): adversarial test fixtures for v1 hook policies#26

Open
VibhorGautam wants to merge 6 commits intoc2siorg:mainfrom
VibhorGautam:test/adversarial-policy-fixtures
Open

test(policies): adversarial test fixtures for v1 hook policies#26
VibhorGautam wants to merge 6 commits intoc2siorg:mainfrom
VibhorGautam:test/adversarial-policy-fixtures

Conversation

@VibhorGautam
Copy link
Copy Markdown

@VibhorGautam VibhorGautam commented Mar 28, 2026

fills in the test stubs from #19 with actual adversarial test cases mapped to attack patterns from issue #2, updated for phase 2 pipeline

whats included

policies/tests/ (4 files, 30 test cases total):

  • prompt_test.rego - 11 tests: instruction override, jailbreak, role escalation, unicode obfuscation, split injection, false positive check, v2 state escalation
  • tool_test.rego - 6 tests: unlisted tool block, param injection, shell exec, nested json override, fp check for security discussion
  • context_test.rego - 6 tests: indirect injection in retrieved content, exfil attempt, cross-chunk injection, security docs fp
  • memory_test.rego - 5 tests: memory poisoning, cross-session access, trust escalation via accumulated writes

policies/v1/data/jailbreak_patterns.json (v2.0.0):

  • 24 structured patterns across 10 categories: instruction_override, role_escalation, system_prompt_extraction, delimiter_escape, context_manipulation, encoding_bypass, tool_boundary_violation, unicode_obfuscation, indirect_injection, multi_turn_manipulation, exfiltration
  • each pattern has id, category, severity, owasp llm risk mapping
  • replaces flat string array with structured entries for better test coverage and policy debugging

sidecar/internal/config/loader.go:

  • updated LoadPatterns to handle both structured entries (objects w pattern field) and flat strings for backward compat
  • logs warnings on unparseable entries instead of silently skipping

sidecar/internal/config/loader_test.go (new):

  • 3 tests: structured format, flat string format, missing file error

why this matters

pranjals #19 added the policy templates and test file headers but the actual test implementations were empty stubs. this fills those in w adversarial payloads so opa test catches regressions when policies change

rebased against phase 2 main (validate > normalise > scan > aggregate pipeline). the loader update ensures the structured pattern format works with the aho-corasick scan stage

false positive tests are important too - a policy that blocks "how do i prevent sql injection" just bc it contains "sql injection" is worse than no policy at all imo

how to test

# opa tests
cd policies
opa test v1/ tests/ -v

# go tests (loader + pipeline)
cd sidecar
go test ./...

maps to the adversarial validation work from issue #2 and the payload library in PR #5

@tharindupr
Copy link
Copy Markdown
Collaborator

I like the way you have defined the Jailbreak Patterns. Can you consider doing this for the latest version?

@VibhorGautam
Copy link
Copy Markdown
Author

yea for sure, saw the phase 2 pipeline stages landed. will rebase against main and update the patterns to align with the new validate > normalise > scan > aggregate flow

some of the test fixtures will need tweaking too since the strict_mode switch probably changes how certain edge cases get handled

should have it updated in a day or two

VibhorGautam and others added 6 commits April 7, 2026 23:43
- bump jailbreak_patterns.json to v2.0.0 (24 patterns, 10 categories)
- each pattern now has id, category, severity, owasp_llm mapping
- update LoadPatterns to handle both structured and flat string formats
- add loader tests for both formats
- log warning on unparseable entries instead of silent skip
@VibhorGautam VibhorGautam force-pushed the test/adversarial-policy-fixtures branch from b1a3e8a to 476898d Compare April 8, 2026 06:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants