test(policies): adversarial test fixtures for v1 hook policies by VibhorGautam · Pull Request #26 · c2siorg/acf-sdk

VibhorGautam · 2026-03-28T18:16:10Z

fills in the test stubs from #19 with actual adversarial test cases mapped to attack patterns from issue #2, updated for phase 2 pipeline

whats included

policies/tests/ (4 files, 30 test cases total):

prompt_test.rego - 11 tests: instruction override, jailbreak, role escalation, unicode obfuscation, split injection, false positive check, v2 state escalation
tool_test.rego - 6 tests: unlisted tool block, param injection, shell exec, nested json override, fp check for security discussion
context_test.rego - 6 tests: indirect injection in retrieved content, exfil attempt, cross-chunk injection, security docs fp
memory_test.rego - 5 tests: memory poisoning, cross-session access, trust escalation via accumulated writes

policies/v1/data/jailbreak_patterns.json (v2.0.0):

24 structured patterns across 10 categories: instruction_override, role_escalation, system_prompt_extraction, delimiter_escape, context_manipulation, encoding_bypass, tool_boundary_violation, unicode_obfuscation, indirect_injection, multi_turn_manipulation, exfiltration
each pattern has id, category, severity, owasp llm risk mapping
replaces flat string array with structured entries for better test coverage and policy debugging

sidecar/internal/config/loader.go:

updated LoadPatterns to handle both structured entries (objects w pattern field) and flat strings for backward compat
logs warnings on unparseable entries instead of silently skipping

sidecar/internal/config/loader_test.go (new):

3 tests: structured format, flat string format, missing file error

why this matters

pranjals #19 added the policy templates and test file headers but the actual test implementations were empty stubs. this fills those in w adversarial payloads so opa test catches regressions when policies change

rebased against phase 2 main (validate > normalise > scan > aggregate pipeline). the loader update ensures the structured pattern format works with the aho-corasick scan stage

false positive tests are important too - a policy that blocks "how do i prevent sql injection" just bc it contains "sql injection" is worse than no policy at all imo

how to test

# opa tests
cd policies
opa test v1/ tests/ -v

# go tests (loader + pipeline)
cd sidecar
go test ./...

maps to the adversarial validation work from issue #2 and the payload library in PR #5

tharindupr · 2026-04-06T18:47:27Z

I like the way you have defined the Jailbreak Patterns. Can you consider doing this for the latest version?

VibhorGautam · 2026-04-07T18:12:43Z

yea for sure, saw the phase 2 pipeline stages landed. will rebase against main and update the patterns to align with the new validate > normalise > scan > aggregate flow

some of the test fixtures will need tweaking too since the strict_mode switch probably changes how certain edge cases get handled

should have it updated in a day or two

- bump jailbreak_patterns.json to v2.0.0 (24 patterns, 10 categories) - each pattern now has id, category, severity, owasp_llm mapping - update LoadPatterns to handle both structured and flat string formats - add loader tests for both formats - log warning on unparseable entries instead of silent skip

VibhorGautam and others added 6 commits April 7, 2026 23:43

test(prompt): adversarial test fixtures for on_prompt hook policy

77230c7

test(tool): adversarial test fixtures for on_tool_call hook policy

bbd83f0

test(context): adversarial test fixtures for on_context hook policy

f9a090a

test(memory): adversarial test fixtures for on_memory hook policy

89fb16a

data: populate jailbreak patterns library (14 patterns, 7 categories)

ae294a0

VibhorGautam force-pushed the test/adversarial-policy-fixtures branch from b1a3e8a to 476898d Compare April 8, 2026 06:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(policies): adversarial test fixtures for v1 hook policies#26

test(policies): adversarial test fixtures for v1 hook policies#26
VibhorGautam wants to merge 6 commits intoc2siorg:mainfrom
VibhorGautam:test/adversarial-policy-fixtures

VibhorGautam commented Mar 28, 2026 •

edited

Loading

Uh oh!

tharindupr commented Apr 6, 2026

Uh oh!

VibhorGautam commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

VibhorGautam commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

whats included

why this matters

how to test

Uh oh!

tharindupr commented Apr 6, 2026

Uh oh!

VibhorGautam commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

VibhorGautam commented Mar 28, 2026 •

edited

Loading