feat: adversarial benchmark harness: 50 payloads, 27 tests, policy matrix coverage#12
Open
Pranjal0410 wants to merge 1 commit intoc2siorg:mainfrom
Open
feat: adversarial benchmark harness: 50 payloads, 27 tests, policy matrix coverage#12Pranjal0410 wants to merge 1 commit intoc2siorg:mainfrom
Pranjal0410 wants to merge 1 commit intoc2siorg:mainfrom
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR adds
A benchmark harness that evaluates scanner performance against curated adversarial and benign datasets, producing precision/recall/F1 metrics, per-category breakdowns, and latency statistics.
This gives the team a regression safety net — any scanner or policy change can be validated against the benchmark before merging.
Why this matters
Right now we have scanners being built (#10, #5) but no way to measure how well they actually perform. This harness answers: "What's the detection rate? What's the false positive rate? Which attack categories are we missing?"
Policy matrix alignment
The dataset is mapped to the v1 detection policies from the taxonomy matrix:
instruction_override(6 payloads)role_hijack(4 payloads)data_exfiltration(5 payloads)context_manipulation(3 payloads)tool_abuse(3 payloads)memory_poisoning(2 payloads)Tests explicitly validate this coverage (
TestPolicyMatrixCoverage).Components
Benchmark Runner (
acf_sdk/benchmarks/harness.py)ScanInputand returns an actionCurated Dataset (
acf_sdk/benchmarks/dataset.py)CLI Runner (
acf_sdk/benchmarks/run_benchmark.py)Sample Output (TF-IDF backend)
============================================================
ACF-SDK BENCHMARK REPORT
Payloads: 50 (25 malicious, 25 benign)
TP: 12 FP: 8 TN: 17 FN: 13
Precision: 0.6000
Recall: 0.4800
F1 Score: 0.5333
Latency — mean: 0.53ms p50: 0.56ms p95: 0.62ms p99: 0.78ms
Category Total TP FN Recall Avg Score
instruction_override 6 2 4 0.3333 0.6537
data_exfiltration 5 5 0 1.0000 0.9374
role_hijack 4 4 0 1.0000 0.9075
context_manipulation 3 0 3 0.0000 0.6441
tool_abuse 3 1 2 0.3333 0.6815
memory_poisoning 2 0 2 0.0000 0.5344
encoding_evasion 2 0 2 0.0000 0.5756
The TF-IDF backend catches data exfiltration and role hijack well (100% recall) but misses context manipulation and encoding evasion — expected, since TF-IDF relies on keyword overlap. The
sentence-transformerbackend will improve these numbers. The harness makes this gap visible and measurable.Tests — 27 passing
pytest tests/test_benchmark.py -v # 27 passedFiles
acf_sdk/benchmarks/
├── init.py
├── harness.py # BenchmarkRunner + BenchmarkReport
├── dataset.py # 50 curated payloads
└── run_benchmark.py # CLI entry point
tests/
└── test_benchmark.py # 27 tests
Next steps
sentence-transformerbackend and compare metrics