
feat: adversarial benchmark harness: 50 payloads, 27 tests, policy matrix coverage #12

Open
Pranjal0410 wants to merge 1 commit into c2siorg:main from Pranjal0410:feat/benchmark-harness

Conversation


@Pranjal0410 commented on Mar 20, 2026

What this PR adds

A benchmark harness that evaluates scanner performance against curated adversarial and benign datasets, producing precision/recall/F1 metrics, per-category breakdowns, and latency statistics.

This gives the team a regression safety net — any scanner or policy change can be validated against the benchmark before merging.

Why this matters

Scanners are being built right now (#10, #5), but we have no way to measure how well they actually perform. This harness answers: "What's the detection rate? What's the false positive rate? Which attack categories are we missing?"

Policy matrix alignment

The dataset is mapped to the v1 detection policies from the taxonomy matrix:

| Policy (taxonomy matrix) | Scope | Dataset category |
| --- | --- | --- |
| Instruction override detection | on_prompt | instruction_override (6 payloads) |
| Role escalation detection | on_prompt | role_hijack (4 payloads) |
| Jailbreak pattern library | on_prompt | data_exfiltration (5 payloads) |
| Embedded instruction detection | on_context | context_manipulation (3 payloads) |
| Parameter injection scan | on_tool_call | tool_abuse (3 payloads) |
| Write-time content scan | on_memory | memory_poisoning (2 payloads) |

Tests explicitly validate this coverage (`TestPolicyMatrixCoverage`).

Components

Benchmark Runner (`acf_sdk/benchmarks/harness.py`)

  • Scanner-agnostic — accepts any callable that takes `ScanInput` and returns an action (see the usage sketch after this list)
  • Confusion matrix, precision, recall, F1, accuracy
  • Per-category breakdown with recall and average risk scores
  • Latency percentiles (p50/p95/p99)
  • JSON export for CI integration
  • Quality gate — exits non-zero if F1 drops below threshold
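To make the contract concrete, here is a minimal usage sketch. The `load_dataset` accessor, the `BenchmarkRunner` constructor argument, the `ScanInput.text` field, and the report attribute shown here are assumptions based on this description, not the verbatim `harness.py` API:

```python
# Hedged sketch: plugging a toy scanner into the harness.
# `load_dataset`, the `scanner=` constructor argument, and the report
# field names are assumptions, not the verbatim acf_sdk API.
from acf_sdk.benchmarks.harness import BenchmarkRunner
from acf_sdk.benchmarks.dataset import load_dataset  # accessor name assumed

def keyword_scanner(scan_input):
    """Toy scanner: block anything that mentions 'ignore previous'."""
    return "block" if "ignore previous" in scan_input.text.lower() else "allow"

runner = BenchmarkRunner(scanner=keyword_scanner)
report = runner.run(load_dataset())
print(report.f1)  # attribute name assumed
```

Because the runner only needs a callable, the same harness can score the TF-IDF backend, the sentence-transformer backend, or any future scanner without modification.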

Curated Dataset (`acf_sdk/benchmarks/dataset.py`)

  • 25 malicious payloads across 7 categories
  • 25 benign payloads including 10 hard negatives ("ignore previous commits", "system prompt debugging", "override a method in Python")
  • Multiple input types: prompts, RAG documents, tool outputs, memory writes
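For orientation, each payload record presumably carries at least an ID, text, category, input type, and a ground-truth label. The field names below are illustrative, not the actual `dataset.py` schema:

```python
# Illustrative record shape only; not the actual dataset.py schema.
from dataclasses import dataclass

@dataclass
class Payload:
    id: str             # unique ID (checked by the dataset integrity tests)
    text: str           # the prompt / RAG document / tool output / memory write
    category: str       # e.g. "instruction_override", "role_hijack", "benign"
    input_type: str     # e.g. "prompt", "rag_document", "tool_output", "memory_write"
    is_malicious: bool  # ground-truth label for the confusion matrix

# A hard negative from the benign set: superficially attack-like, must not trigger.
hard_negative = Payload(
    id="benign-017",
    text="How do I override a method in Python?",
    category="benign",
    input_type="prompt",
    is_malicious=False,
)
```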

CLI Runner (`acf_sdk/benchmarks/run_benchmark.py`)

```
python -m acf_sdk.benchmarks.run_benchmark
python -m acf_sdk.benchmarks.run_benchmark --backend sentence-transformer
python -m acf_sdk.benchmarks.run_benchmark --output results.json
```
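Since the report exports to JSON, a CI step can also gate on it directly. A hedged sketch, assuming the export contains an `f1` key (the actual `BenchmarkReport` schema may name things differently, and the threshold here is illustrative):

```python
# Hedged sketch: fail a CI job if the exported F1 drops below a floor.
# The "f1" key is an assumption about the BenchmarkReport JSON schema.
import json
import sys

with open("results.json") as f:
    report = json.load(f)

if report.get("f1", 0.0) < 0.50:  # threshold illustrative
    print("F1 below threshold; failing the build")
    sys.exit(1)
```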

Sample Output (TF-IDF backend)

```
============================================================
ACF-SDK BENCHMARK REPORT
Payloads: 50 (25 malicious, 25 benign)
TP: 12  FP: 8  TN: 17  FN: 13
Precision: 0.6000
Recall:    0.4800
F1 Score:  0.5333
Latency — mean: 0.53ms  p50: 0.56ms  p95: 0.62ms  p99: 0.78ms

Category               Total  TP  FN  Recall  Avg Score
instruction_override       6   2   4  0.3333     0.6537
data_exfiltration          5   5   0  1.0000     0.9374
role_hijack                4   4   0  1.0000     0.9075
context_manipulation       3   0   3  0.0000     0.6441
tool_abuse                 3   1   2  0.3333     0.6815
memory_poisoning           2   0   2  0.0000     0.5344
encoding_evasion           2   0   2  0.0000     0.5756
```

The TF-IDF backend catches data exfiltration and role hijack well (100% recall) but misses context manipulation and encoding evasion — expected, since TF-IDF relies on keyword overlap. The sentence-transformer backend will improve these numbers. The harness makes this gap visible and measurable.
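As a sanity check, the headline numbers follow directly from the confusion matrix in the report:

```python
# Recomputing the headline metrics from the reported confusion matrix.
tp, fp, tn, fn = 12, 8, 17, 13

precision = tp / (tp + fp)   # 12 / 20 = 0.6000
recall = tp / (tp + fn)      # 12 / 25 = 0.4800
f1 = 2 * precision * recall / (precision + recall)  # 0.5333
accuracy = (tp + tn) / (tp + fp + tn + fn)          # 29 / 50 = 0.58

assert round(f1, 4) == 0.5333
```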

Tests — 27 passing

```
pytest tests/test_benchmark.py -v   # 27 passed
```
  • Dataset integrity (unique IDs, required fields, balance)
  • Report correctness (confusion matrix sums, metric bounds)
  • Per-category breakdown
  • Latency stats
  • JSON export validity
  • F1 quality gate
  • Policy matrix coverage (validates the dataset covers the v1 detection policies; sketched after this list)
  • Dataset balance + hard negative coverage
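For illustration, the policy coverage check can be as simple as the following. The `load_dataset` accessor and the `category`/`is_malicious` field names are hypothetical; the actual `TestPolicyMatrixCoverage` in `tests/test_benchmark.py` may be structured differently:

```python
# Hypothetical sketch; field and accessor names are assumptions.
from collections import Counter

from acf_sdk.benchmarks.dataset import load_dataset  # accessor name assumed

# Expected malicious payload counts, taken from the policy matrix table above.
EXPECTED = {
    "instruction_override": 6,
    "role_hijack": 4,
    "data_exfiltration": 5,
    "context_manipulation": 3,
    "tool_abuse": 3,
    "memory_poisoning": 2,
}

def test_policy_matrix_coverage():
    counts = Counter(
        p.category for p in load_dataset() if p.is_malicious
    )
    for category, expected in EXPECTED.items():
        assert counts[category] == expected, category
```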

Files

```
acf_sdk/benchmarks/
├── __init__.py
├── harness.py          # BenchmarkRunner + BenchmarkReport
├── dataset.py          # 50 curated payloads
└── run_benchmark.py    # CLI entry point
tests/
└── test_benchmark.py   # 27 tests
```

Next steps
