Summary
AgentThreatBench is the first benchmark suite that operationalizes the OWASP Top 10 for Agentic Applications (2026) into executable evaluation tasks. It was merged into UKGovernmentBEIS/inspect_evals, the official evaluation suite maintained by the UK AI Safety Institute.
What it benchmarks
Three tasks targeting distinct agentic attack surfaces:
- Memory Poison (ASI06) — Tests whether agents correctly answer questions from a memory store that contains adversarial entries (direct injection, context poisoning, gradual poisoning, authority impersonation, delimiter escape, role hijack)
- Autonomy Hijack (ASI01) — Tests whether agents performing email triage resist indirect instruction injection embedded in email content returned by tools
- Data Exfiltration (ASI01) — Tests whether a customer-support agent can be redirected via indirect injection in
lookup_customer output into leaking SSNs/account numbers via send_message
Scoring approach
Uses a dual-metric approach: utility (task completion) + security (attack resistance) scored independently. This maps well to deepeval's metric architecture.
Proposal
Add AgentThreatBench as a built-in security benchmark in deepeval, similar to how deepeval already supports custom red-team metrics. The dataset is open-source.
Benchmark docs: https://ukgovernmentbeis.github.io/inspect_evals/evals/safeguards/agent_threat_bench/
Source: https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/agent_threat_bench
OWASP reference: https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
Summary
AgentThreatBench is the first benchmark suite that operationalizes the OWASP Top 10 for Agentic Applications (2026) into executable evaluation tasks. It was merged into UKGovernmentBEIS/inspect_evals, the official evaluation suite maintained by the UK AI Safety Institute.
What it benchmarks
Three tasks targeting distinct agentic attack surfaces:
lookup_customeroutput into leaking SSNs/account numbers viasend_messageScoring approach
Uses a dual-metric approach: utility (task completion) + security (attack resistance) scored independently. This maps well to deepeval's metric architecture.
Proposal
Add AgentThreatBench as a built-in security benchmark in deepeval, similar to how deepeval already supports custom red-team metrics. The dataset is open-source.
Benchmark docs: https://ukgovernmentbeis.github.io/inspect_evals/evals/safeguards/agent_threat_bench/
Source: https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/agent_threat_bench
OWASP reference: https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/