feat(metrics): add deterministic ToolPermissionMetric by gh-raju · Pull Request #2826 · confident-ai/deepeval

gh-raju · 2026-07-01T11:31:38Z

What

Adds ToolPermissionMetric, a deterministic metric that checks whether an agent only called tools it was authorized to, given a permission policy:

allowed_tools — an allowlist (least privilege); a called tool not in the list is unauthorized.
denied_tools — an explicit denylist; a called tool in the list is unauthorized (a denial always wins over an allow).

Score = fraction of authorized tool calls (1.0 when no tools were called). Requires no LLM / API key.

Why

ToolCorrectnessMetric compares called vs expected tools and explicitly does not check authorization. Least-privilege / tool-permission enforcement is a growing production requirement for agents — an out-of-scope tool call (delete_account, wire_transfer, …) is a safety failure regardless of task success. This makes that a deterministic, CI-gateable check.

Relates to #2825.

Changes

deepeval/metrics/tool_permission/{__init__.py,tool_permission.py} — the metric (mirrors ToolCorrectnessMetric structure and the BaseMetric contract).
deepeval/metrics/__init__.py — export ToolPermissionMetric.
tests/test_metrics/test_tool_permission_metric.py — 9 tests: allowlist, denylist, denial-wins, partial credit + threshold, strict mode, no-tools, policy-required, and sync/async parity. Deterministic, so they run without any API key.

Verification

pytest tests/test_metrics/test_tool_permission_metric.py → 9 passed.
black --line-length 80 and ruff check → clean.

Notes

No new dependencies. No breaking changes.
Deterministic (no model), so it is safe to run on every PR in CI.

Checks that an agent only called tools it was authorized to, against an allowed_tools allowlist and/or denied_tools denylist (a denial always wins). Score is the fraction of authorized tool calls (1.0 when no tools were called). Unlike ToolCorrectnessMetric it evaluates authorization rather than task-correctness, and is fully deterministic (no LLM / API key), so it works as a CI gate. Adds deepeval/metrics/tool_permission/, exports ToolPermissionMetric from deepeval.metrics, and tests/test_metrics/test_tool_permission_metric.py (9 tests, no API key). black + ruff clean.

vercel · 2026-07-01T11:31:43Z

@gh-raju is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

docs: add ToolPermissionMetric page and register it in the Non-LLM nav

efa1d31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(metrics): add deterministic ToolPermissionMetric#2826

feat(metrics): add deterministic ToolPermissionMetric#2826
gh-raju wants to merge 2 commits into
confident-ai:mainfrom
gh-raju:feat/tool-permission-metric

gh-raju commented Jul 1, 2026

Uh oh!

vercel Bot commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

gh-raju commented Jul 1, 2026

What

Why

Changes

Verification

Notes

Uh oh!

vercel Bot commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant