
Commit 4f2b67d

xr843 and claude committed
feat(metrics): add AdversarialRobustnessMetric
Add a black-box metric that measures how robust an LLM is to adversarial, meaning-preserving perturbations of its input, addressing #2150.

Inspired by the RoMA framework (arXiv:2504.17723), for each test case the metric:

1. uses the evaluation model to generate meaning-preserving adversarial perturbations of the input (semantic synonym/rephrasing swaps and orthographic typo-style character noise);
2. probes the system under test via a `model_callback` on every perturbation;
3. uses the evaluation model to judge whether each perturbed response stays semantically consistent with the reference `actual_output`.

The score is the fraction of perturbations the system stayed consistent on (1.0 = perfectly robust); higher is better, so a case passes when `score >= threshold`.

Follows the existing BaseMetric conventions: sync/async `measure`/`a_measure`, schema-based generation, the compiled prompt-template bundle, strict/verbose modes, and cost/token accrual.

Unlike the earlier draft (#2181) this pulls in no heavyweight runtime dependencies (no gensim/nltk/numpy and no large Word2Vec download) and judges robustness by LLM-graded semantic consistency rather than brittle exact string matching.

Adds the metric to the public `deepeval.metrics` exports, the compiled metric-template bundles (Python + TypeScript), and a fully mocked test suite (no real API calls).

Signed-off-by: xr843 <xianren843@protonmail.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
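The scoring rule described in the commit message can be illustrated with a minimal sketch. This is not the code added by the commit: `robustness_score`, `passes`, `is_consistent`, and the default threshold of 0.5 are hypothetical names and values chosen for illustration, and the perturbation generator and consistency judge (which the commit delegates to the evaluation model) are stood in for by a plain list and a plain callable. Only the `model_callback` name and the `score >= threshold` rule come from the message itself.

```python
from typing import Callable, List


def robustness_score(
    perturbations: List[str],
    model_callback: Callable[[str], str],
    is_consistent: Callable[[str, str], bool],
    reference_output: str,
) -> float:
    """Return the fraction of perturbed inputs whose response stays
    consistent with the reference output (1.0 = perfectly robust)."""
    if not perturbations:
        # Assumption for this sketch only: with nothing to probe,
        # the case is treated as fully robust.
        return 1.0
    consistent = sum(
        1
        for perturbed_input in perturbations
        # Probe the system under test, then ask the (hypothetical) judge
        # whether the new response agrees with the reference.
        if is_consistent(reference_output, model_callback(perturbed_input))
    )
    return consistent / len(perturbations)


def passes(score: float, threshold: float = 0.5) -> bool:
    """Higher is better, so a case passes when score >= threshold.
    The 0.5 default is an illustrative assumption."""
    return score >= threshold
```

In this sketch the judge is an ordinary function, so a test can replace it with a deterministic stub, in the same spirit as the fully mocked test suite the commit mentions.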
1 parent 8ebfa33 commit 4f2b67d

13 files changed

Lines changed: 763 additions & 0 deletions


deepeval/metrics/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -14,6 +14,9 @@
 from .non_advice.non_advice import NonAdviceMetric
 from .misuse.misuse import MisuseMetric
 from .role_violation.role_violation import RoleViolationMetric
+from .adversarial_robustness.adversarial_robustness import (
+    AdversarialRobustnessMetric,
+)
 from .hallucination.hallucination import HallucinationMetric
 from .answer_relevancy.answer_relevancy import AnswerRelevancyMetric
 from .summarization.summarization import SummarizationMetric
@@ -102,6 +105,7 @@
     "MisuseMetric",
     "RoleViolationMetric",
     "RoleAdherenceMetric",
+    "AdversarialRobustnessMetric",
     # Task-specific metrics
     "ToolCorrectnessMetric",
     "JsonCorrectnessMetric",
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+from .adversarial_robustness import AdversarialRobustnessMetric
+
+__all__ = ["AdversarialRobustnessMetric"]
