Commit 4f2b67d
feat(metrics): add AdversarialRobustnessMetric
Add a black-box metric that measures how robust an LLM is to adversarial,
meaning-preserving perturbations of its input, addressing #2150.
Inspired by the RoMA framework (arXiv:2504.17723), for each test case the
metric:
  1. uses the evaluation model to generate meaning-preserving adversarial
     perturbations of the input (semantic synonym/rephrasing swaps and
     orthographic typo-style character noise);
  2. probes the system under test via a `model_callback` on every
     perturbation;
  3. uses the evaluation model to judge whether each perturbed response stays
     semantically consistent with the reference `actual_output`.
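
The three steps above can be sketched as a single loop. This is a minimal illustration, not the code added by this commit: `probe_consistency`, `perturb`, and `judge` are hypothetical names, and the toy stand-ins below replace the evaluation-model calls so the example runs offline. Only `model_callback` and `actual_output` come from the commit message.

```python
from typing import Callable, List


def probe_consistency(
    original_input: str,
    actual_output: str,
    perturb: Callable[[str], List[str]],
    model_callback: Callable[[str], str],
    judge: Callable[[str, str], bool],
) -> List[bool]:
    """Return one consistency verdict per perturbation (hypothetical helper)."""
    verdicts = []
    # Step 1: generate meaning-preserving perturbations of the input.
    for perturbed_input in perturb(original_input):
        # Step 2: probe the system under test with the perturbed input.
        perturbed_output = model_callback(perturbed_input)
        # Step 3: judge the perturbed response against the reference output.
        verdicts.append(judge(actual_output, perturbed_output))
    return verdicts


# Toy stand-ins (assumptions): the real metric would call an LLM for
# perturbation and judging instead of these string operations.
def toy_perturb(text: str) -> List[str]:
    return [
        text.replace("capital", "capitol"),  # typo-style character noise
        text.lower(),                        # casing change
        "Which city is France's capital?",   # rephrasing
    ]


def toy_callback(prompt: str) -> str:
    # A deliberately brittle "system under test" that breaks on the typo.
    return "Paris" if "capital" in prompt.lower() else "I am not sure"


def toy_judge(reference: str, candidate: str) -> bool:
    # Placeholder for LLM-graded semantic consistency.
    return reference.strip().lower() == candidate.strip().lower()


verdicts = probe_consistency(
    "What is the capital of France?",
    "Paris",
    toy_perturb,
    toy_callback,
    toy_judge,
)
print(verdicts)  # [False, True, True]
```

Passing the perturber, callback, and judge in as plain callables keeps the loop testable with mocks, in the same spirit as the fully mocked test suite mentioned later in this message.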
The score is the fraction of perturbations on which the system stayed
consistent (1.0 = perfectly robust); higher is better, so a case passes when
`score >= threshold`. Follows the existing BaseMetric conventions: sync/async
`measure`/`a_measure`, schema-based generation, the compiled prompt-template
bundle, strict/verbose modes, and cost/token accrual.
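
The scoring rule can be illustrated with a short sketch. The function names `robustness_score` and `is_successful` are hypothetical, and raising on an empty verdict list is an assumption; the commit message does not say how the metric handles a case with no perturbations.

```python
from typing import List


def robustness_score(verdicts: List[bool]) -> float:
    """Fraction of perturbations judged consistent (1.0 = perfectly robust)."""
    if not verdicts:
        # Assumption: the empty case is not described in the commit message.
        raise ValueError("at least one perturbation verdict is required")
    return sum(verdicts) / len(verdicts)


def is_successful(score: float, threshold: float) -> bool:
    # Higher is better, so the case passes when score >= threshold.
    return score >= threshold


score = robustness_score([True, True, False, True])
print(score)                      # 0.75
print(is_successful(score, 0.5))  # True
print(is_successful(score, 0.8))  # False
```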
Unlike the earlier draft (#2181), this pulls in no heavyweight runtime
dependencies (no gensim/nltk/numpy and no large Word2Vec download) and judges
robustness by LLM-graded semantic consistency rather than brittle exact string
matching.
Adds the metric to the public `deepeval.metrics` exports, the compiled
metric-template bundles (Python + TypeScript), and a fully mocked test suite
(no real API calls).
Signed-off-by: xr843 <xianren843@protonmail.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 8ebfa33
13 files changed
Lines changed: 763 additions & 0 deletions
File tree
- deepeval
- metrics
- adversarial_robustness
- templates
- templates
- metrics
- tests
- test_core
- test_metrics
- typescript/src/templates/metrics