
feat(metrics): add DomainComplianceMetric for regulated industry LLM evaluation#2638

Open
sanianayab wants to merge 2 commits into confident-ai:main from sanianayab:feature/domain-compliance-metric

Conversation

@sanianayab sanianayab commented Apr 29, 2026

Overview

This PR introduces DomainComplianceMetric, a new custom metric for evaluating LLM outputs in regulated industry domains: banking, healthcare, telco, and manufacturing.

Motivation

DeepEval's existing metrics (faithfulness, answer relevancy, hallucination) are domain-agnostic. This works well for general LLM apps, but regulated industry deployments face specific failure modes that generic metrics miss:

  • A banking LLM that confidently states a wrong interest rate
  • A healthcare LLM that prescribes a dosage not in the retrieved context
  • A telco LLM that guarantees 99.99% uptime based on nothing

Standard faithfulness checks may not catch these because the output sounds plausible. What is needed are domain-specific evaluation criteria: compliance hedging, no absolute guarantees, and regulatory alignment.

The evaluation steps enforce constraint-based binary judgments per compliance dimension, reducing LLM-as-judge stochasticity.
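To illustrate the idea (this is a sketch, not the actual implementation — the dimension names and the aggregation function are hypothetical; the real metric delegates judging to an LLM via GEval), constraint-based binary judgments per compliance dimension can be reduced to a score like this:

```python
# Illustrative sketch only: each compliance dimension receives a binary
# pass/fail verdict, and the metric score is the pass rate. Dimension
# names below are hypothetical examples for the banking domain.

BANKING_DIMENSIONS = [
    "no_hallucinated_rates_or_fees",
    "regulatory_alignment",           # e.g. AML / PSD2
    "no_absolute_return_guarantees",
    "appropriate_hedging",
]

def aggregate_verdicts(verdicts: dict) -> float:
    """Reduce per-dimension binary verdicts to a single score in [0, 1]."""
    return sum(verdicts.values()) / len(verdicts)

verdicts = {
    "no_hallucinated_rates_or_fees": True,
    "regulatory_alignment": True,
    "no_absolute_return_guarantees": False,  # output guaranteed a return
    "appropriate_hedging": True,
}
score = aggregate_verdicts(verdicts)  # 0.75
```

Because each dimension is a yes/no constraint rather than a free-form 1-to-10 rating, repeated runs of the judge LLM are far more likely to agree.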

Changes

deepeval/metrics/domain_compliance/
    __init__.py                  # exports DomainComplianceMetric
    domain_compliance.py         # metric implementation

tests/
    test_domain_compliance.py    # pytest unit tests (banking + healthcare)

examples/
    domain_compliance_example.py # runnable usage example

How It Works

DomainComplianceMetric inherits from BaseMetric and wraps a domain-specific GEval instance with:

  • Per-domain evaluation criteria (regulatory accuracy, hedging, no guarantees)
  • Per-domain evaluation steps (constraint-based, grounded in compliance requirements)
  • Mandatory context enforcement (raises ValueError if context is missing, since domain evaluation without context is meaningless)
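A minimal self-contained sketch of the validation behaviour described above, using stand-in logic rather than the real BaseMetric/GEval classes (the class name and method are illustrative, not the PR's actual API surface):

```python
# Sketch of the domain and context checks described above. The real
# metric subclasses BaseMetric and wraps a GEval instance; only the
# validation logic is shown here, with illustrative names.

SUPPORTED_DOMAINS = {"banking", "healthcare", "telco", "manufacturing"}

class DomainComplianceMetricSketch:
    def __init__(self, domain: str, threshold: float = 0.7):
        if domain not in SUPPORTED_DOMAINS:
            raise ValueError(f"Unsupported domain: {domain!r}")
        self.domain = domain
        self.threshold = threshold

    def validate_context(self, context):
        # Mandatory context enforcement: compliance cannot be judged
        # without retrieved context to ground the check against.
        if not context:
            raise ValueError(
                "context is required for domain compliance evaluation"
            )
        return True
```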

Usage

from deepeval.metrics.domain_compliance import DomainComplianceMetric
from deepeval.test_case import LLMTestCase

metric = DomainComplianceMetric(domain="banking", threshold=0.7)
test_case = LLMTestCase(
    input="What is the early repayment fee?",
    actual_output="There is a 2% fee. Consult your advisor for full details.",
    context=["Loan agreement: 2% early repayment charge applies."]
)
metric.measure(test_case)
print(metric.score, metric.reason)

Supported Domains

Domain          Key checks
banking         Hallucinated rates/fees, AML/PSD2 alignment, no return guarantees
healthcare      Hallucinated dosages/diagnoses, HIPAA alignment, professional referral
telco           Fabricated SLAs/uptime, net neutrality alignment
manufacturing   Fabricated sensor readings, safety-critical flagging, ISO alignment

Testing

deepeval test run tests/test_domain_compliance.py

Covers: compliant outputs (pass), non-compliant outputs (fail), missing context error, invalid domain error, async execution.

Notes

  • Fully compatible with DeepEval's CI/CD integration and Confident AI logging
  • Provider-agnostic: works with any model supported by DeepEval
  • Designed to be extended: additional domains can be added by appending to DOMAIN_CRITERIA and DOMAIN_EVALUATION_STEPS
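As an example of that extension point, adding a new domain might look like the sketch below. The dictionary names DOMAIN_CRITERIA and DOMAIN_EVALUATION_STEPS come from the note above, but the shapes of their entries (string criteria, list of step strings) and the 'insurance' domain are assumptions for illustration:

```python
# Hypothetical extension sketch. DOMAIN_CRITERIA and
# DOMAIN_EVALUATION_STEPS are named in this PR; their exact value
# shapes are assumed here, and 'insurance' is purely illustrative.

DOMAIN_CRITERIA = {
    "banking": (
        "No hallucinated rates or fees; AML/PSD2 alignment; "
        "no return guarantees."
    ),
    # ... existing domains elided ...
}

DOMAIN_EVALUATION_STEPS = {
    "banking": [
        "Check every rate or fee against the retrieved context.",
        "Flag any absolute guarantee of returns or outcomes.",
    ],
    # ... existing domains elided ...
}

# Appending a new domain makes it available to the metric:
DOMAIN_CRITERIA["insurance"] = (
    "No fabricated policy terms; solvency-regulation alignment; "
    "no guaranteed-payout claims."
)
DOMAIN_EVALUATION_STEPS["insurance"] = [
    "Verify quoted premiums and coverage limits appear in the context.",
    "Flag unconditional payout guarantees.",
]
```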


vercel Bot commented Apr 29, 2026

@sanianayab is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

@sanianayab sanianayab marked this pull request as ready for review April 29, 2026 22:13