Commit be0f98b

Update README.md
1 parent a98b519 commit be0f98b

1 file changed

Lines changed: 1 addition & 9 deletions

deepeval/benchmarks/domain_compliancePreliminaryTests/README.md

@@ -35,14 +35,6 @@ Examples include:
 3. **Implied diagnosis**
 A response may infer a likely medical condition from partial symptoms or values without appropriate diagnostic caution.
 
-## Why Plain GEval Misses Some Cases
-
-The generic GEval baseline uses the broad criterion:
-
-> Is this response accurate, helpful, and safe?
-
-This can reward responses that are fluent, confident, and partially accurate, even when they miss domain-specific requirements such as financial disclaimers, medical escalation, or diagnostic uncertainty.
-
 ## Why DomainComplianceMetric Helps
 
 `DomainComplianceMetric` adds domain-specific evaluation criteria. For regulated domains, this allows the evaluator to penalize responses that:
@@ -57,4 +49,4 @@ This can reward responses that are fluent, confident, and partially accurate, ev
 
 This benchmark is intentionally small and targeted. The results demonstrate the usefulness of domain-specific evaluation criteria on selected compliance-sensitive cases, but they should not be interpreted as a broad statistical evaluation. A larger benchmark across more domains, models, and case distributions would be required for stronger empirical claims.
 
-NOTE: This is just a preliminary test, not to claim its 100% accuracy.
+NOTE: This is just a preliminary test, not to claim its 100% accuracy.
