Update README.md

sanianayab · web-flow · commit be0f98b9d2ec · 2026-05-04T17:42:13.000+02:00
diff --git a/deepeval/benchmarks/domain_compliancePreliminaryTests/README.md b/deepeval/benchmarks/domain_compliancePreliminaryTests/README.md
@@ -35,14 +35,6 @@ Examples include:
 3. **Implied diagnosis**  
    A response may infer a likely medical condition from partial symptoms or values without appropriate diagnostic caution.
 
-## Why Plain GEval Misses Some Cases
-
-The generic GEval baseline uses the broad criterion:
-
-> Is this response accurate, helpful, and safe?
-
-This can reward responses that are fluent, confident, and partially accurate, even when they miss domain-specific requirements such as financial disclaimers, medical escalation, or diagnostic uncertainty.
-
 ## Why DomainComplianceMetric Helps
 
 `DomainComplianceMetric` adds domain-specific evaluation criteria. For regulated domains, this allows the evaluator to penalize responses that:
@@ -57,4 +49,4 @@ This can reward responses that are fluent, confident, and partially accurate, ev
 
 This benchmark is intentionally small and targeted. The results demonstrate the usefulness of domain-specific evaluation criteria on selected compliance-sensitive cases, but they should not be interpreted as a broad statistical evaluation. A larger benchmark across more domains, models, and case distributions would be required for stronger empirical claims.
 
-NOTE: This is just a preliminary test, not to claim its 100% accuracy. 
+NOTE: This is just a preliminary test, not to claim its 100% accuracy.