feat: add reference-free FluencyLLMEval and CoherenceLLMEval descriptors#1834

Open
mostaphaelansari wants to merge 2 commits into evidentlyai:main from mostaphaelansari:feat/reference-free-quality-metrics-1801

Conversation

@mostaphaelansari

Summary

Closes #1801

Adds two new reference-free LLM-as-a-judge descriptors that evaluate text quality without requiring a reference dataset or ground-truth answer.

| Descriptor | Classification | Reference needed? |
| --- | --- | --- |
| FluencyLLMEval | FLUENT / NOT_FLUENT | ❌ No |
| CoherenceLLMEval | COHERENT / INCOHERENT | ❌ No |

Motivation

Many real-world LLM monitoring setups don't have a golden reference to compare against. This PR fills that gap for two common quality dimensions — language fluency and logical coherence — following the same LLM-judge approach already used for ToxicityLLMEval, BiasLLMEval, PIILLMEval, and DeclineLLMEval (all of which are reference-free).

Changes

src/evidently/legacy/descriptors/llm_judges.py

  • Added FluencyLLMEval: binary classification prompt (FLUENT / NOT_FLUENT) that checks grammar, natural phrasing, and readability
  • Added CoherenceLLMEval: binary classification prompt (COHERENT / INCOHERENT) that checks logical organization and consistency

src/evidently/descriptors/generated_descriptors.py

  • Added public factory functions for both descriptors with the same optional parameters as all existing evals (include_score, include_reasoning, uncertainty, alias, tests)

src/evidently/descriptors/__init__.py

  • Exported FluencyLLMEval and CoherenceLLMEval

tests/features/test_llm_judge.py

  • test_fluency_llm_eval: runs full pipeline via MockLLMWrapper (no API key required)
  • test_coherence_llm_eval: runs full pipeline via MockLLMWrapper (no API key required)
  • test_reference_free_evals_importable: verifies public import path from evidently.descriptors

Tests

All existing tests continue to pass, and 3 new tests were added:

10 passed, 6 warnings in 0.09s

Usage

```python
from evidently.descriptors import FluencyLLMEval, CoherenceLLMEval

dataset.add_descriptors([
    FluencyLLMEval("response", provider="openai", model="gpt-4o-mini"),
    CoherenceLLMEval("response", provider="openai", model="gpt-4o-mini"),
])
```

Both descriptors also support the `include_score=True`, `include_reasoning=True`, and `uncertainty` options.

…ors (evidentlyai#1801)

Add two new LLM-as-a-judge descriptors that evaluate text quality
without requiring a reference dataset:

- FluencyLLMEval: detects grammatically broken or unnatural responses
  (FLUENT / NOT_FLUENT)
- CoherenceLLMEval: detects logically inconsistent or disorganised
  responses (COHERENT / INCOHERENT)

Changes:
- src/evidently/legacy/descriptors/llm_judges.py: add FluencyLLMEval
  and CoherenceLLMEval V1 classes with full prompt criteria
- src/evidently/descriptors/generated_descriptors.py: add public
  factory functions for both descriptors
- src/evidently/descriptors/__init__.py: export both descriptors
- tests/features/test_llm_judge.py: add tests for pipeline execution
  and public import path

Closes evidentlyai#1801


Development

Successfully merging this pull request may close these issues.

Feature Request: Reference-free Quality Metric
