feat: add reference-free FluencyLLMEval and CoherenceLLMEval descriptors by mostaphaelansari · Pull Request #1834 · evidentlyai/evidently

mostaphaelansari · 2026-02-22T22:34:22Z

Summary

Adds two new reference-free LLM-as-a-judge descriptors that evaluate text quality without requiring a reference dataset or ground-truth answer.

| Descriptor | Classification | Reference needed? |
|---|---|---|
| `FluencyLLMEval` | `FLUENT` / `NOT_FLUENT` | ❌ No |
| `CoherenceLLMEval` | `COHERENT` / `INCOHERENT` | ❌ No |

Motivation

Many real-world LLM monitoring setups don't have a golden reference to compare against. This PR fills that gap for two common quality dimensions — language fluency and logical coherence — following the same LLM-judge approach already used for ToxicityLLMEval, BiasLLMEval, PIILLMEval, and DeclineLLMEval (all of which are reference-free).

Changes

`src/evidently/legacy/descriptors/llm_judges.py`

- Added `FluencyLLMEval`: binary classification prompt (`FLUENT` / `NOT_FLUENT`) that checks grammar, natural phrasing, and readability
- Added `CoherenceLLMEval`: binary classification prompt (`COHERENT` / `INCOHERENT`) that checks logical organization and consistency
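To illustrate what a two-label judge of this kind boils down to, here is a minimal, self-contained sketch. It does not use Evidently's classes, and the criteria strings are illustrative wording rather than the prompts added in this PR; `BinaryJudgeSpec`, `build_prompt`, and `parse` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BinaryJudgeSpec:
    """Hypothetical container for a two-label judge; not an Evidently class."""

    criteria: str
    labels: Tuple[str, str]

    def build_prompt(self, text: str) -> str:
        # Combine the criteria, the allowed labels, and the text to judge.
        first, second = self.labels
        return (
            f"{self.criteria}\n\n"
            f"Classify the text below as {first} or {second}. "
            "Answer with the label only.\n\n"
            f"Text:\n{text}"
        )

    def parse(self, raw: str) -> str:
        # Normalize the model reply and reject anything outside the two labels.
        label = raw.strip().upper()
        if label not in self.labels:
            raise ValueError(f"unexpected judge output: {raw!r}")
        return label


# Illustrative criteria only; the PR's actual prompt text is not reproduced here.
FLUENCY = BinaryJudgeSpec(
    criteria="A fluent text is grammatical, naturally phrased, and easy to read.",
    labels=("FLUENT", "NOT_FLUENT"),
)
COHERENCE = BinaryJudgeSpec(
    criteria="A coherent text is logically organized and internally consistent.",
    labels=("COHERENT", "INCOHERENT"),
)
```

Rejecting out-of-vocabulary replies in `parse` is one possible design choice; a real judge may instead map them to an "unknown" category.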

`src/evidently/descriptors/generated_descriptors.py`

Added public factory functions for both descriptors with the same optional parameters as all existing evals (include_score, include_reasoning, uncertainty, alias, tests)

`src/evidently/descriptors/__init__.py`

Exported FluencyLLMEval and CoherenceLLMEval

`tests/features/test_llm_judge.py`

- `test_fluency_llm_eval`: runs full pipeline via `MockLLMWrapper` (no API key required)
- `test_coherence_llm_eval`: runs full pipeline via `MockLLMWrapper` (no API key required)
- `test_reference_free_evals_importable`: verifies public import path from `evidently.descriptors`
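The general idea of testing a judge without an API key can be sketched as follows. This is a generic stand-in, not the `MockLLMWrapper` used in the PR's tests: `classify_texts` and `make_stub_judge` are hypothetical helpers, and the stub simply returns canned labels.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical judge signature: takes a prompt, returns the raw model reply.
JudgeFn = Callable[[str], str]


def classify_texts(texts: Iterable[str], judge: JudgeFn, labels: Tuple[str, str]) -> List[str]:
    """Ask the judge for one label per text and validate each reply."""
    results = []
    for text in texts:
        reply = judge(f"Label the text as one of {labels}.\n\nText:\n{text}").strip()
        if reply not in labels:
            raise ValueError(f"unexpected label: {reply!r}")
        results.append(reply)
    return results


def make_stub_judge(canned: Dict[str, str], default: str) -> JudgeFn:
    """Return a judge that answers from a lookup table instead of calling an API."""

    def judge(prompt: str) -> str:
        for needle, label in canned.items():
            if needle in prompt:
                return label
        return default

    return judge


def test_coherence_with_stub_judge() -> None:
    judge = make_stub_judge(
        {"The sky is green because Tuesday": "INCOHERENT"},
        default="COHERENT",
    )
    texts = [
        "First we load the data, then we train the model.",
        "The sky is green because Tuesday.",
    ]
    labels = ("COHERENT", "INCOHERENT")
    assert classify_texts(texts, judge, labels) == ["COHERENT", "INCOHERENT"]
```

Because the stub is deterministic, a test like this checks the plumbing (prompt construction, label validation, result ordering) rather than the quality of the judge itself.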

Tests

All existing tests continue to pass, and 3 new tests were added:

10 passed, 6 warnings in 0.09s

Usage

from evidently.descriptors import FluencyLLMEval, CoherenceLLMEval

dataset.add_descriptors([
    FluencyLLMEval("response", provider="openai", model="gpt-4o-mini"),
    CoherenceLLMEval("response", provider="openai", model="gpt-4o-mini"),
])

Both descriptors also support include_score=True, include_reasoning=True, and uncertainty options.

…ors (evidentlyai#1801)

Add two new LLM-as-a-judge descriptors that evaluate text quality without requiring a reference dataset:
- FluencyLLMEval: detects grammatically broken or unnatural responses (FLUENT / NOT_FLUENT)
- CoherenceLLMEval: detects logically inconsistent or disorganised responses (COHERENT / INCOHERENT)

Changes:
- src/evidently/legacy/descriptors/llm_judges.py: add FluencyLLMEval and CoherenceLLMEval V1 classes with full prompt criteria
- src/evidently/descriptors/generated_descriptors.py: add public factory functions for both descriptors
- src/evidently/descriptors/__init__.py: export both descriptors
- tests/features/test_llm_judge.py: add tests for pipeline execution and public import path

Closes evidentlyai#1801

Liraim

Thank you for your contribution and interest in the project!

There is a small change I would like you to make before we merge this PR.

Liraim · 2026-05-02T13:08:25Z

Please remove this file from the PR, as it is not needed in the repository.
We will see the passing tests once the workflow runs.

mostaphaelansari · May 3, 2026

Thanks for the feedback! I’ve removed the file from the PR as requested. Let me know if anything else needs adjustment.

mostaphaelansari added 2 commits February 22, 2026 23:23

feat: Implement RAG indexing and data collection with chunking, embeddings, and FAISS retrieval.

f046d1a

emeli-dral requested a review from Liraim February 25, 2026 10:40

Liraim requested changes May 2, 2026

Delete test_output.txt

94782fc

mostaphaelansari wants to merge 3 commits into evidentlyai:main from mostaphaelansari:feat/reference-free-quality-metrics-1801

mostaphaelansari May 3, 2026

2 participants
