feat: inspect and bound GEval retrieval context prompts by RitwijParmar · Pull Request #2742 · confident-ai/deepeval

RitwijParmar · 2026-06-10T14:08:21Z

This adds an opt-in retrieval context budget for GEval, a no-LLM inspection path for large RAG judge prompts, and an evidence coverage report so users can see whether prompt compaction removed terms the judge still needs.

What changed

Added max_retrieval_context_tokens to GEval. Default behavior is unchanged unless this option is set.
Large retrieval_context values are compacted before they enter the judge prompt.
The compactor preserves source labels for RetrievedContextData, keeps head and tail evidence from visible chunks, and inserts explicit omission markers.
Chunks are ranked by lexical overlap against selected input, actual output, and expected output fields.
Added structured budget reports via get_retrieval_context_budget_report(test_case).
Budget reports include original token estimate, rendered token estimate, compression ratio, visible chunks, omitted chunks, per-chunk source metadata, relevance scores, and evidence coverage.
Added get_retrieval_context_evidence_coverage(test_case) for the coverage ratio, covered terms, missing terms, and warning message.
Added preview_evaluation_prompt(test_case) so users can inspect the exact bounded judge prompt in CI or while tuning RAG metrics without calling the evaluation model.
Refactored GEval prompt construction so sync, async, and preview paths share the same prompt builder.
Documented the debugging workflow in the G-Eval docs.
Added synthetic large-RAG prompt-budget regression tests for compression, source preservation, relevance ranking, missing evidence detection, and prompt preview.
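
The ranking and compaction steps listed above can be sketched in standalone form. This is a minimal illustration under stated assumptions, not the code in this PR: the names `compact_context`, `head_tail`, `overlap_score`, and `BudgetReport` are hypothetical, token counts are approximated by whitespace-separated words, and the omission marker text is invented for the example.

```python
import re
from dataclasses import dataclass


def words(text: str) -> set[str]:
    """Lowercased alphanumeric terms; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(chunk_text: str, query_terms: set[str]) -> float:
    """Fraction of query terms that also appear in the chunk."""
    if not query_terms:
        return 0.0
    return len(query_terms & words(chunk_text)) / len(query_terms)


def head_tail(text: str, keep: int) -> str:
    """Keep the first and last `keep` words and mark what was dropped."""
    parts = text.split()
    if len(parts) <= 2 * keep:
        return text.strip()
    omitted = len(parts) - 2 * keep
    marker = f"[... {omitted} words omitted ...]"
    return " ".join(parts[:keep] + [marker] + parts[-keep:])


@dataclass
class BudgetReport:  # hypothetical shape, loosely mirroring the report fields
    original_tokens: int
    rendered_tokens: int
    visible: list[str]
    omitted: list[str]
    rendered: str

    @property
    def compression_ratio(self) -> float:
        if self.original_tokens == 0:
            return 1.0
        return self.rendered_tokens / self.original_tokens


def compact_context(
    chunks: list[tuple[str, str]], query: str, max_tokens: int, keep: int = 8
) -> BudgetReport:
    """Greedily fit the most relevant (source, text) chunks into a word budget."""
    query_terms = words(query)
    ranked = sorted(
        chunks, key=lambda c: overlap_score(c[1], query_terms), reverse=True
    )
    pieces, visible, omitted, used = [], [], [], 0
    for source, text in ranked:
        piece = f"[source: {source}] {head_tail(text, keep)}"
        cost = len(piece.split())
        if used + cost <= max_tokens:
            pieces.append(piece)
            visible.append(source)
            used += cost
        else:
            omitted.append(source)
    if omitted:
        # The marker itself is not counted against the budget in this sketch.
        pieces.append(f"[{len(omitted)} chunk(s) omitted for budget]")
    rendered = "\n".join(pieces)
    return BudgetReport(
        original_tokens=sum(len(text.split()) for _, text in chunks),
        rendered_tokens=len(rendered.split()),
        visible=visible,
        omitted=omitted,
        rendered=rendered,
    )


if __name__ == "__main__":
    demo = [
        ("policy.md", "refund policy allows returns within 30 days "
         + "filler " * 50 + "receipt required for refund"),
        ("shipping.md", "shipping takes five days"),
    ]
    print(compact_context(demo, "what is the refund policy", max_tokens=30).rendered)
```

The greedy loop keeps walking the ranked list after a chunk fails to fit, so a short low-ranked chunk can still be shown when a longer, higher-ranked one is dropped. Whether the PR makes the same choice is not stated in the description.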

Why this matters
Large RAG contexts can make custom GEval prompts expensive and hard to trust. Clipping blindly is risky because it can hide the evidence that should support or refute the answer. This PR gives users a bounded prompt and a diagnostic report that shows what survived the budget and which evidence terms disappeared.
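
To make the "which evidence terms disappeared" idea concrete, here is a rough standalone sketch of a coverage check. It is an assumption-laden illustration rather than the PR's implementation: `evidence_coverage`, `key_terms`, the stopword list, and the returned dictionary keys are all hypothetical, and matching is exact lexical matching with no stemming.

```python
import re

# Small illustrative stopword list; a real implementation would likely differ.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "or",
             "for", "on", "with", "within"}


def key_terms(text: str) -> set[str]:
    """Lowercased terms with stopwords and single characters removed."""
    found = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in found if t not in STOPWORDS and len(t) > 1}


def evidence_coverage(evidence: str, compacted_context: str,
                      warn_below: float = 1.0) -> dict:
    """Report which evidence terms are still present after compaction."""
    # Strip bracketed labels and omission markers so they do not count as evidence.
    body = re.sub(r"\[[^\]]*\]", " ", compacted_context)
    present = set(re.findall(r"[a-z0-9]+", body.lower()))
    needed = key_terms(evidence)
    covered = sorted(needed & present)
    missing = sorted(needed - present)
    ratio = len(covered) / len(needed) if needed else 1.0
    warning = None
    if ratio < warn_below:
        warning = f"compaction removed evidence terms: {', '.join(missing)}"
    return {"ratio": ratio, "covered": covered,
            "missing": missing, "warning": warning}


if __name__ == "__main__":
    context = ("[source: policy.md] refund policy allows returns "
               "[... 45 words omitted ...] receipt required")
    print(evidence_coverage("refund within 30 days with receipt", context))
```

A check like this only sees surface forms, so "refunds" and "refund" would not match. It is best read as a warning signal that a budget may be too tight, not as proof that the judge lacks the evidence.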

Validation

uv run --isolated --python 3.11 --with-editable . python -m pytest tests/test_metrics/test_g_eval_utils.py tests/test_metrics/test_g_eval_prompt_budget.py -q
- 14 passed
python3 -m black --check deepeval/metrics/g_eval/g_eval.py deepeval/metrics/g_eval/utils.py deepeval/metrics/g_eval/__init__.py tests/test_metrics/test_g_eval_utils.py tests/test_metrics/test_g_eval_prompt_budget.py
uvx ruff check deepeval/metrics/g_eval/g_eval.py deepeval/metrics/g_eval/utils.py deepeval/metrics/g_eval/__init__.py tests/test_metrics/test_g_eval_utils.py tests/test_metrics/test_g_eval_prompt_budget.py
python3 -m compileall -q deepeval/metrics/g_eval tests/test_metrics/test_g_eval_prompt_budget.py tests/test_metrics/test_g_eval_utils.py
git diff --check

Note on current CI

Vercel requires maintainer authorization for fork deployment.

vercel · 2026-06-10T14:08:27Z

@RitwijParmar is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

Signed-off-by: Ritwij Aryan Parmar <ritwij.aryan.parmar@gmail.com>

feat: bound GEval retrieval context prompts

b04eefa

feat: add GEval retrieval budget inspection

aeb6078

RitwijParmar changed the title ~~feat: bound GEval retrieval context prompts~~ feat: inspect and bound GEval retrieval context prompts Jun 10, 2026

RitwijParmar added 4 commits June 10, 2026 13:11

feat: rank GEval retrieval context by relevance

3d54502

Signed-off-by: Ritwij Aryan Parmar <ritwij.aryan.parmar@gmail.com>

fix: correct CLI settings import

e6d2d91

style: apply current black formatting

c14f430

feat: report GEval retrieval evidence coverage

13d63af

RitwijParmar wants to merge 6 commits into confident-ai:main from RitwijParmar:codex/deepeval-geval-retrieval-budget
