test(metrics): add overlapping-chunk regression fixtures for ContextualRecallMetric (closes #2788) by Ruthwik-Data · Pull Request #2789 · confident-ai/deepeval

Ruthwik-Data · 2026-06-20T17:15:45Z

Summary

Adds tests/test_metrics/test_contextual_recall_overlapping_chunks.py — regression test fixtures for ContextualRecallMetric overlapping-chunk behaviour.

Closes #2788.

This is the symmetric complement to PR #2787 (which addresses the same issue for ContextualPrecisionMetric). The tests document the failure mode and will serve as regression targets once the source-grouping fix is applied to ContextualRecallMetric.

Motivation

PR #2743 added _group_retrieval_contexts() (source-grouping deduplication) to ContextualPrecisionMetric to fix issue #2594. The same fix was not applied symmetrically to ContextualRecallMetric, leaving an identical failure mode:

RAG pipelines with 10–20% sliding-window chunk overlap (standard for dense financial documents) produce chunks where the same information appears in adjacent chunks.
ContextualRecallMetric scores each chunk independently. If the LLM judge returns yes for the first chunk and no for the second (partial redundancy), the recall score is halved — even though the expected output is fully covered.
Redundancy ≠ Missing coverage.
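
The arithmetic behind this failure mode can be sketched in a few lines. This is a toy model, not deepeval's scoring code: `per_chunk_recall` and `source_grouped_recall` are hypothetical helpers, and the verdicts are hard-coded rather than produced by an LLM judge.

```python
from collections import defaultdict

# Toy verdicts for two overlapping chunks from the same source.
# Each tuple is (source, verdict); the values are illustrative only.
verdicts = [("10-K MD&A", "yes"), ("10-K MD&A", "no")]


def per_chunk_recall(verdicts):
    """Score every chunk independently: fraction of 'yes' verdicts."""
    yes = sum(1 for _, v in verdicts if v == "yes")
    return yes / len(verdicts)


def source_grouped_recall(verdicts):
    """Collapse chunks by source first: a source counts as covered
    if any of its chunks received a 'yes' verdict."""
    by_source = defaultdict(bool)
    for source, v in verdicts:
        by_source[source] = by_source[source] or (v == "yes")
    return sum(by_source.values()) / len(by_source)


print(per_chunk_recall(verdicts))       # 0.5
print(source_grouped_recall(verdicts))  # 1.0
```

Under these assumptions the redundant second chunk halves the per-chunk score, while grouping by source leaves it at 1.0, which is the behaviour the tests in this PR expect.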

What this PR adds

Fixtures

| Fixture | Scenario |
| --- | --- |
| `overlapping_revenue_chunks` | Two same-source overlapping chunks from 10-K MD&A |
| `multi_statement_expected_output` | Three-statement expected output spanning two sources |
| `retrieval_context_multi_source` | Two overlapping 10-K chunks + one earnings-call chunk |
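
As a rough illustration of how two of these fixtures might be shaped, the sketch below builds them as plain functions. Several things here are assumptions: the `RetrievedContextData` dataclass is a local stand-in with only the `content` and `source` fields named in this PR, the chunk text and source labels are invented placeholders, and in the actual test file the builders would presumably be decorated with `@pytest.fixture`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedContextData:
    """Local stand-in for the class from #2743 (assumed shape)."""
    content: str
    source: str


def overlapping_revenue_chunks():
    """Two same-source chunks whose sliding windows share one sentence."""
    # Placeholder text, not taken from any real filing.
    shared = "Total revenue increased compared with the prior year."
    return [
        RetrievedContextData(
            content="Overview of results. " + shared,
            source="10-K MD&A",
        ),
        RetrievedContextData(
            content=shared + " The increase was driven by services.",
            source="10-K MD&A",
        ),
    ]


def retrieval_context_multi_source():
    """The two overlapping 10-K chunks plus one earnings-call chunk."""
    return overlapping_revenue_chunks() + [
        RetrievedContextData(
            content="Management expects margins to remain stable.",
            source="earnings call",
        )
    ]
```

Keeping the shared sentence in a single variable makes the overlap explicit, so a reader can see at a glance which content is redundant across the two chunks.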

Tests

| Test | Guards against |
| --- | --- |
| `test_same_source_overlap_does_not_lower_recall` | Recall penalised for same-source chunk redundancy |
| `test_multi_source_recall_not_inflated_by_overlap` | Overlapping chunks distorting multi-statement recall |
| `test_increasing_overlap_does_not_decrease_recall` | Monotonicity: more context from same source → recall stays stable |
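
The monotonicity property in the last row can be illustrated without an LLM judge. The sketch below is an assumption-laden toy: `sliding_chunks` and `toy_recall` are hypothetical helpers, recall is approximated by verbatim substring matching, and the one-letter "sentences" are placeholders. It shows the shape of the assertion, not the metric's real scoring.

```python
def sliding_chunks(sentences, size, overlap):
    """Split sentences into windows of `size` that share `overlap` sentences."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    return [
        " ".join(sentences[i:i + size])
        for i in range(0, len(sentences), step)
    ]


def toy_recall(expected_statements, chunks):
    """Fraction of expected statements found verbatim in at least one chunk."""
    covered = sum(
        1 for s in expected_statements if any(s in c for c in chunks)
    )
    return covered / len(expected_statements)


sentences = ["A.", "B.", "C.", "D.", "E.", "F."]
expected = ["B.", "E."]

# Recall at increasing overlap levels for the same source document.
scores = [
    toy_recall(expected, sliding_chunks(sentences, size=3, overlap=o))
    for o in (0, 1, 2)
]

# More overlap adds redundant text but never removes coverage,
# so the score should not go down from one level to the next.
assert all(later >= earlier for earlier, later in zip(scores, scores[1:]))
```

Because this toy scores coverage per expected statement rather than per chunk, extra overlapping windows cannot lower the result, which is the property the regression test is meant to pin down.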

Design notes

Tests use RetrievedContextData(content=..., source=...), the API introduced in PR #2743 (feat(contextual-precision): add RetrievedContextData source grouping and fix weighted precision score).
threshold=0.0 in the monotonicity test isolates scoring logic from pass/fail cutoffs.
Tests are written as regression targets — they document the expected post-fix behaviour.
No new dependencies.

Type of change

Test (adding missing tests or correcting existing tests)

…alRecallMetric (issue confident-ai#2788)

This test suite verifies the behavior of the ContextualRecallMetric in scenarios with overlapping chunks, ensuring that recall scores remain accurate and do not penalize redundancy. It includes tests for same-source overlaps, multi-source retrieval, and the impact of increasing overlap on recall.

vercel · 2026-06-20T17:15:48Z

@Ruthwik-Data is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

Ruthwik-Data added 8 commits June 15, 2026 12:22

style: fix prettier formatting in deepseek-model.ts

903fed0

style: fix prettier formatting in kimi-model.ts

af9425c

style: fix prettier formatting in openai-model.ts

f22ad6d

style: fix prettier formatting in openrouter-model.ts

02429a1

style: fix prettier formatting in portkey-model.ts

fdf3058

style: fix prettier line wrapping in kimi-model.ts

63ec06d

Update openai-model.ts

4aa1973

Ruthwik-Data wants to merge 8 commits into confident-ai:main from Ruthwik-Data:test/contextual-recall-overlapping-chunks
