test(metrics): add overlapping-chunk regression fixtures for ContextualPrecisionMetric (rebased on #2743) by Ruthwik-Data · Pull Request #2787 · confident-ai/deepeval

Ruthwik-Data · 2026-06-20T17:12:24Z

Summary

Adds tests/test_metrics/test_contextual_precision_overlapping_chunks.py — a focused regression test suite for ContextualPrecisionMetric behaviour under overlapping-chunk retrieval scenarios.

This supersedes the test file from PR #2692 and is rebased on the merged changes from PR #2743 (RetrievedContextData._group_retrieval_contexts + WCP formula fix). The fixtures now use the new RetrievedContextData with a source field, which is the correct API post-merge.

Closes #2594 (test coverage aspect).

Motivation

PR #2743 fixed two root causes:

Source grouping — _group_retrieval_contexts() merges multiple RetrievedContextData chunks sharing the same source string before LLM scoring, preventing redundant-chunk penalisation.
WCP formula — _calculate_score() now correctly implements weighted cumulative precision (binary verdicts, rank-discounted).

Without regression tests, these fixes can silently regress. This PR adds four targeted tests that each guard one failure mode from issue #2594.

What this PR adds

Test fixtures

Fixture	Scenario
`overlapping_narrative_chunks`	Two same-source 20%-overlap 10-K narrative chunks
`non_overlapping_mixed_chunks`	Three chunks from distinct sources (one relevant)
`table_cell_overlap_chunks`	Adjacent table-row chunks with shared header (same source/page)

Test cases

Test	Guards against
`test_same_source_overlap_does_not_penalise`	Redundancy treated as irrelevance (regression for #2594)
`test_cross_source_chunks_scored_independently`	Source grouping incorrectly merging different documents
`test_table_boundary_overlap_not_penalised`	Table-header overlap lowering score for financial doc RAG
`test_wcp_formula_monotone_with_relevant_rank`	WCP score not decreasing as relevant chunk rank drops

Design notes

Tests use RetrievedContextData(content=..., source=...) — the post-feat(contextual-precision): add RetrievedContextData source grouping and fix weighted precision score #2743 API.
threshold=0.0 in the WCP monotonicity test isolates formula logic from pass/fail cutoffs.
No new dependencies; runs with any LLM key configured in conftest.py.
Supersedes PR test(metrics): add overlapping-chunk test fixtures for ContextualPrecisionMetric #2692 (same scope, updated for merged API).

Type of change

Test (adding missing tests or correcting existing tests)

…Data source grouping This test suite validates the functionality of ContextualPrecisionMetric in handling overlapping chunks, ensuring that source grouping and weighted cumulative precision calculations are correctly implemented.

vercel · 2026-06-20T17:12:28Z

@Ruthwik-Data is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

Ruthwik-Data added 8 commits June 15, 2026 12:22

style: fix prettier formatting in deepseek-model.ts

903fed0

style: fix prettier formatting in kimi-model.ts

af9425c

style: fix prettier formatting in openai-model.ts

f22ad6d

style: fix prettier formatting in openrouter-model.ts

02429a1

style: fix prettier formatting in portkey-model.ts

fdf3058

style: fix prettier line wrapping in kimi-model.ts

63ec06d

Update openai-model.ts

4aa1973

This was referenced Jun 20, 2026

[Bug] ContextualRecallMetric over-penalises overlapping chunks — parallel issue to #2594 (ContextualPrecision) #2788

Open

test(metrics): add overlapping-chunk regression fixtures for ContextualRecallMetric (closes #2788) #2789

Draft

fix: correct recursive import path in cli/test/command.py

2cb7fbf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(metrics): add overlapping-chunk regression fixtures for ContextualPrecisionMetric (rebased on #2743)#2787

test(metrics): add overlapping-chunk regression fixtures for ContextualPrecisionMetric (rebased on #2743)#2787
Ruthwik-Data wants to merge 9 commits into
confident-ai:mainfrom
Ruthwik-Data:test/contextual-precision-overlapping-chunks-v2

Ruthwik-Data commented Jun 20, 2026

Uh oh!

vercel Bot commented Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Ruthwik-Data commented Jun 20, 2026

Summary

Motivation

What this PR adds

Test fixtures

Test cases

Design notes

Type of change

Uh oh!

vercel Bot commented Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant