Description
- [ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
The ContextPrecision evaluator in RAGAS produces scores that are strongly influenced by the position of the relevant context in the input array. When the relevant context appears as the first element of retrieved_contexts, the score is artificially inflated (often >0.9) even when all the other contexts are irrelevant, and the score gradually decreases as the relevant context moves to later positions.
Ragas version: 0.2.14
Python version: 3.11
Code to Reproduce
The test below fails, with a score of 0.9999999999:
```python
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import ContextPrecision


def test_context_precision():
    sample = SingleTurnSample(
        reference="The capital of France is Paris.",
        retrieved_contexts=[
            "The capital of France is Paris",
            "Bahmni is a comprehensive, easy-to-use, and fully open-source Hospital Information System (HIS)",
            "it suitable for a wide range of healthcare facilities, from small clinics to large hospitals.",
            "A resilient EMR and hospital management system built on reliable open-source components.",
        ],
        user_input="What is the capital of France?",
    )
    context_precision = ContextPrecision(llm=evaluator_llm)
    score = context_precision.single_turn_score(sample)
    print("Context Precision score: ", score)
    assert score < 0.3
```
Whereas the test below passes. The only change is that the relevant context has been moved to the last position:
```python
def test_context_precision():
    sample = SingleTurnSample(
        reference="The capital of France is Paris.",
        retrieved_contexts=[
            "Bahmni is a comprehensive, easy-to-use, and fully open-source Hospital Information System (HIS)",
            "it suitable for a wide range of healthcare facilities, from small clinics to large hospitals.",
            "A resilient EMR and hospital management system built on reliable open-source components.",
            "The capital of France is Paris",
        ],
        user_input="What is the capital of France?",
    )
    context_precision = ContextPrecision(llm=evaluator_llm)
    score = context_precision.single_turn_score(sample)
    print("Context Precision score: ", score)
    assert score < 0.3
```
Also:
- When the relevant context is in the second position, the score is 0.49999999995.
- When it is in the third position, the score is 0.3333333333.
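For what it's worth, the reported scores (≈1.0, 0.5, 0.333…, and a passing <0.3 at position four) exactly match a rank-weighted mean precision@k with a single relevant context, which suggests the behavior comes from the metric's formula rather than from LLM noise. A minimal sketch (a hypothetical helper, not RAGAS code; the real metric derives the relevance verdicts from the LLM):

```python
def mean_precision_at_k(relevance):
    """Average of precision@k over the positions k that hold a relevant context."""
    hits = 0
    precisions = []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)  # precision@k at each relevant position
    return sum(precisions) / max(len(precisions), 1)


# One relevant context among four: the score is simply 1/position.
for relevance in ([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]):
    print(relevance, mean_precision_at_k(relevance))
# → 1.0, 0.5, 0.333..., 0.25 — matching the observed scores above
```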
Error trace
Not applicable.
Expected behavior
The score should not be affected by the position of relevant context.
Additional context
The issue is consistent across models: OpenAI GPT-4o, Claude 3.7 Sonnet on AWS Bedrock, and Azure OpenAI GPT-4o. This position bias severely impacts the reliability of the Context Precision metric for evaluating RAG systems. In real-world applications, the ordering of retrieved contexts should not affect the precision score, as the goal is to measure how many of the retrieved contexts are actually relevant to the question.
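If the position sensitivity is indeed baked into the formula, one hypothetical workaround (a sketch, not a RAGAS API) is to average the metric over every ordering of the contexts, which yields an order-insensitive score:

```python
from itertools import permutations


def mean_precision_at_k(relevance):
    """Average of precision@k over the positions k that hold a relevant context."""
    hits, precisions = 0, []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / max(len(precisions), 1)


def order_insensitive_precision(relevance):
    """Average mean_precision_at_k over all permutations of the contexts.

    Only feasible for small context lists (factorial cost); a random sample
    of shuffles would be the practical variant.
    """
    perms = list(permutations(relevance))
    return sum(mean_precision_at_k(p) for p in perms) / len(perms)


# The relevant context's position no longer matters:
print(order_insensitive_precision([1, 0, 0, 0]))  # same value...
print(order_insensitive_precision([0, 0, 0, 1]))  # ...as this one
```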