Open
Description
I am encountering limitations with RAGLite when evaluating using RAGAS. The main issue is that the RAG pipeline configuration differs between inference and evaluation, which impacts consistency and reproducibility.
The differences include:
1. System prompt
2. Instruction prompt (where the context is added to a user prompt)
3. Number of chunks to retrieve
4. Number of chunk spans to retrieve
Questions
- Does it make sense to include these differences as part of a unified configuration to ensure alignment?
- Or should we explore an alternative mechanism to address this inconsistency?
Metadata
Metadata
Assignees
Labels
No labels