About document retrieval scope in ChainRAG: Why does each question use a separate small corpus instead of a global corpus?

Hi, thanks for the great work and the released code!
While reproducing the ChainRAG experiments, I encountered a design choice that I am not fully sure I understand, and I hope you can clarify it.


When examining the retrieval module in ChainRAG, I noticed that:

Each question is associated with its own small set of documents (usually ~10 passages or paragraphs).

**Retrieval is performed only within this per-question mini-corpus, rather than over a global shared corpus covering the documents for all questions.**

This means that for question i, the retriever only searches within the documents that belong to question i, rather than searching over the full dataset.
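To make the distinction concrete, here is a minimal sketch of the two settings as I understand them. It is not taken from the ChainRAG code: the `examples` layout, the `retrieve` helper, and the token-overlap scorer are all hypothetical stand-ins for the real data format and retriever.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens; a toy tokenizer used only for illustration."""
    return re.findall(r"\w+", text.lower())

def score(query, passage):
    """Toy relevance score: number of overlapping tokens (not a real retriever)."""
    return sum((Counter(tokens(query)) & Counter(tokens(passage))).values())

def retrieve(query, corpus, k):
    """Return the top-k passages by toy score; ties keep corpus order."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

# Hypothetical layout: every example ships with its own candidate passages.
examples = [
    {"question": "Who directed the film Inception?",
     "passages": ["Inception is a film directed by Christopher Nolan.",
                  "Paris is the capital of France."]},
    {"question": "Which film did Nolan release in 2014?",
     "passages": ["Interstellar is a 2014 film directed by Christopher Nolan.",
                  "The Nile is a river in Africa."]},
]

# Setting A (what I observe): search only the question's own mini-corpus.
per_question = [retrieve(ex["question"], ex["passages"], k=2) for ex in examples]

# Setting B (what I expected): pool and deduplicate all passages, then search.
global_corpus = list(dict.fromkeys(p for ex in examples for p in ex["passages"]))
global_hits = [retrieve(ex["question"], global_corpus, k=2) for ex in examples]

print(per_question[0])  # both hits come from question 0's own passages
print(global_hits[0])   # second hit is a passage that belongs to question 1
```

In setting B, the first question already pulls in a passage from another question's pool, which is the kind of cross-question noise that setting A can never encounter.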

🤔 **My Question / Confusion**

I would like to understand the motivation for this design choice.

Specifically:

Why does ChainRAG restrict retrieval to “the documents related to that single question” instead of using a unified global corpus for all questions?

Some possible reasons I considered (but I may be mistaken):

- Was this done to match the setting of the original dataset?
- Was it to avoid cross-question document contamination?
- Was it to reduce retrieval noise?
- Was it for computational efficiency?
- Or is this simply a simplified experimental setting for fair comparison with baselines?

Right now, it feels more like oracle retrieval, since the system already knows the candidate documents per question in advance.
