[Bug]: Inconsistent and Incomplete Document Counting/Retrieval for CIO Contract Documents #1648

@ricofurtado

OpenRAG Version

Tech-preview

Deployment Method

Other

Operating System

all

Python Version

3.12

Affected Area

Chat (chat interface, conversations, AI responses)

Bug Description

When asking OpenRAG chat questions about the Canidium contract documents, the responses are inconsistent and inaccurate.

For the question:

"How many documents do we have with Canidium?"

OpenRAG sometimes returns 10 and other times 28, while the expected answer is 5 documents.

For the question:

"List all the documents"

OpenRAG returns only 2 or 3 documents out of the expected 5. It appears that OpenRAG is treating the contract amendments as part of the same contract instead of recognizing them as separate documents.

Expected Behavior

OpenRAG should consistently identify and return the correct number of Canidium-related documents, which is 5.

When asked to list all documents, OpenRAG should return all 5 documents, including the amendments as separate documents when they were ingested as separate files.

Actual Behavior

OpenRAG provides inconsistent document counts and incomplete document lists.

| Question | Actual Result | Expected Result |
| --- | --- | --- |
| "How many documents do we have with Canidium?" | Sometimes 10, sometimes 28 | 5 |
| "List all the documents" | 2 or 3 documents | 5 documents |

Impact

This causes unreliable chat responses for document inventory questions and may reduce user trust in OpenRAG’s ability to correctly reason over ingested document collections, especially when contracts and amendments are involved.

Notes

The issue may be related to how OpenRAG groups or interprets contract amendments during retrieval or response generation. Amendments appear to be considered part of the same contract rather than being counted and listed as separate ingested documents.
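To make the suspected chunk-versus-document confusion concrete, here is a minimal sketch in Python. It is not OpenRAG's actual code: the `filename` key, the file names, and the chunk counts are all assumptions chosen for illustration. It shows how counting retrieval hits differs from counting distinct source files, and how grouping by source file keeps each amendment as its own entry.

```python
from collections import Counter

# Hypothetical retrieval hits: one dict per chunk. The "filename" key and the
# file names themselves are assumptions made for this illustration.
chunks = [
    {"filename": "canidium_contract.pdf", "text": "..."},
    {"filename": "canidium_contract.pdf", "text": "..."},
    {"filename": "canidium_contract.pdf", "text": "..."},
    {"filename": "canidium_amendment_1.pdf", "text": "..."},
    {"filename": "canidium_amendment_1.pdf", "text": "..."},
    {"filename": "canidium_amendment_2.pdf", "text": "..."},
    {"filename": "canidium_amendment_3.pdf", "text": "..."},
    {"filename": "canidium_amendment_3.pdf", "text": "..."},
    {"filename": "canidium_amendment_4.pdf", "text": "..."},
    {"filename": "canidium_amendment_4.pdf", "text": "..."},
]


def chunks_per_document(hits):
    """Map each source file to the number of chunks retrieved from it."""
    return Counter(hit["filename"] for hit in hits)


per_document = chunks_per_document(chunks)
chunk_count = sum(per_document.values())  # what a naive hit count reports
document_count = len(per_document)        # what the user is asking for
document_names = sorted(per_document)     # amendments stay separate entries

print(f"chunks: {chunk_count}, documents: {document_count}")
for name in document_names:
    print(f"  {name}: {per_document[name]} chunk(s)")
```

If the answer is derived from the raw hits, the reported number follows the chunk count and changes with whatever set of chunks happens to be retrieved. Grouping by a per-file identifier before counting yields a number that depends only on which files were ingested.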



### Steps to Reproduce

1- Load the 5 Canidium documents into OpenRAG.

2- Chat prompt: "How many documents do we have with Canidium?"

OpenRAG sometimes returns 10 and other times 28, while the expected answer is 5 documents.

3- Chat prompt: "List all the documents"

OpenRAG returns only 2 or 3 documents out of the expected 5. It appears that OpenRAG is treating the contract amendments as part of the same contract instead of recognizing them as separate documents.
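Because the count changes between runs, a single chat attempt is not enough to confirm the bug or a fix. The sketch below is one way to script the reproduction steps above, assuming some callable `ask(question)` that sends a prompt to the chat and returns the answer as a string. That callable is hypothetical and is replaced here by a stand-in that replays the answers from this report, so the block runs without an OpenRAG instance.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Iterable


def answer_distribution(ask: Callable[[str], str], question: str, runs: int = 5) -> Counter:
    """Ask the same question `runs` times and tally the distinct answers."""
    return Counter(ask(question).strip() for _ in range(runs))


def is_consistent(distribution: Counter, expected: str) -> bool:
    """True only if every run produced exactly the expected answer."""
    return set(distribution) == {expected}


def make_fake_ask(answers: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for the real chat call (an assumption): replays canned answers."""
    replay = cycle(answers)
    return lambda _question: next(replay)


QUESTION = "How many documents do we have with Canidium?"

# Replays the behavior described in this report: the count alternates.
flaky = answer_distribution(make_fake_ask(["10", "28"]), QUESTION, runs=4)
# Replays the expected behavior: the count is always 5.
stable = answer_distribution(make_fake_ask(["5"]), QUESTION, runs=4)

print(dict(flaky), is_consistent(flaky, "5"))
print(dict(stable), is_consistent(stable, "5"))
```

Real chat answers are free text rather than bare numbers, so a practical version would first need to extract the count from each response before tallying.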

### Expected Behavior

It should return 5 documents for the first question and list all 5 documents for the second.

### Actual Behavior

For the first question, it seems to be returning the number of chunks rather than the number of documents.

### Relevant Logs

```shell
```

Screenshots

No response

Additional Context

No response

Checklist

  • I have searched existing issues to ensure this bug hasn't been reported before.
  • I have provided all the requested information.

Metadata

Labels

bug: Something isn't working.
