Replies: 3 comments 1 reply
Based on the available information, the 'top n' setting in RAGFlow is designed to control how many of the most relevant chunks (blocks) are passed to the large language model (LLM), such as Deepseek. This setting is configurable and determines the maximum number of chunks the LLM can see, regardless of how many are retrieved above the similarity threshold. For example, if 'top_n' is set to 8, only the top 8 chunks should be provided to the LLM, even if more are available. The default value for 'top_n' can vary by component, but is often 6 or 8 in configuration files and documentation [source 1] [source 2].

There is no evidence in the RAGFlow codebase or documentation that the Deepseek model itself imposes a stricter limit or overrides the 'top n' setting from RAGFlow. The configuration file does show that Deepseek is used as the LLM, but the chunk selection logic is handled by RAGFlow before the data is sent to the model [source 3].

If you are observing that only the first 4 pieces of information are being summarized by Deepseek, this is not a documented or expected limitation of RAGFlow or the Deepseek integration. Possible causes include: a misconfiguration of the 'top_n' parameter, a bug in the pipeline, or a behavior in the Deepseek model's summarization logic (for example, it may choose to focus on fewer chunks even if more are provided). However, there is no direct evidence that Deepseek is limiting the input to 4 chunks on its own.

You may want to double-check your RAGFlow configuration to ensure 'top_n' is set as intended, and verify (via logs or debugging) how many chunks are actually being sent to the Deepseek model. If the configuration is correct and the issue persists, it could be worth investigating how the Deepseek model processes its input. Based on current documentation, RAGFlow should respect the 'top_n' setting when preparing input for the LLM.

To reply, just mention my name @dosu.
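For clarity, the selection behavior described above (filter by similarity threshold, then cap at 'top_n') can be sketched as follows. This is a minimal illustrative sketch, not RAGFlow's actual code; the function and field names here are assumptions:

```python
# Hypothetical sketch of the chunk-selection logic described above:
# chunks are first filtered by a similarity threshold, then capped at
# top_n before being handed to the LLM. Names are illustrative only.

def select_chunks(chunks, similarity_threshold=0.2, top_n=8):
    """Return at most `top_n` chunks whose score passes the threshold."""
    eligible = [c for c in chunks if c["score"] >= similarity_threshold]
    eligible.sort(key=lambda c: c["score"], reverse=True)
    return eligible[:top_n]

# Example: 12 chunks pass the threshold, but only the top 8 reach the LLM.
chunks = [{"id": i, "score": 0.9 - i * 0.05} for i in range(12)]
selected = select_chunks(chunks)
print(len(selected))  # 8
```

If the model consistently reports seeing fewer chunks than this cap, the discrepancy is happening somewhere after this selection step.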
In my actual testing, when the document chunks contain a large amount of content (I tested chunks of about 5,000 Chinese characters each), only the content of the first 2 chunks was passed to the large model. I don't know which side imposes the restriction in this situation.
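One possible explanation for this behavior (an assumption on my part, not confirmed anywhere in this thread) is that the prompt-assembly stage enforces a token budget in addition to 'top_n', so very large chunks exhaust the budget early and fewer than 'top_n' chunks actually reach the model. A minimal sketch of that idea, with an illustrative character-based stand-in for a real tokenizer:

```python
def fit_to_budget(chunks, max_tokens=12000, count_tokens=len):
    """Keep chunks in ranked order until the token budget is exhausted.

    `count_tokens` is a stand-in tokenizer (here: character count); a real
    system would use the model's tokenizer. Hypothetical names, not
    RAGFlow's actual API.
    """
    kept, used = [], 0
    for text in chunks:
        cost = count_tokens(text)
        if used + cost > max_tokens:
            break
        kept.append(text)
        used += cost
    return kept

# With ~5,000-character chunks, a 12,000-unit budget admits only the
# first 2 chunks, even if top_n would have allowed 8.
big_chunks = ["字" * 5000 for _ in range(8)]
print(len(fit_to_budget(big_chunks)))  # 2
```

If this hypothesis is right, the cutoff would scale with chunk size, which would match seeing fewer chunks when each chunk is very large.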
@chaijl which version are you using?
"Not all blocks with similarity scores above the 'similarity threshold' will be provided to the large language model. The LLM can only see these 'Top N' blocks."
The documentation says 'top n' means the first N search results are passed to the large model, with a default of 8. However, after connecting the Deepseek model, the feedback in its reasoning output indicates that it summarizes from only the first 4 pieces of information. Is this controlled by the large model, or is it a limitation of RAGFlow?