Replies: 4 comments 6 replies
Hi @creasysee! I'm Dosu and I'm helping the ragflow team. This is a known issue affecting RAGFlow v0.24.0 and recent versions using the Infinity document engine. The most likely cause is the missing
What's happening: When chunks are created, they sometimes don't get the
Solutions:
Your external test showing 0.4941 similarity confirms the embedding model is working correctly: the issue is with chunk filtering, not embeddings. There's no hardcoded minimum similarity in RAGFlow; setting the threshold to 0 should work once the visibility issue is resolved. To verify this is the issue, you could check if your chunks have the
I tried setting the field, and it returns: but it doesn't help. The test 'money' -> 'The company provides a special bonus for every hardworking employee today.' still returns no results. I tried different values of Similarity threshold and Vector similarity weight, and got no results either.
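For context on how the two settings interact, here is a minimal sketch, assuming the final score is a weighted sum of term (keyword) similarity and vector similarity, and that chunks scoring below the threshold are dropped. The names `hybrid_score` and `filter_by_threshold` are hypothetical and are not RAGFlow's API; the term similarity of 0 is also an assumption, based on 'money' not appearing in the chunk text.

```python
def hybrid_score(term_sim: float, vector_sim: float, vector_weight: float) -> float:
    """Weighted sum of term and vector similarity (assumed formula)."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be in [0, 1]")
    return (1.0 - vector_weight) * term_sim + vector_weight * vector_sim


def filter_by_threshold(scores: dict[str, float], threshold: float) -> dict[str, float]:
    """Keep only the chunks whose score is at least the threshold."""
    return {chunk_id: s for chunk_id, s in scores.items() if s >= threshold}


# Assumed inputs: no shared terms (term_sim = 0.0) and the vector
# similarity of 0.4941 that was measured outside RAGFlow.
score = hybrid_score(term_sim=0.0, vector_sim=0.4941, vector_weight=1.0)
kept = filter_by_threshold({"chunk-1": score}, threshold=0.0)
print(kept)
```

Under these assumptions, a threshold of 0 with a vector weight of 1.0 keeps any chunk that reaches the scoring step, which would point to the chunk being lost before scoring rather than filtered by the threshold.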
Additional information:
I dumped the ranks variable for the word "bonus", which exists in the source document, and got this result (truncated):
At the same time, I dumped the ranks variable for the word "money" and got this result:
Can anyone tell me where to change the code, or give any advice on how to implement this, so that I get the right result? I need the vector_similarity value for the word 'money', which was calculated above as 0.4941. Of course, I also need the other values: similarity=0, total, chunks, chunk_id, etc.
Additional information:
I've found where in the source code the processing path diverges. The word 'bonus' goes up to this line, whereas the word 'money' only goes up to this line. This happens because sim_np has zero length. I'll look into it later. Can anyone tell me where to change the code, or give any advice on how to implement this, so that I get the right result?
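To illustrate why an empty similarity array matters, here is a minimal sketch, assuming the pipeline first recalls candidate chunks and only then scores them. `rank_candidates` and its return shape are hypothetical and are not RAGFlow code. The sketch reports the stage at which the result became empty, which helps tell "nothing was recalled" apart from "everything was filtered by the threshold".

```python
def rank_candidates(candidate_ids, similarities, threshold):
    """Return ranked chunks plus the stage at which results became empty."""
    if len(candidate_ids) != len(similarities):
        raise ValueError("candidate_ids and similarities must have equal length")
    if len(similarities) == 0:
        # Nothing was recalled, so the threshold is never consulted.
        return {"chunks": [], "empty_at": "recall"}
    # Sort candidates by similarity, highest first.
    ranked = sorted(zip(candidate_ids, similarities), key=lambda p: p[1], reverse=True)
    kept = [(cid, sim) for cid, sim in ranked if sim >= threshold]
    return {"chunks": kept, "empty_at": None if kept else "threshold"}


# Empty recall: no threshold value can bring a chunk back.
print(rank_candidates([], [], threshold=0.0))
# One recalled candidate with threshold 0: the chunk is kept.
print(rank_candidates(["chunk-1"], [0.4941], threshold=0.0))
```

If the real code behaves like the first call, lowering the threshold cannot help, and the place to look is the step that builds the candidate list.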
I used a simple text document, uploaded to a dataset as a file with the txt extension:
I checked that the file was parsed and the chunk was created:
settings.bmp
I set Similarity threshold to 0 and Vector similarity weight to 1.00, and used the word 'money' for the test:
retrieval.bmp
and got an empty result. At the same time, I used a script to check the vector similarity directly on the model:
and got this result:
I tried both version 0.24.0 and a nightly build, with the same result.
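For reference, a minimal sketch of the kind of check described above, assuming the similarity in question is cosine similarity between the query embedding and the chunk embedding. The vectors below are toy values, not real bge-m3 embeddings, and `cosine_similarity` is a hypothetical helper; how real embeddings are obtained depends on how the model is served.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the query and chunk embeddings.
query_vec = [1.0, 0.0, 1.0]
chunk_vec = [1.0, 1.0, 0.0]
print(round(cosine_similarity(query_vec, chunk_vec), 4))
```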
Additional information:
The bge-m3 model was loaded with an 8K context window:
The context width was checked with a script:
and it returns:
I can provide any additional information that would help resolve the issue. Thanks.