Inconsistent Diff Query Results / Incorrect Euclidean Distance Calculation for Chunk Selection #183
Unanswered · emanuel-skai asked this question in Q&A
Hey @emanuel-skai 👋 The implementation on the nilAI repo here will probably be a very good resource in this case, as it demonstrates this whole flow with steps (essentially very close to mirroring the …
I'm following the nilrag examples to upload data to nilDB and retrieve the document with the closest distance to my query. However, I'm observing inconsistent results on each execution of my custom client script.
It isn’t clear to me from the examples how the nilai_chat_completion method uses the nilrag payload to extend the LLM context with the data from the closest document. In my use case, I need to run a custom difference query that includes specific filters (such as user and agent IDs). For that reason, I am calling diff_query_execute and chunk_query_execute directly to find the document with the smallest distance to my query, and then retrieving and decoding its text chunks.
Below is the complete client code I’ve written:
Issue:
For the query prompt "Who is Danielle Miller?", I expect the script to return the chunk with ID
6efb4de5-1eb0-4aba-b76d-c409f5220a81 (which contains the text: "Danielle Miller works at Bailey and Sons as a Engineer, mining. Danielle Miller was born on 2007-10-22 and lives at 61586 Michael Greens, New Holly, CO 29872."),
but the results differ from one execution to the next.
For instance:
which corresponds to the expected output.
I have verified that the client query example consistently returns the correct result. Therefore, it appears that there is an issue with my custom client script logic.
Request for Assistance:
Can you please review the client logic above and help me identify any potential issues that could be causing these non‑deterministic results? In particular:
Is my approach for reconstructing the difference vector and computing the Euclidean norm correct?
Are there any issues with how I'm aggregating or grouping the secret shares from the diff query?
Could the inconsistent results be caused by variations in the secret sharing or encryption/decryption process?
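For reference, the reconstruction and norm computation these questions refer to can be sketched as follows. This is a minimal, self-contained illustration and not the nilrag API: it assumes plain additive secret sharing modulo a prime and fixed-point encoding, and the modulus, the precision, the record layout (`_id`, `difference`), and every function name are hypothetical placeholders rather than values taken from nilrag or nilDB.

```python
import math
import random
from collections import defaultdict

PRIME = 2**61 - 1   # hypothetical modulus, NOT taken from nilrag
PRECISION = 7       # hypothetical number of fixed-point decimal digits
SCALE = 10**PRECISION


def share(value, n_parties=3):
    """Split an integer into n additive shares modulo PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Sum shares modulo PRIME and map the result back to a signed integer."""
    total = sum(shares) % PRIME
    return total - PRIME if total > PRIME // 2 else total


def closest_chunk(node_results):
    """node_results holds one list of {"_id", "difference"} records per node."""
    # Group share vectors by chunk ID, never by position in the result list.
    grouped = defaultdict(list)
    for records in node_results:
        for record in records:
            grouped[record["_id"]].append(record["difference"])

    distances = {}
    for chunk_id, share_vectors in grouped.items():
        # zip(*...) lines up the shares of the same coordinate from every node.
        diff = [reconstruct(col) / SCALE for col in zip(*share_vectors)]
        distances[chunk_id] = math.sqrt(sum(d * d for d in diff))

    # Tie-break on the ID so equal distances always resolve the same way.
    best = min(distances, key=lambda cid: (distances[cid], cid))
    return best, distances


def simulate_nodes(plain_diffs, n_parties=3):
    """Share plaintext difference vectors; each node returns rows in random order."""
    nodes = [[] for _ in range(n_parties)]
    for chunk_id, vector in plain_diffs.items():
        per_coord = [share(round(x * SCALE), n_parties) for x in vector]
        for p in range(n_parties):
            nodes[p].append(
                {"_id": chunk_id, "difference": [s[p] for s in per_coord]}
            )
    for records in nodes:
        random.shuffle(records)
    return nodes


plain = {"chunk-a": [0.5, -0.25], "chunk-b": [3.0, 4.0]}
for _ in range(5):
    best, dist = closest_chunk(simulate_nodes(plain))
    print(best, dist)
```

In this sketch the shares are freshly randomized and the rows reshuffled on every iteration, yet the reconstructed distances and the selected chunk stay the same, because the randomness cancels out once all shares of one coordinate are summed modulo the prime. The same sketch breaks if shares are paired by list position instead of by ID, if the signed mapping after the modular sum is skipped, or if the norm is computed on individual shares before reconstruction. Whether any of these applies to a given client script depends on that script and on the actual sharing scheme in use.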
Any insights or suggestions for resolving the inconsistent behavior would be greatly appreciated.
Thank you!