arroy for vectors #2074
Conversation
"embedding": [1.0, 2.0], | ||
}, | ||
], | ||
}, | ||
"vectorisedGraph": { |
We probably want to move this API inside of graph, which would fail if the graph isn't vectorised.
selection.expand_entities_by_similarity("node1", 100, (20, 100))
contents = [doc.content for doc in selection.get_documents()]
assert contents == ["node1", "edge1", "node2", "edge2", "node3"]
assert contents == ["node1", "edge1", "node2"]
Did these have to be deleted?
@@ -229,6 +206,7 @@ def test_default_template():

vg = g.vectorise(constant_embedding)

node_docs = vg.entities_by_similarity(query="whatever", limit=10).get_documents()
Can this be removed?
@@ -27,6 +27,8 @@ use pyo3::{
    types::{PyFunction, PyList},
};

type DynamicVectorisedGraph = VectorisedGraph<DynamicGraph>;
We want the Graph storage to have a vector store inside of it, in the same manner that it has an index.
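A rough sketch of that suggestion, in Python pseudocode rather than the actual Rust types (GraphStorage, vectorise, and is_vectorised here are all illustrative names, not the real Raphtory API):

```python
# Hypothetical sketch: graph storage owning an optional vector store
# alongside its index, instead of a free-standing vectorised-graph type.
class GraphStorage:
    def __init__(self):
        self.index = None         # existing slot for the search index
        self.vector_store = None  # proposed slot for embeddings

    def vectorise(self, embedding_fn):
        # populate the embedded vector store in place
        self.vector_store = {"embedding_fn": embedding_fn, "vectors": {}}
        return self

    def is_vectorised(self):
        return self.vector_store is not None

    def entities_by_similarity(self, query, limit):
        # similarity queries would fail fast when the graph was
        # never vectorised, as suggested above
        if not self.is_vectorised():
            raise RuntimeError("graph is not vectorised")
        return []
```

This would let the similarity APIs live on the graph itself and error cleanly on non-vectorised graphs.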
use super::embeddings::EmbeddingFunction;

const MAX_DISK_ITEMS: usize = 1_000_000;
For the next PR - can we make these configurable?
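One shape this could take, sketched in Python with hypothetical names (the real constant is MAX_DISK_ITEMS in the Rust code; VectorStoreConfig and open_vector_store are made up for illustration):

```python
from dataclasses import dataclass

# Hypothetical sketch: lift the hard-coded limit into a config object
# so callers can override it; the default mirrors MAX_DISK_ITEMS.
@dataclass(frozen=True)
class VectorStoreConfig:
    max_disk_items: int = 1_000_000

def open_vector_store(path, config=VectorStoreConfig()):
    # illustrative only: the store reads its limits from the config
    # instead of a compile-time constant
    return {"path": path, "max_disk_items": config.max_disk_items}
```

Callers that never touch the config keep today's behaviour, while tests and large deployments can tune the limit.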
&self,
ctx: &Context<'_>,
query: String,
limit: usize,
We need to revamp how algorithms work in GraphQL.
.unwrap_or(Arc::new(None));

GraphWithVectors::read_from_folder(folder, embedding, cache, self.create_index)
    .unwrap_or_else(|| VectorCache::in_memory(openai_embedding)); // TODO: review, this is weird...
Wanna have another look at this?
window: Option<Window>,
) -> GraphResult<GqlVectorSelection> {
    let vector = ctx.embed_query(query).await?;
    let w = window.into_window_tuple();
Can we stick some spawn_blockings around these?
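The pattern being asked for is offloading blocking work so it doesn't stall the async executor; on the Rust side that would be tokio::task::spawn_blocking. A minimal sketch of the same idea in Python, where asyncio.to_thread plays the analogous role (blocking_similarity_search is a stand-in for the real search, not the actual API):

```python
import asyncio
import time

def blocking_similarity_search(query: str) -> list[str]:
    # stand-in for a CPU/disk-bound vector search
    time.sleep(0.01)
    return [f"doc for {query}"]

async def entities_by_similarity(query: str) -> list[str]:
    # run the blocking call on a worker thread so the event loop
    # stays responsive; analogous to tokio::task::spawn_blocking
    return await asyncio.to_thread(blocking_similarity_search, query)

print(asyncio.run(entities_by_similarity("node1")))
```

The resolver itself stays async; only the search call is pushed onto the blocking pool.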
#[ResolvedObjectFields]
impl GqlVectorSelection {
    async fn nodes(&self) -> Vec<Node> {
I think most of these will need a spawn blocking
window: Option<Window>,
) -> GraphResult<GqlVectorSelection> {
    let vector = ctx.embed_query(query).await?;
    let w = window.into_window_tuple();
Note for us: add spawn_blocking here when we redo algorithms.
What changes were proposed in this pull request?
Why are the changes needed?
The old approach of defining custom document search queries in Python had two problems:
Does this PR introduce any user-facing change? If yes, is this documented?
How was this patch tested?
Are there any further changes required?