Fast retrieval of 8k vectors of dim 1024 #2005
Hello,

We now store our embeddings in Arcade, with each embedding linked by an edge to the node it embeds. The issue is that retrieval at boot of our software is quite slow.

Cypher query:

    match (vector:EMBEDDING)-[:embb]->(targetNode)
    return ID(targetNode) as rid, vector as vector

Takes 21s for 7692 entries.

SQL query:

    MATCH {type: EMBEDDING, as: embb}-->{ as: target}
    RETURN embb.vector, target.asRID()

Takes 21s for 7692 entries.

Profiling of the Cypher query returns that:
Returning the vector RID instead of the vector itself obviously speeds up the query a lot (it takes less than 1 s), which leads me to believe that returning the 8k vectors of dimension 1024 is what slows the whole query down.
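
For a rough sense of scale, here is a small sketch that estimates the raw size of the result set. It assumes the vectors are stored as 32-bit floats, which the post does not state, so `BYTES_PER_FLOAT` and the helper name `raw_payload_bytes` are illustrative only.

```python
# Rough payload estimate for the result set described above.
NUM_VECTORS = 7692      # entries reported in the question
DIM = 1024              # vector dimension from the title
BYTES_PER_FLOAT = 4     # assumption: 32-bit floats; not stated in the thread


def raw_payload_bytes(num_vectors=NUM_VECTORS, dim=DIM, bytes_per_value=BYTES_PER_FLOAT):
    """Size of the vectors as packed binary, ignoring any protocol overhead."""
    return num_vectors * dim * bytes_per_value


total = raw_payload_bytes()
print(f"{NUM_VECTORS * DIM:,} values, about {total / 1e6:.1f} MB as packed floats")
```

Under that assumption the packed data is only a few tens of megabytes, so a 21 s query would point at per-value conversion cost rather than at the sheer volume of data.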
Replies: 1 comment 21 replies
Maybe the serializer of arrays is not efficient and the cost is just in serializing the result back. Also, if you store large records, it could be helpful to change the page size from 65k to 2x or 4x that. Could you please provide a test case, or even a database with similar data, so we can spin up some tests locally?
Are these times measured using ArcadeDB from Python? Or is it just Python code reading and writing arrays <-> JSON?
JSON is not the most efficient format for transferring arrays of numbers, since they have to be converted to a string and back. Have you tried using the Postgres driver instead?
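
To illustrate the point about JSON, here is a minimal standalone sketch that encodes the same vectors both as JSON text and as packed 32-bit floats, then compares the sizes. It does not talk to ArcadeDB or measure its serializer. The helper names (`make_vectors`, `encode_json`, `encode_binary`, `decode_binary`) and the little-endian float32 layout are assumptions made for the example.

```python
import json
import random
import struct


def make_vectors(count, dim, seed=0):
    """Generate reproducible random vectors as plain Python lists."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(count)]


def encode_json(vectors):
    """Serialize to JSON text: every float becomes a decimal string."""
    return json.dumps(vectors).encode("utf-8")


def encode_binary(vectors):
    """Pack each vector as little-endian float32 (assumed layout)."""
    dim = len(vectors[0])
    return b"".join(struct.pack(f"<{dim}f", *v) for v in vectors)


def decode_binary(payload, dim):
    """Inverse of encode_binary: split the buffer back into vectors."""
    return [list(row) for row in struct.iter_unpack(f"<{dim}f", payload)]


# A small sample keeps the demo quick; scale count up to 7692 to match the thread.
vectors = make_vectors(count=100, dim=1024)
json_bytes = encode_json(vectors)
bin_bytes = encode_binary(vectors)
print(f"JSON: {len(json_bytes):,} bytes, binary: {len(bin_bytes):,} bytes")
```

The binary form is a fixed 4 bytes per value, whereas the JSON form spends many more bytes on the decimal digits of each float and has to parse them back on the client. Note that packing to float32 loses precision compared with Python's 64-bit floats.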