Any plans on adding cosine similarity to the list of metrics? #670
Open
Description
I'm learning Arraymancer and experimenting with it for text embeddings.
In Python I use SentenceTransformers and scikit-learn's KNeighborsClassifier to find the closest matches, using the cosine metric.
It seems that Arraymancer doesn't support the cosine metric. Are there plans to add it?
I was using kdTree with the Euclidean metric, and the results were all wrong.
Can Arraymancer help me normalize the text embeddings, so that I can use the Euclidean metric and still get good results?
Here is my code:
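For context, this is why I expect normalization to help. It is a standard identity rather than anything specific to Arraymancer, and it assumes every embedding is first scaled to unit L2 length (the hats below are my own notation for the scaled vectors):

```latex
\hat{a} = \frac{a}{\lVert a \rVert_2}, \qquad
\hat{b} = \frac{b}{\lVert b \rVert_2}

\lVert \hat{a} - \hat{b} \rVert_2^2
  = \lVert \hat{a} \rVert_2^2 + \lVert \hat{b} \rVert_2^2 - 2\,\hat{a} \cdot \hat{b}
  = 2 - 2\cos(a, b)
```

Since the squared Euclidean distance between unit vectors is a decreasing function of the cosine similarity, the nearest neighbours under the Euclidean metric on normalized vectors should be the same as the nearest neighbours under the cosine metric on the original vectors.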
import arraymancer
let vectors = read_npy[float64]("title_vectors.txt.npy")
echo vectors.shape
# [1226242, 350]
let kd = kdtree(vectors)
let (dist, ix) = kd.query(vectors[0, _].reshape(350), k = 3) # find the 3 entries closest to the first entry
Another thing I'm confused about is why I need the reshape(350).
When I tried: let (dist, ix) = kd.query(vectors[0, _], k = 3)
it resulted in: Broadcasting error: non-singleton dimensions must be the same in both tensors.
Thanks.