Any plans on adding Cosine similarity to the list of metrics? #670

Open
@greenersharp

Description

I'm learning Arraymancer and experimenting with text embeddings.

In Python I use SentenceTransformers and scikit-learn's KNeighborsClassifier with the cosine metric to find the closest matches.

It seems that Arraymancer doesn't support the cosine metric. Are there plans to add it?
I was using kdTree with the euclidean metric, and the results were all wrong.

Can Arraymancer help me normalize the text embeddings, so that I can use the euclidean metric and still get good results?
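
For reference, here is a short derivation of why normalizing should help, assuming every embedding is scaled to unit length. The symbols a, b and θ are introduced here for illustration only and do not come from Arraymancer:

```latex
\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\, a \cdot b = 2 - 2\cos\theta,
\qquad \text{since } \|a\| = \|b\| = 1 \text{ and } \cos\theta = \frac{a \cdot b}{\|a\|\,\|b\|} = a \cdot b .
```

So on unit-length vectors, a smaller euclidean distance always means a larger cosine similarity, and the nearest neighbours found with the euclidean metric should be the same ones the cosine metric would rank highest.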

Here is my code:

import arraymancer

let vectors = read_npy[float64]("title_vectors.txt.npy")

echo vectors.shape
# [1226242, 350]

let kd = kdtree(vectors)
let (dist, ix) = kd.query(vectors[0, _].reshape(350), k = 3)  # find the closest matches to the first entry

Another thing I am confused about is why I need to call reshape(350).
When I tried: let (dist, ix) = kd.query(vectors[0, _], k = 3) it resulted in: Broadcasting error: non-singleton dimensions must be the same in both tensors.

Thanks
