Faster ANN and bug fix #8

Open: wants to merge 2 commits into base: master

@ebecht ebecht commented Jun 6, 2019

Hi

I've switched the approximate nearest neighbor (ANN) search to the HNSW library, which is faster.

I've also fixed a rare bug where a data point could disappear from the output. This happens when none of the point's nearest neighbors shares a nearest neighbor with it, and no point that lists it as a nearest neighbor shares a nearest neighbor with it either. It happened once in a dataset of 3,000,000+ points, but I remember having encountered the bug before.

It would be good to at least merge the bug fix! It corresponds to the following code snippet from the phenograph.R file:

    links <- links[links[,1]>0, ]

    ## Fix for data points that go missing. This happens when all of a point's
    ## Jaccard coefficients are 0 and, wherever it appears as another point's
    ## nearest neighbor, the corresponding Jaccard coefficient is also 0.
    u <- unique(c(links[,1], links[,2]))
    u <- setdiff(seq_len(nrow(data)), u) ## Data points that have no link
    ## Add a self-link of weight 1 for each unlinked point
    links <- rbind(links, matrix(ncol=3, byrow=FALSE, data=c(u, u, rep(1, length(u)))))
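To make the failure mode concrete, here is a minimal toy sketch of how a point can drop out and how the snippet above restores it. The neighbor matrix `idx` is made-up data, and the loop is only a pure-R stand-in for the Jaccard step (an assumption for illustration, not the package's actual implementation): point 6 shares no neighbor with its own neighbors (4 and 5), and no other point lists it as a neighbor.

```r
# Hypothetical k-nearest-neighbor index matrix (k = 2): row i lists the
# neighbors of point i. Chosen so that point 6 ends up with no link.
idx <- matrix(c(2, 3,
                1, 3,
                1, 2,
                1, 2,
                2, 3,
                4, 5), ncol = 2, byrow = TRUE)
n <- nrow(idx)
k <- ncol(idx)

# Stand-in for the Jaccard step: one row (from, to, weight) per neighbor pair.
# Pairs whose neighbor sets do not intersect are left as all-zero rows.
links <- matrix(0, nrow = n * k, ncol = 3)
r <- 1
for (i in seq_len(n)) {
  for (j in idx[i, ]) {
    shared <- length(intersect(idx[i, ], idx[j, ]))
    if (shared > 0) {
      # |A n B| / |A u B| when both neighbor sets have size k
      links[r, ] <- c(i, j, shared / (2 * k - shared))
    }
    r <- r + 1
  }
}
links <- links[links[, 1] > 0, ]

# Points that appear in no link before the fix
missing_before <- setdiff(seq_len(n), unique(c(links[, 1], links[, 2])))

# The fix: give every unlinked point a self-link of weight 1
u <- setdiff(seq_len(n), unique(c(links[, 1], links[, 2])))
links <- rbind(links, matrix(ncol = 3, byrow = FALSE, data = c(u, u, rep(1, length(u)))))

# Points that appear in no link after the fix
missing_after <- setdiff(seq_len(n), unique(c(links[, 1], links[, 2])))

print(missing_before)
print(missing_after)
```

With this toy input, point 6 is the only point without a link before the fix, and no point is missing afterwards. The self-link keeps the point as a vertex when the graph is built from `links`, so it still receives a cluster assignment.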

Thanks!
