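It appears that spark-knn needs to convert sparse vectors into their dense form. This makes spark-knn impractical for very wide, sparse datasets, such as the document-term matrices common in NLP, because densifying a row allocates memory proportional to the full vocabulary size rather than to the number of non-zero entries.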
Is there any interest in supporting sparse vectors within spark-knn?
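For illustration, the sketch below shows what sparse support could look like at the distance level. It is not spark-knn code, and the function name `sparse_sq_distance` is hypothetical. It assumes each vector is stored as sorted index/value arrays (similar to the layout of Spark's `SparseVector`) and computes the squared Euclidean distance by merging the two index lists, so the cost scales with the number of non-zeros instead of the vector width.

```python
import math


def sparse_sq_distance(idx_a, val_a, idx_b, val_b):
    """Squared Euclidean distance between two sparse vectors.

    Hypothetical helper: each vector is given as parallel lists of
    strictly increasing indices and their values; absent indices are 0.
    """
    i = j = 0
    total = 0.0
    # Merge the two sorted index lists, touching only non-zero entries.
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            d = val_a[i] - val_b[j]
            total += d * d
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            # Index present only in a: the other coordinate is 0.
            total += val_a[i] * val_a[i]
            i += 1
        else:
            # Index present only in b.
            total += val_b[j] * val_b[j]
            j += 1
    # Whatever is left in either vector pairs with zeros.
    total += sum(v * v for v in val_a[i:])
    total += sum(v * v for v in val_b[j:])
    return total


# Two rows of a 1,000,000-term document-term matrix, each with few non-zeros.
a_idx, a_val = [3, 10, 999_999], [1.0, 2.0, 1.0]
b_idx, b_val = [10, 500], [2.0, 3.0]
print(math.sqrt(sparse_sq_distance(a_idx, a_val, b_idx, b_val)))
```

Only five stored entries are visited here, whereas a dense representation of the same two rows would hold two million values.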