Description
Is your feature request related to a problem? Please describe
I can't move this issue between projects, so I am copying this great suggestion by @nknize here and adding some context from the problems I ran into while attempting to integrate jVector (opensearch-project/k-NN#2386).
Is your feature request related to a problem?
Core OpenSearch does not support vector types as a first-class field. The correlation engine has a CorrelationVectorFieldMapper that uses Lucene's KnnFloatVectorField, but this lives in the events-correlation-engine plugin. We could move that field mapper to the core library, but we don't want to fragment across different vector field implementations. So why not make the Lucene HNSW-backed vector field and k-NN search a first-class field in a core library?
What solution would you like?
A discussion around making the vector field type a first-class citizen in core. We've discussed this before in person, but I don't know if we have an issue for it. I don't see a reason not to have Lucene vector fields and HNSW-backed k-NN search as a core feature, with the OpenSearch k-NN plugin acting as an optional accelerator that offers alternative native engines such as FAISS or nmslib.
What alternatives have you considered?
Leave as is if there is a compelling reason to keep this base Lucene capability integration in a separate downstream plugin.
Do you have any additional context?
We were trying to extend the k-NN plugin with a jVector engine and ran into several issues with the existing approach. They convinced us that core would be a better home for vector types and vector search going forward.
The issues can be enumerated as follows:
- Significant complexity and maintainability issues. These stem primarily from the decision to delegate index and search functionality wholesale to native libraries, which causes several problems:
a. The build and the interfaces in the plugin are quite complex and often break. This is primarily because some native libraries are pulled in as source code dependencies in a way that was not well thought out. Some versions are also not backwards compatible.
b. Native memory makes performance issues difficult to track and analyze. JVM analysis tools have a hard time detecting such issues, and not all users want to take on that trade-off.
- Maintainers' choice of engines. The k-NN plugin maintainers have a clear preference for some engines (e.g. NmsLib, Faiss), while contributors from other organizations prefer JVM-based engines (e.g. Lucene, jVector). The k-NN plugin has become the gatekeeper of which engines can be included in OpenSearch, which is not aligned with making the project easily extensible. At the moment, every new engine extension outside of the plugin would have to copy the mapping/query logic, which will result in divergence.
The above proposal should make new extensions to OpenSearch easier and less contentious, and should satisfy different community needs such as:
- Native vs. non-native engines
- More or less agile development, depending on the specific requirements of individual engines (e.g. whether many local prerequisites have to be installed)
- More engine diversity without redundancy of logic
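To make the "engine diversity without redundancy of logic" point concrete, here is a minimal sketch of what a shared extension point could look like. Everything in it is hypothetical: `VectorEngine`, `VectorEngineRegistry`, and `BruteForceEngine` are illustrative names, not existing OpenSearch or k-NN plugin APIs. The brute-force engine is only a stand-in for a real backend such as Lucene HNSW, jVector, or Faiss. The idea is that core would own the field mapping and query path once and resolve the engine by name, so a new engine only has to implement the interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical extension point (assumption): not an existing OpenSearch interface.
interface VectorEngine {
    String name();
    void add(int docId, float[] vector);
    List<Integer> search(float[] query, int k); // doc ids, nearest first
}

// Stand-in engine: exact search by squared Euclidean distance.
// A real engine would build an approximate index instead.
final class BruteForceEngine implements VectorEngine {
    private final List<Integer> ids = new ArrayList<>();
    private final List<float[]> vectors = new ArrayList<>();

    public String name() { return "brute_force"; }

    public void add(int docId, float[] vector) {
        ids.add(docId);
        vectors.add(vector.clone());
    }

    public List<Integer> search(float[] query, int k) {
        // Sort stored positions by distance to the query, then map back to doc ids.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) order.add(i);
        order.sort(Comparator.comparingDouble(i -> squaredDistance(query, vectors.get(i))));
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, order.size()); i++) result.add(ids.get(order.get(i)));
        return result;
    }

    private static double squaredDistance(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}

// Shared lookup: mapping/query code would resolve an engine by name here
// instead of each engine extension copying that logic.
final class VectorEngineRegistry {
    private final Map<String, VectorEngine> engines = new HashMap<>();

    void register(VectorEngine engine) {
        if (engines.putIfAbsent(engine.name(), engine) != null) {
            throw new IllegalArgumentException("duplicate engine: " + engine.name());
        }
    }

    VectorEngine get(String name) {
        VectorEngine engine = engines.get(name);
        if (engine == null) throw new IllegalArgumentException("unknown engine: " + name);
        return engine;
    }
}

final class VectorEngineSketch {
    public static void main(String[] args) {
        VectorEngineRegistry registry = new VectorEngineRegistry();
        registry.register(new BruteForceEngine());

        VectorEngine engine = registry.get("brute_force");
        engine.add(1, new float[] {0f, 0f});
        engine.add(2, new float[] {1f, 1f});
        engine.add(3, new float[] {5f, 5f});

        System.out.println(engine.search(new float[] {0.9f, 0.9f}, 2)); // prints [2, 1]
    }
}
```

In this sketch a native-backed plugin and a JVM-based engine would both register through the same interface, so the choice of engines would no longer be tied to a single plugin's maintainers.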