Description
Describe the bug
I'm using OpenSearch Vector Store version 2.17.1 and encountered an issue when creating 10 indices, each containing approximately 4000 .txt documents. These documents vary between physical models and table descriptions. All indices were created using the following configuration:
{
  "settings": {
    "index": {
      "number_of_shards": "2",
      "knn.algo_param": {
        "ef_search": "512"
      },
      "knn": "true"
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "metadata": {
        "properties": {
          "source": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "text": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "vector_field": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "engine": "nmslib",
          "space_type": "l2",
          "name": "hnsw",
          "parameters": {
            "ef_construction": 512,
            "m": 16
          }
        }
      }
    }
  }
}
However, when I ran the same vector search query against each of these indices, the results were inconsistent: the same query returned different top documents in some indices, even though all of them were configured identically.
To work around this issue, I experimented with raising the ef_search value at query time from the index-level setting of 512 to 4096 using the following query:
{
  "size": k,
  "query": {
    "knn": {
      "vector_field": {
        "vector": question,
        "k": k,
        "method_parameters": {
          "ef_search": 4096
        }
      }
    }
  }
}
With this change, the search results became consistent and linear across all indices, which is the expected behavior.
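For anyone reproducing this, the query above can be built programmatically so that the ef_search override is easy to toggle. This is only a minimal sketch: `build_knn_query` is a hypothetical helper name, it produces the request body and does not send it, and the short vector in the usage section is a stand-in for the real 1024-dimensional embedding.

```python
import json
from typing import Optional, Sequence


def build_knn_query(vector: Sequence[float], k: int,
                    ef_search: Optional[int] = None) -> dict:
    """Build a k-NN search body for the `vector_field` mapping in this issue."""
    knn_clause = {"vector": list(vector), "k": k}
    if ef_search is not None:
        # Query-time override. When ef_search is None the key is omitted,
        # so the index-level knn.algo_param.ef_search value applies.
        knn_clause["method_parameters"] = {"ef_search": ef_search}
    return {"size": k, "query": {"knn": {"vector_field": knn_clause}}}


if __name__ == "__main__":
    # Hypothetical 3-dimensional vector for readability only.
    body = build_knn_query([0.1, 0.2, 0.3], k=5, ef_search=4096)
    print(json.dumps(body, indent=2))
```

Calling the helper once with `ef_search=None` and once with `ef_search=4096` gives the two request bodies compared in this report.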
Questions:
- Is this non-linear behavior with the index-level ef_search of 512 expected?
- Is increasing ef_search to a high value (e.g., 4096) the correct and recommended approach to ensure consistent top-k results across identically configured indices?
- Should I expect some level of randomness due to the HNSW algorithm, or is this pointing to an underlying issue?
Please let me know if you need any additional setup information.
Related component
Search
To Reproduce
- Create 10 (or more) OpenSearch indices using the settings and mappings shown above.
- Index approximately 4000 .txt documents per index, with content varying between table descriptions and physical models.
- Run a knn query (with the index-level ef_search: 512) on all indices using the same vector input.
- Observe the inconsistency in top-k documents returned across the indices.
- Modify the same query to include ef_search: 4096 in method_parameters.
- Observe that the results become consistent and repeatable across all indices.
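To put a number on the inconsistency observed in the steps above, the top-k hits from each index can be compared pairwise. The sketch below is an illustration only: it assumes each hit can be reduced to a key that is comparable across indices (for example `metadata.source`), and the function names and the sample data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard similarity of two ID lists, ignoring rank order."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def pairwise_overlap(top_ids: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity of the top-k IDs for every pair of indices."""
    return {
        (x, y): jaccard(top_ids[x], top_ids[y])
        for x, y in combinations(sorted(top_ids), 2)
    }


if __name__ == "__main__":
    # Hypothetical top-3 results from three indices for the same query vector.
    sample = {
        "index_01": ["doc_a", "doc_b", "doc_c"],
        "index_02": ["doc_a", "doc_b", "doc_c"],
        "index_03": ["doc_b", "doc_c", "doc_d"],
    }
    for pair, score in pairwise_overlap(sample).items():
        print(pair, round(score, 2))
```

Running the comparison once per ef_search value shows whether the overlap actually rises with the larger setting.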
Expected behavior
Given that all indices have identical configurations and similar volumes of data, running the same knn query with the same input vector should yield consistent top-k results across indices, even when using the index-level ef_search value of 512.
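One way to check whether the differences come from HNSW approximation rather than from a defect is to compare the approximate hits with an exact brute-force search over the same vectors. The sketch below assumes the document vectors are available locally as a dict of ID to vector and uses L2 distance to match the `space_type` in the mapping. All function names and the tiny sample data are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(query: Sequence[float],
                docs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """IDs of the k nearest documents by exhaustive L2 comparison."""
    return heapq.nsmallest(
        k, docs, key=lambda doc_id: l2_distance(query, docs[doc_id])
    )


def recall_at_k(approx_ids: List[str], exact_ids: List[str]) -> float:
    """Fraction of the exact top-k that the approximate search returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    # Hypothetical 2-dimensional vectors; the real field has 1024 dimensions.
    docs = {"doc_a": [0.0, 0.0], "doc_b": [1.0, 0.0], "doc_c": [5.0, 5.0]}
    exact = exact_top_k([0.0, 0.0], docs, k=2)
    approx = ["doc_a", "doc_c"]  # stand-in for IDs returned by the knn query
    print(exact, recall_at_k(approx, exact))
```

If recall against the exact results is already high at ef_search 512, the remaining differences between indices are more likely to come from the data than from the search parameter.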
Additional Details
Host/Environment (please complete the following information):
- OS: Linux/Docker
- Version 2.17.1