-
Notifications
You must be signed in to change notification settings - Fork 115
Add: apply HNSW_EF_SEARCH paramter when run vector-search query #995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add: apply HNSW_EF_SEARCH paramter when run vector-search query #995
Conversation
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughAdds a public constant Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Comment |
35d64d7 to
9f62bee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
osbenchmark/workload/params.py (1)
1100-1126: Optional: parse and validatehnsw_ef_searchsimilarly tokfor robustnessThe new
PARAMS_NAME_EF_SEARCHandself.ef_search = params.get(...)wiring is straightforward and keeps the parameter optional. To harden this, consider parsing and validating it (e.g., viaparse_int_parameterwith a sensible min value) so bad or stringly values are caught early, similar to howkis handled.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
osbenchmark/workload/params.py(3 hunks)
🔇 Additional comments (1)
osbenchmark/workload/params.py (1)
1218-1233: Conditionalmethod_parameters.ef_searchinjection looks correct and backward‑compatibleThe
method_parametersblock is only added whenself.ef_searchis notNone, leaving existing workloads unchanged. When set, it is scoped to the per‑fieldknnquery as expected and coexists cleanly with the existing filter handling and nested‑field wrapping.
9f62bee to
9b59284
Compare
|
@coderabbitai generate docstrings |
✅ Actions performedInitiated docstring generation; will generate only if new commits exist. |
|
Note Docstrings generation - SUCCESS |
Signed-off-by: Jeong Eui Dong [ 정의동 ] <[email protected]>
Docstrings generation was requested by @euidong. * #995 (comment) The following files were modified: * `osbenchmark/workload/params.py`
9b59284 to
e4d4624
Compare
|
@coderabbitai generate docstrings |
✅ Actions performedInitiated docstring generation; will generate only if new commits exist. |
Docstrings generation was requested by @euidong. * #995 (comment) The following files were modified: * `osbenchmark/workload/params.py`
Docstrings generation was requested by @euidong. * opensearch-project#995 (comment) The following files were modified: * `osbenchmark/workload/params.py`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
osbenchmark/workload/params.py (1)
1103-1110: Consider brevity to match codebase style.The docstring is comprehensive but unusually verbose compared to other
__init__methods in this file. Most param source classes (e.g.,SearchParamSource,BulkIndexParamSource) omit__init__docstrings entirely. While documentation is valuable, consider shortening or deferring to maintain consistency unless this is part of a broader documentation initiative.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
osbenchmark/workload/params.py(4 hunks)
🔇 Additional comments (3)
osbenchmark/workload/params.py (3)
1100-1100: LGTM!The constant declaration follows the established naming convention and improves maintainability.
1227-1237: LGTM!The docstring accurately documents the new
ef_searchparameter and its behavior. The update appropriately extends the existing documentation to cover the new functionality.
1243-1246: Logic is correct for optional ef_search parameter.The conditional injection of
method_parameterscorrectly handles the optionalef_searchparameter. The structure aligns with OpenSearch k-NN plugin expectations. However, ensure validation is added per the earlier comment to prevent invalid values from reaching this point.
Signed-off-by: Jeong Eui Dong [ 정의동 ] <[email protected]>
b9d70ef to
cd2f02b
Compare
Signed-off-by: Jeong Eui Dong [ 정의동 ] <[email protected]>
Signed-off-by: Jeong Eui Dong [ 정의동 ] <[email protected]>
Description
OpenSearch Benchmark can't dynamically change
hnsw_ef_searchparameter. With this change, we can changehnsw_ef_searchparameter in vector search phase.Issues Resolved
#994
Testing
[Describe how this change was tested]
I tested with following command:
{ "target_index_name": "hnsw_benchmark_lucene_m16_efc500", "target_index_bulk_index_data_set_path": "~/dataset.h5", "target_index_bulk_index_data_set_format": "hdf5", "target_field_name": "target_field", "target_index_dimension": 128, "target_index_space_type": "cosinesimil", "query_data_set_format": "hdf5", "query_data_set_path": "~/dataset.h5", "query_k": 40, "query_count": 10000, "target_index_bulk_size": 256, "target_index_bulk_indexing_clients": 4, "search_clients": 16, "hnsw_m": 16, "hnsw_ef_construction": 500, "hnsw_ef_search": 500 }{ "name" : "warmup-indices", "operation" : "warmup-indices", "index": "{{ target_index_name | default(target_index) }}" }, { "operation": { "name": "prod-queries", "operation-type": "vector-search", "index": "{{ target_index_name | default(target_index) }}", "detailed-results": true, {% if query_k is defined %} "k": {{ query_k }}, {% endif %} {% if query_max_distance is defined %} "max_distance": {{ query_max_distance }}, {% endif %} {% if query_min_score is defined %} "min_score": {{ query_min_score }}, {% endif %} "field" : "{{ target_field_name | default(target_field) }}", "data_set_format" : "{{ query_data_set_format | default(hdf5) }}", "data_set_path" : "{{ query_data_set_path }}", "data_set_corpus" : "{{ query_data_set_corpus }}", "neighbors_data_set_path" : "{{ neighbors_data_set_path }}", "neighbors_data_set_corpus" : "{{ neighbors_data_set_corpus }}", "neighbors_data_set_format" : "{{ neighbors_data_set_format | default(hdf5) }}", "num_vectors" : {{ query_count | default(-1) }}, "id-field-name": "{{ id_field_name }}", "body": {{ query_body | default ({}) | tojson }}, "filter_body": {{ filter_body | default ({}) | tojson }}, "filter_type": {{filter_type | default ({}) | tojson }}, "hnsw_ef_search": {{ hnsw_ef_search }} // 👈 I only add this line }, "clients": {{ search_clients | default(1)}} }(Result)
[BEFORE]
------------------------------------------------------ _______ __ _____ / ____(_)___ ____ _/ / / ___/_________ ________ / /_ / / __ \/ __ `/ / \__ \/ ___/ __ \/ ___/ _ \ / __/ / / / / / /_/ / / ___/ / /__/ /_/ / / / __/ /_/ /_/_/ /_/\__,_/_/ /____/\___/\____/_/ \___/ ------------------------------------------------------ | Metric | Task | Value | Unit | |---------------------------------------------------------------:|---------------:|------------:|-------:| | Mean recall@k | prod-queries | 0.34 | | | Mean recall@1 | prod-queries | 0.56 | |[AFTER]
------------------------------------------------------ _______ __ _____ / ____(_)___ ____ _/ / / ___/_________ ________ / /_ / / __ \/ __ `/ / \__ \/ ___/ __ \/ ___/ _ \ / __/ / / / / / /_/ / / ___/ / /__/ /_/ / / / __/ /_/ /_/_/ /_/\__,_/_/ /____/\___/\____/_/ \___/ ------------------------------------------------------ | Metric | Task | Value | Unit | |---------------------------------------------------------------:|---------------:|------------:|-------:| | Mean recall@k | prod-queries | 0.91 | | | Mean recall@1 | prod-queries | 0.98 | |By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.