@euidong euidong commented Nov 26, 2025

Description

OpenSearch Benchmark currently cannot set the hnsw_ef_search parameter dynamically. With this change, hnsw_ef_search can be set during the vector search phase.

Issues Resolved

#994

Testing

  • New functionality includes testing

I tested with the following command:

opensearch-benchmark run  \
  --kill-running-processes  \
  --pipeline benchmark-only \
  --results-file ~/vectorsearch_default.log \
  --offline  \
  --target-hosts *** \
  --workload-params ~/wparams.json \
  --workload-path ~/workloads/vectorsearch \
  --test-procedure search-only
  • ~/wparams.json
{
    "target_index_name": "hnsw_benchmark_lucene_m16_efc500",
    "target_index_bulk_index_data_set_path": "~/dataset.h5",
    "target_index_bulk_index_data_set_format": "hdf5",
    "target_field_name": "target_field",
    "target_index_dimension": 128,
    "target_index_space_type": "cosinesimil",
    "query_data_set_format": "hdf5",
    "query_data_set_path": "~/dataset.h5",
    "query_k": 40,
    "query_count": 10000,
    "target_index_bulk_size": 256,
    "target_index_bulk_indexing_clients": 4,
    "search_clients": 16,
    "hnsw_m": 16,
    "hnsw_ef_construction": 500,
    "hnsw_ef_search": 500
}
{
    "name" : "warmup-indices",
    "operation" : "warmup-indices",
    "index": "{{ target_index_name | default(target_index) }}"
},
{
    "operation": {
        "name": "prod-queries",
        "operation-type": "vector-search",
        "index": "{{ target_index_name | default(target_index) }}",
        "detailed-results": true,
        {% if query_k is defined %}
        "k": {{ query_k }},
        {% endif %}
        {% if query_max_distance is defined %}
        "max_distance": {{ query_max_distance }},
        {% endif %}
        {% if query_min_score is defined %}
        "min_score": {{ query_min_score }},
        {% endif %}
        "field" : "{{ target_field_name | default(target_field) }}",
        "data_set_format" : "{{ query_data_set_format | default(hdf5) }}",
        "data_set_path" : "{{ query_data_set_path }}",
        "data_set_corpus" : "{{ query_data_set_corpus }}",
        "neighbors_data_set_path" : "{{ neighbors_data_set_path }}",
        "neighbors_data_set_corpus" : "{{ neighbors_data_set_corpus }}",
        "neighbors_data_set_format" : "{{ neighbors_data_set_format | default(hdf5) }}",
        "num_vectors" : {{ query_count | default(-1) }},
        "id-field-name": "{{ id_field_name }}",
        "body": {{ query_body | default ({}) | tojson }},
        "filter_body": {{ filter_body | default ({}) | tojson }},
        "filter_type": {{filter_type  | default ({}) | tojson }},
        "hnsw_ef_search": {{ hnsw_ef_search }} // 👈 I only add this line
    },
    "clients": {{ search_clients | default(1)}}
}

(Result)

[BEFORE]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                         Metric |           Task |       Value |   Unit |
|---------------------------------------------------------------:|---------------:|------------:|-------:|
|                                                  Mean recall@k |   prod-queries |        0.34 |        |
|                                                  Mean recall@1 |   prod-queries |        0.56 |        |

[AFTER]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                         Metric |           Task |       Value |   Unit |
|---------------------------------------------------------------:|---------------:|------------:|-------:|
|                                                  Mean recall@k |   prod-queries |        0.91 |        |
|                                                  Mean recall@1 |   prod-queries |        0.98 |        |

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features
    • Vector search requests can now include an optional ef_search parameter. When provided, it is included in the query payload so users can tune retrieval behavior—trading search speed for recall—to get finer control over search performance and result quality.
    • This setting is honored automatically for searches that specify ef_search.

@peterzhuamazon (Member)

@coderabbitai review

coderabbitai bot commented Nov 26, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot commented Nov 26, 2025

📝 Walkthrough

Adds a public constant PARAMS_NAME_EF_SEARCH, reads and stores an optional ef_search value in VectorSearchPartitionParamSource during initialization, and injects a method_parameters block containing ef_search into the vector search query body when present.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Vector search ef_search parameter support (osbenchmark/workload/params.py) | Added public constant PARAMS_NAME_EF_SEARCH = "hnsw_ef_search"; extracted ef_search in __init__ and stored it as self.ef_search; updated _build_vector_search_query_body to conditionally add a method_parameters block with ef_search when self.ef_search is not None. |
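
The change summarized above could be sketched roughly as follows. This is a simplified stand-in, not the actual `VectorSearchPartitionParamSource`: the class name `VectorSearchParamsSketch`, its constructor arguments, and the method name `build_query_body` are hypothetical. Only the constant name and value, the `self.ef_search` attribute, and the `method_parameters` block come from the walkthrough.

```python
from typing import Any, Dict, List, Optional

# Constant name and value as described in the walkthrough.
PARAMS_NAME_EF_SEARCH = "hnsw_ef_search"


class VectorSearchParamsSketch:
    """Hypothetical, simplified stand-in for the real param source."""

    def __init__(self, params: Dict[str, Any]) -> None:
        self.field_name: str = params["field"]
        self.k: int = params["k"]
        # Optional: stays None when the workload does not set hnsw_ef_search.
        self.ef_search: Optional[int] = params.get(PARAMS_NAME_EF_SEARCH)

    def build_query_body(self, vector: List[float]) -> Dict[str, Any]:
        """Build a k-NN query, adding method_parameters only when ef_search is set."""
        field_query: Dict[str, Any] = {"vector": vector, "k": self.k}
        if self.ef_search is not None:
            field_query["method_parameters"] = {"ef_search": self.ef_search}
        return {"knn": {self.field_name: field_query}}


# Example: the operation params from the test workload, reduced to the relevant keys.
source = VectorSearchParamsSketch(
    {"field": "target_field", "k": 40, "hnsw_ef_search": 500}
)
example_body = source.build_query_body([0.1, 0.2, 0.3])
```

Because the value is read with `params.get`, a workload that never sets `hnsw_ef_search` produces the same query body as before the change.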

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I sniff the params, find a tiny key,
ef_search glints where vectors roam free,
I hop and tuck it into the query's seam,
Benchmarks whisper, then hum and gleam,
Carrot in paw, I guard the dream. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding the ability to apply the HNSW_EF_SEARCH parameter during vector-search queries, which aligns with the raw summary and PR objectives. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e08af4 and 0aacc3a.

📒 Files selected for processing (1)
  • osbenchmark/workload/params.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • osbenchmark/workload/params.py

@euidong euidong force-pushed the apply-dynamic-ef-search-vectorsearch branch from 35d64d7 to 9f62bee Compare November 27, 2025 00:46
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
osbenchmark/workload/params.py (1)

1100-1126: Optional: parse and validate hnsw_ef_search similarly to k for robustness

The new PARAMS_NAME_EF_SEARCH and self.ef_search = params.get(...) wiring is straightforward and keeps the parameter optional. To harden this, consider parsing and validating it (e.g., via parse_int_parameter with a sensible minimum value) so that invalid or string-typed values are caught early, similar to how k is handled.
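
One way the suggested validation might look is sketched below. The helper `parse_optional_positive_int` is hypothetical and written from scratch for illustration; it does not call the project's `parse_int_parameter`, whose signature is not shown in this conversation. The minimum of 1 is also an assumption about what a "sensible" lower bound would be.

```python
from typing import Any, Mapping, Optional

PARAMS_NAME_EF_SEARCH = "hnsw_ef_search"


def parse_optional_positive_int(
    key: str, params: Mapping[str, Any], min_value: int = 1
) -> Optional[int]:
    """Return params[key] as an int >= min_value, or None if it is absent."""
    raw = params.get(key)
    if raw is None:
        return None
    # bool is a subclass of int, and int() would silently truncate 1.5 to 1,
    # so both cases are rejected explicitly.
    if isinstance(raw, bool) or (isinstance(raw, float) and not raw.is_integer()):
        raise ValueError(f"'{key}' must be an integer, got {raw!r}")
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"'{key}' must be an integer, got {raw!r}") from None
    if value < min_value:
        raise ValueError(f"'{key}' must be >= {min_value}, got {value}")
    return value


# Numeric strings, as they may arrive from a templated workload, are accepted.
ef_search = parse_optional_positive_int(
    PARAMS_NAME_EF_SEARCH, {"hnsw_ef_search": "500"}
)
```

Failing at parameter-parsing time means a malformed value surfaces when the workload is loaded rather than as a rejected search request partway through the benchmark.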

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6aef287 and 9f62bee.

📒 Files selected for processing (1)
  • osbenchmark/workload/params.py (3 hunks)
🔇 Additional comments (1)
osbenchmark/workload/params.py (1)

1218-1233: Conditional method_parameters.ef_search injection looks correct and backward‑compatible

The method_parameters block is only added when self.ef_search is not None, leaving existing workloads unchanged. When set, it is scoped to the per‑field knn query as expected and coexists cleanly with the existing filter handling and nested‑field wrapping.
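
The coexistence described here can be illustrated with a small standalone function. This is a sketch under assumptions: `build_knn_query` and its arguments are hypothetical, and the actual filter handling and nested-field wrapping in `params.py` are not shown in this conversation. The query shapes follow the OpenSearch k-NN query DSL, where `filter` and `method_parameters` sit inside the per-field block and a `nested` query wraps the whole `knn` clause.

```python
from typing import Any, Dict, List, Optional


def build_knn_query(
    field: str,
    vector: List[float],
    k: int,
    ef_search: Optional[int] = None,
    filter_body: Optional[Dict[str, Any]] = None,
    nested_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a k-NN query; every optional part is added only when provided."""
    field_query: Dict[str, Any] = {"vector": vector, "k": k}
    if filter_body:
        field_query["filter"] = filter_body
    if ef_search is not None:
        # Scoped to the per-field knn block, next to any filter.
        field_query["method_parameters"] = {"ef_search": ef_search}
    query: Dict[str, Any] = {"knn": {field: field_query}}
    if nested_path:
        # Nested fields wrap the complete knn clause, method_parameters included.
        query = {"nested": {"path": nested_path, "query": query}}
    return query


legacy = build_knn_query("target_field", [0.1, 0.2], k=40)
tuned = build_knn_query(
    "target_field",
    [0.1, 0.2],
    k=40,
    ef_search=500,
    filter_body={"term": {"color": "red"}},
)
```

Here `legacy` contains no `method_parameters` key at all, which is the backward-compatible case, while `tuned` carries both the filter and `ef_search` in the same per-field block.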

@euidong euidong force-pushed the apply-dynamic-ef-search-vectorsearch branch from 9f62bee to 9b59284 Compare November 28, 2025 01:24
euidong (Author) commented Nov 28, 2025

@coderabbitai generate docstrings

coderabbitai bot commented Nov 28, 2025

✅ Actions performed

Initiated docstring generation; will generate only if new commits exist.

coderabbitai bot commented Nov 28, 2025

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #997

coderabbitai bot added a commit that referenced this pull request Nov 28, 2025
Docstrings generation was requested by @euidong.

* #995 (comment)

The following files were modified:

* `osbenchmark/workload/params.py`
@euidong euidong force-pushed the apply-dynamic-ef-search-vectorsearch branch from 9b59284 to e4d4624 Compare November 28, 2025 01:25
euidong (Author) commented Nov 28, 2025

@coderabbitai generate docstrings

coderabbitai bot commented Nov 28, 2025

✅ Actions performed

Initiated docstring generation; will generate only if new commits exist.

coderabbitai bot added a commit that referenced this pull request Nov 28, 2025
Docstrings generation was requested by @euidong.

* #995 (comment)

The following files were modified:

* `osbenchmark/workload/params.py`
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
osbenchmark/workload/params.py (1)

1103-1110: Consider brevity to match codebase style.

The docstring is comprehensive but unusually verbose compared to other __init__ methods in this file. Most param source classes (e.g., SearchParamSource, BulkIndexParamSource) omit __init__ docstrings entirely. While documentation is valuable, consider shortening or deferring to maintain consistency unless this is part of a broader documentation initiative.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4d4624 and 60aac8e.

📒 Files selected for processing (1)
  • osbenchmark/workload/params.py (4 hunks)
🔇 Additional comments (3)
osbenchmark/workload/params.py (3)

1100-1100: LGTM!

The constant declaration follows the established naming convention and improves maintainability.


1227-1237: LGTM!

The docstring accurately documents the new ef_search parameter and its behavior. The update appropriately extends the existing documentation to cover the new functionality.


1243-1246: Logic is correct for optional ef_search parameter.

The conditional injection of method_parameters correctly handles the optional ef_search parameter. The structure aligns with OpenSearch k-NN plugin expectations. However, ensure validation is added per the earlier comment to prevent invalid values from reaching this point.

Signed-off-by: Jeong Eui Dong [ 정의동 ] <[email protected]>
@euidong euidong force-pushed the apply-dynamic-ef-search-vectorsearch branch from b9d70ef to cd2f02b Compare December 9, 2025 10:05
Signed-off-by: Jeong Eui Dong [ 정의동 ] <[email protected]>
Signed-off-by: Jeong Eui Dong [ 정의동 ] <[email protected]>