Closed
Description
Describe the bug
I have a custom procedure with sleep operation after each major operation. Here is an example procedure -
{% import "benchmark.helpers" as benchmark with context %}
{
"name": "knn-no-train-test",
"default": false,
"schedule": [
{
"operation": {
"name": "delete-target-index",
"operation-type": "delete-index",
"only-if-exists": true,
"index": "{{ target_index_name }}"
}
},
{
"operation": {
"name": "wait-a-minute-or-more-1",
"operation-type": "sleep",
"duration": {{ wait_time_in_sec }}
}
},
{
"operation": {
"name": "create-target-index",
"operation-type": "create-index",
"index": "{{ target_index_name }}"
}
},
{
"operation": {
"name": "wait-a-minute-or-more-2",
"operation-type": "sleep",
"duration": {{ wait_time_in_sec }}
}
},
{
"operation": {
"name": "custom-vector-bulk",
"operation-type": "custom-vector-bulk",
"index": "{{ target_index_name }}",
"field": "{{ target_field_name }}",
"bulk_size": {{ target_index_bulk_size }},
"data_set_format": "{{ target_index_bulk_index_data_set_format }}",
"data_set_path": "{{ target_index_bulk_index_data_set_path }}",
"num_vectors": {{ target_index_num_vectors }},
"timeout": "{{ timeout }}"
},
"clients": {{ target_index_bulk_index_clients }}
},
{
"operation": {
"name": "wait-a-minute-or-more-3",
"operation-type": "sleep",
"duration": {{ wait_time_in_sec }}
}
}
]
}
When i run this -
____ _____ __ ____ __ __
/ __ \____ ___ ____ / ___/___ ____ ___________/ /_ / __ )___ ____ _____/ /_ ____ ___ ____ ______/ /__
/ / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \ / __ / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ / __/ / / /__/ / __/ /_/ / / / /__/ / / / / /_/ / __/ / / / /__/ / / / / / / / / /_/ / / / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/ \___/_/ /_/ /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/ /_/|_|
/_/
[INFO] [Test Execution ID]: b07200c1-b031-4fd7-b6ab-c379ceb4c40e
[WARNING] Local changes in [/Users/gramadur/.benchmark/benchmarks/workloads/default] prevent workloads update from remote. Please commit your changes.
[WARNING] Local changes in [/Users/gramadur/.benchmark/benchmarks/workloads/default] prevent workloads update from remote. Please commit your changes.
[WARNING] Local changes in [/Users/gramadur/.benchmark/benchmarks/workloads/default] prevent workloads update from remote. Please commit your changes.
[INFO] Executing test with workload [vectorsearch], test_procedure [knn-no-train-test] and provision_config_instance ['external'] with version [None].
Running delete-target-index [100% done]
Running wait-a-minute-or-more-1 [ 0% done]
[ERROR] Cannot execute-test. Benchmark operation failed:
Worker ID: 0
Task: wait-a-minute-or-more-1
Workload: vectorsearch
Test Procedure: knn-no-train-test
Cause: Unknown
Error: Cannot run task [wait-a-minute-or-more-1]: 'request_start'
Cannot run task [wait-a-minute-or-more-1]: 'request_start'
To reproduce
To Reproduce -
I think any of the procedures, if you add a sleep operation like above would lead to this issue.
{
"operation": {
"name": "wait-a-minute-or-more-1",
"operation-type": "sleep",
"duration": {{ wait_time_in_sec }}
}
},
I have provided an example procedure above for an aoss collection.
For example - you can add the above operation to test_procedures/common/index-only-schedule.json
and run the aoss 1m cohere workload -
opensearch-benchmark execute-test --workload=vectorsearch --target-hosts=<ENDPOINT> --client-options="timeout:600,use_ssl:True,verify_certs:True,amazon_aws_log_in:environment,region:us-east-1" --workload-params=faiss-cohere-768-dp.json --test-procedure=no-train-test-index-only --pipeline=benchmark-only --kill-running-processes --distribution-version=2.17.0
It will fail at the sleep operation that was just added
Expected behavior
Expected behavior - the sleep runner completes successfully by performing sleep for provided duration seconds.
Screenshots
If applicable, add screenshots to help explain your problem.
Host / Environment
No response
Additional context
And the benchmarking logs show -
2025-03-04 17:35:17,245 ActorAddr-(T|:59907)/PID:69727 osbenchmark.worker_coordinator.worker_coordinator ERROR Could not execute schedule
Traceback (most recent call last):
File "/Users/gramadur/Library/Python/3.9/lib/python/site-packages/osbenchmark/worker_coordinator/worker_coordinator.py", line 1674, in __call__
request_start = request_context.request_start
File "/Users/gramadur/Library/Python/3.9/lib/python/site-packages/osbenchmark/client.py", line 54, in request_start
return self.ctx["request_start"]
KeyError: 'request_start'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/gramadur/Library/Python/3.9/lib/python/site-packages/osbenchmark/worker_coordinator/worker_coordinator.py", line 1677, in __call__
client_request_end = request_context.client_request_end
File "/Users/gramadur/Library/Python/3.9/lib/python/site-packages/osbenchmark/client.py", line 72, in __aexit__
request_start = self.request_start
File "/Users/gramadur/Library/Python/3.9/lib/python/site-packages/osbenchmark/client.py", line 54, in request_start
return self.ctx["request_start"]
KeyError: 'request_start'
2025-03-04 17:35:17,265 ActorAddr-(T|:59907)/PID:69727 osbenchmark.worker_coordinator.worker_coordinator INFO Total run duration: 3.022682 seconds.
2025-03-04 17:35:17,269 ActorAddr-(T|:59907)/PID:69727 osbenchmark.worker_coordinator.worker_coordinator INFO Total time to shutdown asyncgens: 0.003797 seconds.
2025-03-04 17:35:17,269 ActorAddr-(T|:59907)/PID:69727 osbenchmark.worker_coordinator.worker_coordinator INFO Total time to close transports: -0.000291 seconds.
2025-03-04 17:35:19,223 ActorAddr-(T|:59907)/PID:69727 osbenchmark.actor ERROR Worker[0] has detected a benchmark failure:
Benchmark operation failed:
Worker ID: 0
Task: wait-a-minute-or-more-1
Workload: vectorsearch
Test Procedure: knn-no-train-test
Cause: Unknown
Error: Cannot run task [wait-a-minute-or-more-1]: 'request_start'
Traceback (most recent call last):
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/gramadur/Library/Python/3.9/lib/python/site-packages/osbenchmark/worker_coordinator/worker_coordinator.py", line 1514, in __call__
loop.run_until_complete(self.run())
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/Users/gramadur/Library/Python/3.9/lib/python/site-packages/osbenchmark/worker_coordinator/worker_coordinator.py", line 1561, in run
_ = await asyncio.gather(*aws)
File "/Users/gramadur/Library/Python/3.9/lib/python/site-packages/osbenchmark/worker_coordinator/worker_coordinator.py", line 1725, in __call__
raise exceptions.BenchmarkError(f"Cannot run task [{self.task}]: {e}") from None
osbenchmark.exceptions.BenchmarkError: Cannot run task [wait-a-minute-or-more-1]: 'request_start'