Skip to content

Comments

Support benchmark for polling ingestion#784

Merged
rishabh6788 merged 15 commits intoopensearch-project:mainfrom
xuxiong1:osb-ig
Apr 2, 2025
Merged

Support benchmark for polling ingestion#784
rishabh6788 merged 15 commits intoopensearch-project:mainfrom
xuxiong1:osb-ig

Conversation

@xuxiong1
Copy link
Contributor

@xuxiong1 xuxiong1 commented Mar 5, 2025

Description

This PR supports polling ingestion benchmarking, which reads from the workload and produces the message to a stream.

  • Added a new operation-type: ProduceStreamMessage
  • The new ProduceStreamMessage(Runner) reuses the existing BulkIndexParamSource to read the workload
  • Added a MessageProducerFactory which creates the stream producer.
  • Currently only support run in benchmark-only mode.
  • Added shard-stats telemetry device endpoint

Issues Resolved

related to issue: opensearch-project/OpenSearch#17086
related to PR: opensearch-project/opensearch-benchmark-workloads#578

Testing

  • New functionality includes testing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

xuxiong1 added 5 commits March 4, 2025 07:31
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
@xuxiong1
Copy link
Contributor Author

xuxiong1 commented Mar 5, 2025

Local E2E test works:

.venvxux-java-data% opensearch-benchmark execute-test --workload-repository=default --workload-revision=9f45b3683c01c0a2979f7f0edf3566362d54bbce --pipeline=benchmark-only --workload=geonames --target-host=127.0.0.1:9200 --workload-params '{"ingest_percentage":"1","bulk_size":"100","number_of_shards":"1","number_of_replicas":"0","ingestion_source_type": "kafka","ingestion_topic":"test","ingestion_bootstrap_servers":"localhost:34803"}' --test-procedure=polling-ingest-index-only --kill-running-processes

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

[INFO] [Test Execution ID]: 273ade57-939e-4b2a-88a5-d98c2d36de45
[INFO] Downloading workload data file: documents-2.json.bz2 (252.9 MB total size) [100.0%]
[INFO] Decompressing workload data from [/home/user/.osb/benchmarks/data/geonames/documents-2.json.bz2] to [/home/user/.osb/benchmarks/data/geonames/documents-2.json] (resulting size: [3.30] GB) ... [OK]
[INFO] Downloading workload data file: documents-2.json.offset (4.2 kB total size)[100.0%]
[INFO] Executing test with workload [geonames], test_procedure [polling-ingest-index-only] and provision_config_instance ['external'] with version [3.0.0-SNAPSHOT].

[WARNING] refresh_total_time is 49 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running check-cluster-health                                                   [100% done]
Running polling-ingest                                                         [100% done]
Running refresh-after-index                                                    [100% done]
Running force-merge                                                            [100% done]
Running refresh-after-force-merge                                              [100% done]
Running wait-until-merges-finish                                               [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                         Metric |                     Task |     Value |   Unit |
|---------------------------------------------------------------:|-------------------------:|----------:|-------:|
|                     Cumulative indexing time of primary shards |                          |         0 |    min |
|             Min cumulative indexing time across primary shards |                          |         0 |    min |
|          Median cumulative indexing time across primary shards |                          |         0 |    min |
|             Max cumulative indexing time across primary shards |                          |         0 |    min |
|            Cumulative indexing throttle time of primary shards |                          |         0 |    min |
|    Min cumulative indexing throttle time across primary shards |                          |         0 |    min |
| Median cumulative indexing throttle time across primary shards |                          |         0 |    min |
|    Max cumulative indexing throttle time across primary shards |                          |         0 |    min |
|                        Cumulative merge time of primary shards |                          |    0.0088 |    min |
|                       Cumulative merge count of primary shards |                          |         1 |        |
|                Min cumulative merge time across primary shards |                          |    0.0088 |    min |
|             Median cumulative merge time across primary shards |                          |    0.0088 |    min |
|                Max cumulative merge time across primary shards |                          |    0.0088 |    min |
|               Cumulative merge throttle time of primary shards |                          |         0 |    min |
|       Min cumulative merge throttle time across primary shards |                          |         0 |    min |
|    Median cumulative merge throttle time across primary shards |                          |         0 |    min |
|       Max cumulative merge throttle time across primary shards |                          |         0 |    min |
|                      Cumulative refresh time of primary shards |                          |   0.04715 |    min |
|                     Cumulative refresh count of primary shards |                          |        20 |        |
|              Min cumulative refresh time across primary shards |                          |   0.04715 |    min |
|           Median cumulative refresh time across primary shards |                          |   0.04715 |    min |
|              Max cumulative refresh time across primary shards |                          |   0.04715 |    min |
|                        Cumulative flush time of primary shards |                          |         0 |    min |
|                       Cumulative flush count of primary shards |                          |         0 |        |
|                Min cumulative flush time across primary shards |                          |         0 |    min |
|             Median cumulative flush time across primary shards |                          |         0 |    min |
|                Max cumulative flush time across primary shards |                          |         0 |    min |
|                                        Total Young Gen GC time |                          |     0.086 |      s |
|                                       Total Young Gen GC count |                          |        20 |        |
|                                          Total Old Gen GC time |                          |         0 |      s |
|                                         Total Old Gen GC count |                          |         0 |        |
|                                                     Store size |                          | 0.0406146 |     GB |
|                                                  Translog size |                          |         0 |     GB |
|                                         Heap used for segments |                          |         0 |     MB |
|                                       Heap used for doc values |                          |         0 |     MB |
|                                            Heap used for terms |                          |         0 |     MB |
|                                            Heap used for norms |                          |         0 |     MB |
|                                           Heap used for points |                          |         0 |     MB |
|                                    Heap used for stored fields |                          |         0 |     MB |
|                                                  Segment count |                          |         8 |        |
|                                                 Min Throughput |           polling-ingest |   1071.82 |  ops/s |
|                                                Mean Throughput |           polling-ingest |   1285.21 |  ops/s |
|                                              Median Throughput |           polling-ingest |   1287.61 |  ops/s |
|                                                 Max Throughput |           polling-ingest |   1316.12 |  ops/s |
|                                        50th percentile latency |           polling-ingest |   75.5824 |     ms |
|                                        90th percentile latency |           polling-ingest |   84.0513 |     ms |
|                                        99th percentile latency |           polling-ingest |   97.4661 |     ms |
|                                      99.9th percentile latency |           polling-ingest |   128.291 |     ms |
|                                       100th percentile latency |           polling-ingest |   130.113 |     ms |
|                                   50th percentile service time |           polling-ingest |   75.5824 |     ms |
|                                   90th percentile service time |           polling-ingest |   84.0513 |     ms |
|                                   99th percentile service time |           polling-ingest |   97.4661 |     ms |
|                                 99.9th percentile service time |           polling-ingest |   128.291 |     ms |
|                                  100th percentile service time |           polling-ingest |   130.113 |     ms |
|                                                     error rate |           polling-ingest |         0 |      % |
|                                                 Min Throughput | wait-until-merges-finish |       130 |  ops/s |
|                                                Mean Throughput | wait-until-merges-finish |       130 |  ops/s |
|                                              Median Throughput | wait-until-merges-finish |       130 |  ops/s |
|                                                 Max Throughput | wait-until-merges-finish |       130 |  ops/s |
|                                       100th percentile latency | wait-until-merges-finish |   7.36037 |     ms |
|                                  100th percentile service time | wait-until-merges-finish |   7.36037 |     ms |
|                                                     error rate | wait-until-merges-finish |         0 |      % |


---------------------------------
[INFO] SUCCESS (took 151 seconds)
---------------------------------       

xuxiong1 added 4 commits March 6, 2025 01:57
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
@xuxiong1 xuxiong1 changed the title [WIP] support benchmark for polling ingestion Support benchmark for polling ingestion Mar 6, 2025
xuxiong1 added 2 commits March 7, 2025 01:53
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
@yupeng9
Copy link

yupeng9 commented Mar 8, 2025

LGTM

Signed-off-by: xuxiong1 <xiongxug@outlook.com>
@xuxiong1
Copy link
Contributor Author

@IanHoang, @gkamat, please take a look when you have time. thanks!

@IanHoang
Copy link
Collaborator

@xuxiong1 Unittests are failing right now due to the following error:

ERROR: Could not find a version that satisfies the requirement aiokafka>=0.12.0 (from opensearch-benchmark) (from versions: 0.0.1, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.3.0, 0.3.1, 0.4.0, 0.4.1, 0.4.2.dev0, 0.4.2, 0.4.3, 0.5.0, 0.5.1, 0.5.2, 0.6.0, 0.7.0, 0.7.1, 0.7.2, 0.8.0, 0.8.1, 0.9.0rc0, 0.9.0rc1, 0.9.0, 0.10.0a0, 0.10.0, 0.11.0)
ERROR: No matching distribution found for aiokafka>=0.12.0

Although aiokafka 0.12.0 is available on PyPi, OSB does not recognize it as an available version. Could you verify that aiokafka version 0.12.0 was installed in the environment where you performed the E2E test?

For now, we can quickly use aiokafka>=0.11.0 in setup.py.

@xuxiong1
Copy link
Contributor Author

@xuxiong1 Unittests are failing right now due to the following error:

ERROR: Could not find a version that satisfies the requirement aiokafka>=0.12.0 (from opensearch-benchmark) (from versions: 0.0.1, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.3.0, 0.3.1, 0.4.0, 0.4.1, 0.4.2.dev0, 0.4.2, 0.4.3, 0.5.0, 0.5.1, 0.5.2, 0.6.0, 0.7.0, 0.7.1, 0.7.2, 0.8.0, 0.8.1, 0.9.0rc0, 0.9.0rc1, 0.9.0, 0.10.0a0, 0.10.0, 0.11.0)
ERROR: No matching distribution found for aiokafka>=0.12.0

Although aiokafka 0.12.0 is available on PyPi, OSB does not recognize it as an available version. Could you verify that aiokafka version 0.12.0 was installed in the environment where you performed the E2E test?

For now, we can quickly use aiokafka>=0.11.0 in setup.py.

Thanks for pointing out this, that's the version on my local, but sure I'll set it to 0.11.0 and it should work.

Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
@xuxiong1
Copy link
Contributor Author

local E2E test with telemetry enabled:

 % opensearch-benchmark execute-test \
  --workload-repository=default \
  --workload-revision=51899e7b8b21f23b8050b5ad363c65e96d17d757 \
  --pipeline=benchmark-only \
  --workload=geonames \
  --target-host=127.0.0.1:9200 \
  --workload-params '{
    "ingest_percentage": "5",
    "bulk_size": "1000",
    "number_of_shards": "1",
    "number_of_replicas": "0",
    "ingestion_source_type": "kafka",
    "ingestion_topic": "test",
    "ingestion_bootstrap_servers": "localhost:44469",
    "ingestion_pointer_init_reset": "earliest"
  }' \
  --test-procedure=polling-ingest-index-only \
  --kill-running-processes \
  --telemetry shard-stats \
  --telemetry-params '{
    "shard-stats-sample-interval": 0.1
  }'

test output:

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

[INFO] [Test Execution ID]: 6b328b3e-16df-43cb-a952-674800333268
[INFO] Downloading workload data file: documents-2.json.bz2 (252.9 MB total size) [100.0%]
[INFO] Decompressing workload data from [/home/user/.osb/benchmarks/data/geonames/documents-2.json.bz2] to [/home/user/.osb/benchmarks/data/geonames/documents-2.json] (resulting size: [3.30] GB) ... [OK]
[INFO] Downloading workload data file: documents-2.json.offset (4.2 kB total size)[100.0%]
[INFO] Executing test with workload [geonames], test_procedure [polling-ingest-index-only] and provision_config_instance ['external'] with version [3.0.0-alpha1].

[WARNING] merges_total_time is 43043 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] merges_total_throttled_time is 19640 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] indexing_total_time is 3253 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 15191 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] flush_total_time is 1914 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running polling-ingest                                                         [100% done]
Running refresh-after-index                                                    [100% done]
Running force-merge                                                            [100% done]
Running refresh-after-force-merge                                              [100% done]
Running wait-until-merges-finish                                               [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                         Metric |                     Task |     Value |   Unit |
|---------------------------------------------------------------:|-------------------------:|----------:|-------:|
|                     Cumulative indexing time of primary shards |                          |     0.107 |    min |
|             Min cumulative indexing time across primary shards |                          |         0 |    min |
|          Median cumulative indexing time across primary shards |                          |         0 |    min |
|             Max cumulative indexing time across primary shards |                          |  0.105083 |    min |
|            Cumulative indexing throttle time of primary shards |                          |         0 |    min |
|    Min cumulative indexing throttle time across primary shards |                          |         0 |    min |
| Median cumulative indexing throttle time across primary shards |                          |         0 |    min |
|    Max cumulative indexing throttle time across primary shards |                          |         0 |    min |
|                        Cumulative merge time of primary shards |                          |   0.82095 |    min |
|                       Cumulative merge count of primary shards |                          |        26 |        |
|                Min cumulative merge time across primary shards |                          |         0 |    min |
|             Median cumulative merge time across primary shards |                          |         0 |    min |
|                Max cumulative merge time across primary shards |                          |  0.813483 |    min |
|               Cumulative merge throttle time of primary shards |                          |    0.3993 |    min |
|       Min cumulative merge throttle time across primary shards |                          |         0 |    min |
|    Median cumulative merge throttle time across primary shards |                          |         0 |    min |
|       Max cumulative merge throttle time across primary shards |                          |    0.3993 |    min |
|                      Cumulative refresh time of primary shards |                          |   0.32125 |    min |
|                     Cumulative refresh count of primary shards |                          |       298 |        |
|              Min cumulative refresh time across primary shards |                          |         0 |    min |
|           Median cumulative refresh time across primary shards |                          |         0 |    min |
|              Max cumulative refresh time across primary shards |                          |  0.301117 |    min |
|                        Cumulative flush time of primary shards |                          | 0.0204167 |    min |
|                       Cumulative flush count of primary shards |                          |        14 |        |
|                Min cumulative flush time across primary shards |                          |         0 |    min |
|             Median cumulative flush time across primary shards |                          |         0 |    min |
|                Max cumulative flush time across primary shards |                          | 0.0164667 |    min |
|                                        Total Young Gen GC time |                          |     0.861 |      s |
|                                       Total Young Gen GC count |                          |       240 |        |
|                                          Total Old Gen GC time |                          |         0 |      s |
|                                         Total Old Gen GC count |                          |         0 |        |
|                                                     Store size |                          |  0.966518 |     GB |
|                                                  Translog size |                          | 0.0864068 |     GB |
|                                         Heap used for segments |                          |         0 |     MB |
|                                       Heap used for doc values |                          |         0 |     MB |
|                                            Heap used for terms |                          |         0 |     MB |
|                                            Heap used for norms |                          |         0 |     MB |
|                                           Heap used for points |                          |         0 |     MB |
|                                    Heap used for stored fields |                          |         0 |     MB |
|                                                  Segment count |                          |        53 |        |
|                                                 Min Throughput |           polling-ingest |   1121.74 |  ops/s |
|                                                Mean Throughput |           polling-ingest |    1349.2 |  ops/s |
|                                              Median Throughput |           polling-ingest |    1355.8 |  ops/s |
|                                                 Max Throughput |           polling-ingest |   1366.79 |  ops/s |
|                                        50th percentile latency |           polling-ingest |   730.485 |     ms |
|                                        90th percentile latency |           polling-ingest |   784.434 |     ms |
|                                        99th percentile latency |           polling-ingest |   882.368 |     ms |
|                                       100th percentile latency |           polling-ingest |   1007.32 |     ms |
|                                   50th percentile service time |           polling-ingest |   730.485 |     ms |
|                                   90th percentile service time |           polling-ingest |   784.434 |     ms |
|                                   99th percentile service time |           polling-ingest |   882.368 |     ms |
|                                  100th percentile service time |           polling-ingest |   1007.32 |     ms |
|                                                     error rate |           polling-ingest |         0 |      % |
|                                                 Min Throughput | wait-until-merges-finish |     157.7 |  ops/s |
|                                                Mean Throughput | wait-until-merges-finish |     157.7 |  ops/s |
|                                              Median Throughput | wait-until-merges-finish |     157.7 |  ops/s |
|                                                 Max Throughput | wait-until-merges-finish |     157.7 |  ops/s |
|                                       100th percentile latency | wait-until-merges-finish |   5.99734 |     ms |
|                                  100th percentile service time | wait-until-merges-finish |   5.99734 |     ms |
|                                                     error rate | wait-until-merges-finish |         0 |      % |


---------------------------------
[INFO] SUCCESS (took 476 seconds)
---------------------------------

able to visualize the metrics on opensearch dashboard:
Screenshot 2025-03-22 at 8 06 47 PM

Hi @rishabh6788 @IanHoang @yupeng9 updated the PR with telemetry enabled, please take a look.

@rishabh6788
Copy link
Collaborator

LGTM, please fix the lint errors.

Signed-off-by: xuxiong1 <xiongxug@outlook.com>
@xuxiong1
Copy link
Contributor Author

Hi @rishabh6788 , I updated the PR and resolved the lint errors, please take a look.
Also I moved RequestContextHolder to a new context.py file to resolve the cyclic dependency issue, as it's imported by both osbenchmark.client and osbenchmark.kafka_client.

@rishabh6788
Copy link
Collaborator

@IanHoang @gkamat I'm good, feel free to merge if no concerns.

@xuxiong1
Copy link
Contributor Author

@IanHoang @gkamat gentle ping here in case you missed.

@rishabh6788 rishabh6788 merged commit e741888 into opensearch-project:main Apr 2, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants