add ml_inference processor for offline batch inference #5507

Zhangxunmt · 2025-03-06T20:19:23Z

Description

Adding a new ml_inference processor to interact with ml-commons plugin in OpenSearch for ML related applications.

Some examples that work well:

ml-batch-job-pipeline:
  source:
    s3:
      codec:
        ndjson:
      compression: none
      aws:
        region: "us-east-1"
      default_bucket_owner: <your account>
      scan:
        scheduling:
          interval: PT2M
        buckets:
          - bucket:
              name: "offlinebatch"
              data_selection: metadata_only
              filter:
                include_prefix:
                  - bedrock-multisource/my_batch
                exclude_suffix:
                  - .out
          - bucket:
              name: "offlinebatch"
              data_selection: data_only
              filter:
                include_prefix:
                  - bedrock-multisource/output-multisource/
                exclude_suffix:
                  - manifest.json.out

  buffer:
    bounded_blocking:
      buffer_size: 2048 # max number of records the buffer accepts
      batch_size: 512 # max number of records the buffer drains after each read

  processor:
    - ml_inference:
        host: "<your host>"
        aws_sigv4: true
        action_type: "batch_predict"
        service_name: "bedrock"
        model_id: "<your model id in search>"
        output_path: "s3://offlinebatch/bedrock-multisource/output-multisource/"
        aws:
          region: "us-east-1"
        ml_when: /bucket == "offlinebatch"
    - copy_values:
        entries:
          - to_key: chapter
            from_key: /modelInput/inputText
          - to_key: chapter_embedding
            from_key: /modelOutput/embedding
    - delete_entries:
        with_keys: [modelInput, modelOutput, recordId, s3]

  route:
      - ml-ingest-route: "/chapter != null and /chapter_embedding != null"

  sink:
    - opensearch:
        hosts: ["<your host>"]
        aws_sigv4: true
        index: "my-nlp-index-bedrock"
        routes: [ml-ingest-route]

Issues Resolved

#5470
#5433
#5509

Check List

New functionality includes testing.
New functionality has a documentation issue. Please link to it in this PR.
- New functionality has javadoc added
Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

data-prepper-plugins/ml-processor/build.gradle

.../ml-processor/src/main/java/org/opensearch/dataprepper/plugins/ml/processor/MLProcessor.java

settings.gradle

...ns/s3-source/src/main/java/org/opensearch/dataprepper/plugins/source/s3/S3ObjectHandler.java

.../ml-processor/src/main/java/org/opensearch/dataprepper/plugins/ml/processor/MLProcessor.java

...in/java/org/opensearch/dataprepper/plugins/ml/processor/common/SageMakerBatchJobCreator.java

dlvenable · 2025-03-12T21:46:38Z

@Zhangxunmt , Thank you for this great processor!

We will also need some unit tests. I'm ok accepting this PR without them as long as we have the @Experimental annotation.

Zhangxunmt · 2025-03-12T23:13:03Z

@Zhangxunmt , Thank you for this great processor!

We will also need some unit tests. I'm ok accepting this PR without them as long as we have the @Experimental annotation.

dlvenable Thanks David for the comments. Looks like there're no major concerns. I will add the remaining UTs soon and the @experimental annotation.

...-processor/src/main/java/org/opensearch/dataprepper/plugins/ml/processor/util/RetryUtil.java

...nce-processor/src/main/java/org/opensearch/dataprepper/plugins/ml/processor/MLProcessor.java

...ocessor/src/main/java/org/opensearch/dataprepper/plugins/ml/processor/MLProcessorConfig.java

...nce-processor/src/main/java/org/opensearch/dataprepper/plugins/ml/processor/MLProcessor.java

data-prepper-plugins/ml-inference-processor/build.gradle

.../ml-processor/src/main/java/org/opensearch/dataprepper/plugins/ml/processor/MLProcessor.java

...nce-processor/src/main/java/org/opensearch/dataprepper/plugins/ml/processor/MLProcessor.java

...in/java/org/opensearch/dataprepper/plugins/ml/processor/common/SageMakerBatchJobCreator.java

Zhangxunmt · 2025-03-31T21:51:46Z

@dlvenable Please review the latest commit for the updates to requested changes, after a rebase to the main. The Gradle Builds somehow fail due to unrelated tests.

...-prepper-plugins/common/src/main/java/org/opensearch/dataprepper/common/utils/RetryUtil.java

kkondaka · 2025-04-02T20:00:20Z

...-prepper-plugins/common/src/main/java/org/opensearch/dataprepper/common/utils/RetryUtil.java

+                return true; // Success
+            } catch (Exception e) {
+                try {
+                    Thread.sleep(BASE_DELAY_MS * (1L << attempt)); // Exponential backoff


This will increase quite fast with number of retries. We should use import com.linecorp.armeria.client.retry.Backoff; and the code should be similar to exponential backoff code in https://github.com/kkondaka/kk-data-prepper-f2/blob/main/data-prepper-plugins/opensearch/src/main/java/org/opensearch/dataprepper/plugins/sink/opensearch/BulkRetryStrategy.java

updated to use similar backoff logic with linecorp.armeria.client

Signed-off-by: Xun Zhang <[email protected]>

…uation Signed-off-by: Xun Zhang <[email protected]>

…put_key is null Signed-off-by: Xun Zhang <[email protected]>

Signed-off-by: Xun Zhang <[email protected]>

dlvenable

Thank you @Zhangxunmt for this contribution!

...n/java/org/opensearch/dataprepper/plugins/ml_inference/processor/client/S3ClientFactory.java

…oject#5507) Add ml processor for offline batch inference Signed-off-by: Xun Zhang <[email protected]>

…oject#5507) Add ml processor for offline batch inference Signed-off-by: Xun Zhang <[email protected]> Signed-off-by: mamol27 <[email protected]>

Zhangxunmt requested review from sb2k16, chenqi0805, engechas, san81, srikanthjg, graytaylor0, dinujoh, kkondaka, KarstenSchnitter, dlvenable and oeyh as code owners March 6, 2025 20:19

Zhangxunmt force-pushed the main branch 3 times, most recently from 86e4f34 to a59fb6a Compare March 10, 2025 22:43

dlvenable requested changes Mar 12, 2025

View reviewed changes

Zhangxunmt force-pushed the main branch 5 times, most recently from 0b384c9 to 9ce8a77 Compare March 24, 2025 22:26

Zhangxunmt force-pushed the main branch 3 times, most recently from 5ae7861 to c670ca5 Compare March 25, 2025 18:28

Zhangxunmt changed the title ~~add ml processor for offline batch inference~~ add ml-inference processor for offline batch inference Mar 25, 2025

Zhangxunmt changed the title ~~add ml-inference processor for offline batch inference~~ add ml_inference processor for offline batch inference Mar 25, 2025

Zhangxunmt force-pushed the main branch 3 times, most recently from d0aa269 to 0108602 Compare March 25, 2025 20:00

graytaylor0 reviewed Mar 25, 2025

View reviewed changes

Zhangxunmt force-pushed the main branch from c14524e to 2e96545 Compare March 25, 2025 22:47

srikanthjg reviewed Mar 27, 2025

View reviewed changes

...nce-processor/src/main/java/org/opensearch/dataprepper/plugins/ml/processor/MLProcessor.java Outdated Show resolved Hide resolved

dlvenable requested changes Mar 27, 2025

View reviewed changes

Zhangxunmt force-pushed the main branch from 2e96545 to bf4a4c6 Compare March 31, 2025 21:08

kkondaka requested changes Apr 2, 2025

View reviewed changes

Zhangxunmt added 7 commits April 2, 2025 22:43

add ml processor for offline batch inference

ad3ca6e

Signed-off-by: Xun Zhang <[email protected]>

add more UTs and address comments

23f6f33

Signed-off-by: Xun Zhang <[email protected]>

rename to ml-inference

1e29b6c

Signed-off-by: Xun Zhang <[email protected]>

update readme

d213fc9

Signed-off-by: Xun Zhang <[email protected]>

move retryUtils into common lib and add try catch for conditions eval…

c20a5a5

…uation Signed-off-by: Xun Zhang <[email protected]>

add failure tag, catch known error types, and use default key when in…

d0fe354

…put_key is null Signed-off-by: Xun Zhang <[email protected]>

address more comments

798640c

Signed-off-by: Xun Zhang <[email protected]>

Zhangxunmt force-pushed the main branch from bf4a4c6 to 798640c Compare April 3, 2025 05:44

update README

2ee494d

Signed-off-by: Xun Zhang <[email protected]>

dlvenable approved these changes Apr 3, 2025

View reviewed changes

kkondaka reviewed Apr 3, 2025

View reviewed changes

...n/java/org/opensearch/dataprepper/plugins/ml_inference/processor/client/S3ClientFactory.java Show resolved Hide resolved

kkondaka approved these changes Apr 3, 2025

View reviewed changes

dlvenable merged commit f451272 into opensearch-project:main Apr 3, 2025
70 of 74 checks passed

amdhing pushed a commit to amdhing/data-prepper that referenced this pull request Apr 16, 2025

add ml_inference processor for offline batch inference (opensearch-pr…

088476a

…oject#5507) Add ml processor for offline batch inference Signed-off-by: Xun Zhang <[email protected]>

Davidding4718 pushed a commit to Davidding4718/data-prepper that referenced this pull request Apr 25, 2025

add ml_inference processor for offline batch inference (opensearch-pr…

777f1fe

…oject#5507) Add ml processor for offline batch inference Signed-off-by: Xun Zhang <[email protected]>

Davidding4718 pushed a commit to Davidding4718/data-prepper that referenced this pull request Apr 25, 2025

add ml_inference processor for offline batch inference (opensearch-pr…

65e0064

…oject#5507) Add ml processor for offline batch inference Signed-off-by: Xun Zhang <[email protected]>

add ml_inference processor for offline batch inference #5507

add ml_inference processor for offline batch inference #5507

Uh oh!

Conversation

Zhangxunmt commented Mar 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Issues Resolved

Check List

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dlvenable commented Mar 12, 2025

Uh oh!

Zhangxunmt commented Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Zhangxunmt commented Mar 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

kkondaka Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

Zhangxunmt Apr 3, 2025

Choose a reason for hiding this comment

Uh oh!

dlvenable left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Zhangxunmt commented Mar 6, 2025 •

edited

Loading

Zhangxunmt commented Mar 12, 2025 •

edited

Loading

Zhangxunmt commented Mar 31, 2025 •

edited

Loading