[BUG] Semantic Search template is not working on OpenSearch Dashboards with pre-trained local model #795

@William-Yao-Netapp

Description

What is the bug?
When I create an ingest pipeline using the Semantic Search template in AI Search Flows (OpenSearch Dashboards), the ML Inference processor fails, and no data is ingested. The error indicates a failure to run the TEXT_EMBEDDING model.

Data not ingested. Errors found with the following ingest processor(s):
Processor type: Ml_inference
Error: m_l_exception: Failed to inference TEXT_EMBEDDING model: ZkBux5kBrJZUq2xUFi3j

Using the same model and template via the backend workflow API successfully provisions all the resources (KNN index, ingest pipeline).

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Deploy any pre-trained model via Model API (e.g. huggingface/sentence-transformers/all-MiniLM-L6-v2)
  2. Go to OpenSearch Dashboards > AI Search Flows > Create workflow > Semantic Search
  3. Load some sample data:
{"text":"A West Virginia university women 's basketball team , officials , and a small gathering of fans are in a West Virginia arena .","id":"4319130149.jpg"}
{"text":"A wild animal races across an uncut field with a minimal amount of trees .","id":"1775029934.jpg"}
{"text":"People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco .","id":"2664027527.jpg"}
{"text":"A man who is riding a wild horse in the rodeo is very near to falling off .","id":"4427058951.jpg"}
{"text":"A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse .","id":"2691147709.jpg"}
  4. Update the output dimension in the KNN index to match the model dimension
  5. Update Model input and Value in the ML Inference Processor (in my sample data, the value is text)
  6. Click create ingest pipeline
  7. See error:
Processor type: Ml_inference
Error: m_l_exception: Failed to inference TEXT_EMBEDDING model: ZkBux5kBrJZUq2xUFi3j
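
To narrow this down, it may help to call the model directly through the ML Commons Predict API, outside of any ingest processor. The sketch below only builds the request (it does not send it) and assumes the text-embedding predict endpoint and body shape from the ML Commons documentation; the helper names are hypothetical. If the direct call returns an embedding, the model itself is fine and the problem is in how the processor builds its input. The second helper reads the embedding length from a predict response so it can be compared against the KNN index dimension from step 4.

```python
import json


def build_text_embedding_predict(model_id, texts):
    """Return (method, path, body) for a direct text-embedding predict call.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    path = f"/_plugins/_ml/_predict/text_embedding/{model_id}"
    body = {
        "text_docs": list(texts),
        "return_number": True,
        "target_response": ["sentence_embedding"],
    }
    return "POST", path, body


def embedding_dimension(response):
    """Length of the first embedding in a predict response (parsed JSON dict)."""
    return len(response["inference_results"][0]["output"][0]["data"])


if __name__ == "__main__":
    method, path, body = build_text_embedding_predict(
        "ZkBux5kBrJZUq2xUFi3j",
        ["A wild animal races across an uncut field with a minimal amount of trees ."],
    )
    print(method, path)
    print(json.dumps(body, indent=2))
```

The printed method, path, and body can be pasted into Dev Tools or sent with any HTTP client against the cluster.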

What is your host/environment?

  • OS: Debian 12
  • Version: OpenSearch/OpenSearch Dashboards 3.2.0
  • Plugins enabled: opensearch-ml, opensearch-knn, neural-search, and flow-framework

Do you have any screenshots?

[Image: screenshot of the error]

Do you have any additional context?
Posting the OpenSearch log in case it is helpful:

[2025-10-09T06:04:48,682][ERROR][o.o.m.e.a.DLModel        ] [ip-10-0-62-58] Failed to inference TEXT_EMBEDDING model: ZkBux5kBrJZUq2xUFi3j
java.lang.ClassCastException: class org.opensearch.ml.common.dataset.remote.RemoteInferenceInputDataSet cannot be cast to class org.opensearch.ml.common.dataset.TextDocsInputDataSet (org.opensearch.ml.common.dataset.remote.RemoteInferenceInputDataSet and org.opensearch.ml.common.dataset.TextDocsInputDataSet are in unnamed module of loader java.net.FactoryURLClassLoader @3841e77a)
        at org.opensearch.ml.engine.algorithms.TextEmbeddingModel.predict(TextEmbeddingModel.java:39) ~[opensearch-ml-algorithms-3.2.0.0.jar:?]
        at org.opensearch.ml.engine.algorithms.DLModel.lambda$predict$0(DLModel.java:89) ~[opensearch-ml-algorithms-3.2.0.0.jar:?]
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) ~[?:?]
        at org.opensearch.ml.engine.algorithms.DLModel.predict(DLModel.java:84) ~[opensearch-ml-algorithms-3.2.0.0.jar:?]
        at org.opensearch.ml.task.MLPredictTaskRunner.lambda$runPredict$4(MLPredictTaskRunner.java:513) ~[opensearch-ml-3.2.0.0.jar:3.2.0.0]
        at org.opensearch.ml.model.MLModelManager.trackPredictDuration(MLModelManager.java:2521) ~[opensearch-ml-3.2.0.0.jar:3.2.0.0]
        at org.opensearch.ml.task.MLPredictTaskRunner.runPredict(MLPredictTaskRunner.java:513) ~[opensearch-ml-3.2.0.0.jar:3.2.0.0]
        at org.opensearch.ml.task.MLPredictTaskRunner.predict(MLPredictTaskRunner.java:384) ~[opensearch-ml-3.2.0.0.jar:3.2.0.0]
        at org.opensearch.ml.task.MLPredictTaskRunner.lambda$executePredictionByInputDataType$2(MLPredictTaskRunner.java:328) ~[opensearch-ml-3.2.0.0.jar:3.2.0.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:916) ~[opensearch-3.2.0.jar:3.2.0]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
[2025-10-09T06:04:48,682][ERROR][o.o.m.t.MLPredictTaskRunner] [ip-10-0-62-58] Failed to predict model ZkBux5kBrJZUq2xUFi3j
org.opensearch.ml.common.exception.MLException: Failed to inference TEXT_EMBEDDING model: ZkBux5kBrJZUq2xUFi3j
        at org.opensearch.ml.engine.algorithms.DLModel.predict(DLModel.java:94) ~[opensearch-ml-algorithms-3.2.0.0.jar:?]
        at org.opensearch.ml.task.MLPredictTaskRunner.lambda$runPredict$4(MLPredictTaskRunner.java:513) ~[opensearch-ml-3.2.0.0.jar:3.2.0.0]
        at org.opensearch.ml.model.MLModelManager.trackPredictDuration(MLModelManager.java:2521) ~[opensearch-ml-3.2.0.0.jar:3.2.0.0]
        at org.opensearch.ml.task.MLPredictTaskRunner.runPredict(MLPredictTaskRunner.java:513) ~[opensearch-ml-3.2.0.0.jar:3.2.0.0]
        at org.opensearch.ml.task.MLPredictTaskRunner.predict(MLPredictTaskRunner.java:384) ~[opensearch-ml-3.2.0.0.jar:3.2.0.0]
        at org.opensearch.ml.task.MLPredictTaskRunner.lambda$executePredictionByInputDataType$2(MLPredictTaskRunner.java:328) ~[opensearch-ml-3.2.0.0.jar:3.2.0.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:916) ~[opensearch-3.2.0.jar:3.2.0]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.lang.ClassCastException: class org.opensearch.ml.common.dataset.remote.RemoteInferenceInputDataSet cannot be cast to class org.opensearch.ml.common.dataset.TextDocsInputDataSet (org.opensearch.ml.common.dataset.remote.RemoteInferenceInputDataSet and org.opensearch.ml.common.dataset.TextDocsInputDataSet are in unnamed module of loader java.net.FactoryURLClassLoader @3841e77a)
        at org.opensearch.ml.engine.algorithms.TextEmbeddingModel.predict(TextEmbeddingModel.java:39) ~[opensearch-ml-algorithms-3.2.0.0.jar:?]
        at org.opensearch.ml.engine.algorithms.DLModel.lambda$predict$0(DLModel.java:89) ~[opensearch-ml-algorithms-3.2.0.0.jar:?]
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) ~[?:?]
        at org.opensearch.ml.engine.algorithms.DLModel.predict(DLModel.java:84) ~[opensearch-ml-algorithms-3.2.0.0.jar:?]
        ... 9 more
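
The trace shows a local TextEmbeddingModel receiving a RemoteInferenceInputDataSet where it expects a TextDocsInputDataSet. One possible explanation, not confirmed here, is that the processor generated by the Dashboards template is configured as if the model were remote: the ml_inference processor documentation describes `function_name` as defaulting to `remote` and as required for local models. The sketch below builds a processor definition modeled on the documented local-model example, for comparison with what the template actually generates. The helper name and the `embedding` target field are hypothetical, and the `model_input` template and output path may need adjusting for a single string field such as `text`.

```python
import json


def build_local_embedding_processor(model_id, source_field, target_field):
    """Build an ml_inference processor dict for a local text-embedding model.

    Hypothetical helper; the field layout follows the documented local-model
    example and is an assumption, not the configuration used in this report.
    """
    model_input = (
        '{ "text_docs": ${input_map.text_docs}, '
        '"return_number": ${model_config.return_number}, '
        '"target_response": ${model_config.target_response} }'
    )
    return {
        "ml_inference": {
            # Documented default is "remote"; set explicitly for a local model.
            "function_name": "text_embedding",
            "full_response_path": True,
            "model_id": model_id,
            "model_config": {
                "return_number": True,
                "target_response": ["sentence_embedding"],
            },
            "model_input": model_input,
            "input_map": [{"text_docs": source_field}],
            "output_map": [
                {target_field: "$.inference_results.*.output.*.data"}
            ],
        }
    }


if __name__ == "__main__":
    pipeline = {
        "description": "Sketch: local text-embedding ingest pipeline",
        "processors": [
            build_local_embedding_processor(
                "ZkBux5kBrJZUq2xUFi3j", "text", "embedding"
            )
        ],
    }
    print(json.dumps(pipeline, indent=2))
```

Comparing this output with the pipeline that AI Search Flows tries to create (in particular whether `function_name` is present) would show whether the template is missing the local-model settings.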
