
[Inference API] Add Custom Model support to Inference API #124299

Open · wants to merge 9 commits into main

Conversation

@Huaixinww (Contributor) commented Mar 7, 2025

Add Custom Model support to Inference API.

You can use this service to invoke any model that exposes an HTTP API.

Inference Model Creation:

PUT _inference/{task_type}/{inference_id}
{
  "service": "custom-model",
  "service_settings": {
    "secret_parameters": {
      ...
    },
    "url": "<<url>>",
    "path": {
      "<<path>>": {
        "<<method>>": {
          "query_string": "<<query_string_parameters>>",
          "headers": {
            <<header parameters>>
          },
          "request": {
            "format": "string",
            "content": "<<content>>"
          },
          "response": {
            "json_parser":{
              ...
            }
          }
        }
      }
    }
  },
  "task_settings": {
    "parameters":{
      ...
    }
  }
}

Supported task types

  • text_embedding
  • sparse_embedding
  • rerank
  • completion
  • custom (use this type for any task type that Elasticsearch does not currently support)

Parameter Description

  • secret_parameters: sensitive parameters such as api_key can be defined here.
"secret_parameters":{
  "api_key":"xxx"
}
  • query_string (optional): the HTTP query string parameters
"query_string": "?key=value"
  • headers (optional): the HTTP request headers
"headers":{
  "Authorization": "Bearer ${api_key}",    //Replace the placeholders when constructing the request.
  "Content-Type": "application/json;charset=utf-8"
}
  • request.format: only string is supported for now
  • request.content: the body of the HTTP request, passed as a string-escaped JSON template; placeholders are replaced with the configured values when the request is constructed.
"request":{
  "format":"string",
  "content":"{\"input\":${input}}"
}

# when using the Kibana console, triple-quoted strings avoid the escaping
"request":{
  "format":"string",
  "content":"""
    {
      "input":${input}   //Replace the placeholders when constructing the request.
    }
    """
}
  • response.json_parser: the response must be parsed into an object that Elasticsearch recognizes (TextEmbeddingFloatResults, SparseEmbeddingResults, RankedDocsResults, ChatCompletionResults).
    Therefore, we use JsonPath syntax to extract the necessary content from the response.
    (For the text_embedding type, the parser must produce a List<List<Float>>; the same principle applies to the other types.)
    Different task types take different json_parser parameters; a worked example follows this list.
# text_embedding
"response":{
  "json_parser":{
    "text_embeddings":"$.result.embeddings[*].embedding"
  }
}

# sparse_embedding
"response":{
  "json_parser":{
    "sparse_result":{
      "path":"$.result.sparse_embeddings[*]",
      "value":{
        "sparse_token":"$.embedding[*].token_id",
        "sparse_weight":"$.embedding[*].weight"   
      }
    }
  }
}

# rerank
"response":{
  "json_parser":{
    "reranked_index":"$.result.scores[*].index",    // optional
    "relevance_score":"$.result.scores[*].score",
    "document_text":"xxx"    // optional
  }
}

# completion
"response":{
  "json_parser":{
    "completion_result":"$.result.text"
  }
}
  • task_settings.parameters: because of limitations in the inference framework, any additional parameters the model requires can be configured in task_settings.parameters. These parameters can be referenced as placeholders in request.content and are replaced with the configured values when the request is constructed.
"task_settings":{
  "parameters":{
    "input_type":"query",
    "return_token":true
  }
}
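
To make the json_parser configuration concrete, here is a small worked example for the text_embedding parser shown above. The response body and its values are hypothetical; only the JsonPath expression comes from the configuration.

# hypothetical provider response
{
  "result": {
    "embeddings": [
      { "embedding": [0.12, -0.03, 0.98] },
      { "embedding": [0.07, 0.44, -0.21] }
    ]
  }
}

Evaluating "$.result.embeddings[*].embedding" against this body yields [[0.12, -0.03, 0.98], [0.07, 0.44, -0.21]], i.e. the List<List<Float>> that the text_embedding task type requires.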

Testing

The examples below use the Alibaba Cloud AI Search models.
Please replace the value of secret_parameters.api_key with your own api_key.

text_embedding

PUT _inference/text_embedding/custom_embeddings
{
  "service":"custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":"<<your api_key>>"
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "path":{
      "/v3/openapi/workspaces/default/text-embedding/ops-text-embedding-001":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
          },
          "request":{
            "format":"string",
            "content":"""
              {
                "input":${input}
              }
              """
          },
          "response":{
            "json_parser":{
              "text_embeddings":"$.result.embeddings[*].embedding"
            }
          }
        }
      }
    }
  }
}

POST _inference/text_embedding/custom_embeddings
{
  "input":"test"
}
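
For illustration, with the configuration above the service should end up sending roughly the following HTTP request: the api_key is substituted into the Authorization header and the input into the body. The exact serialization of the input (for example, whether a single string is wrapped in a list) is an assumption here.

POST http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com/v3/openapi/workspaces/default/text-embedding/ops-text-embedding-001
Authorization: Bearer <<your api_key>>
Content-Type: application/json;charset=utf-8

{"input":["test"]}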

sparse_embedding

PUT _inference/sparse_embedding/custom_sparse_embedding
{
  "service":"custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":<<your api_key>>
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "path":{
      "/v3/openapi/workspaces/default/text-sparse-embedding/ops-text-sparse-embedding-001":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
          },
          "request":{
            "format":"string",
            "content":"""
              {
                "input": ${input},
                "input_type": "${input_type}",
                "return_token": ${return_token}
              }
              """
          },
          "response":{
            "json_parser":{
              "sparse_result":{
                "path":"$.result.sparse_embeddings[*]",
                "value":{
                  "sparse_token":"$.embedding[*].token_id",
                  "sparse_weight":"$.embedding[*].weight"   
                }
              }
            }
          }
        }
      }
    }
  },
  "task_settings":{
    "parameters":{
      "input_type":"query",
      "return_token":true
    }
  }
}

POST _inference/sparse_embedding/custom_sparse_embedding?error_trace
{
  "input":["hello", "world"]
}
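
For reference, the nested sparse_result parser above assumes a response shaped roughly like the following (hypothetical values). The outer path selects each element of result.sparse_embeddings, and the value expressions are presumably evaluated against each selected element to produce the token/weight pairs.

# hypothetical provider response
{
  "result": {
    "sparse_embeddings": [
      {
        "embedding": [
          { "token_id": 101, "weight": 0.31 },
          { "token_id": 2047, "weight": 0.12 }
        ]
      }
    ]
  }
}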

rerank

PUT _inference/rerank/custom_rerank
{
  "service":"custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":<<your api_key>>
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "path":{
      "/v3/openapi/workspaces/default/ranker/ops-bge-reranker-larger":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
          },
          "request":{
            "format":"string",
            "content":"""
              {
                "query": "${query}",
                "docs": ${input}
              }
              """
          },
          "response":{
            "json_parser":{
              "reranked_index":"$.result.scores[*].index",
              "relevance_score":"$.result.scores[*].score"
            }
          }
        }
      }
    }
  }
}

POST _inference/rerank/custom_rerank
{
  "input": ["luke", "like", "leia", "chewy","r2d2", "star", "wars"],
  "query": "star wars main character"
}

completion

The completion example demonstrates how to use task_settings.parameters for more flexible parameter configuration.
For the interface definition of the Alibaba Cloud AI Search completion API, please refer to the official documentation: alibaba cloud ai search completion api doc

PUT _inference/completion/custom_completion
{
  "service":"custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":<<your api_key>>
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "path":{
      "/v3/openapi/workspaces/default/text-generation/deepseek-r1":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}"
          },
          "request":{
            "format":"string",
            "content":"{\"messages\":${messages}}"
          },
          "response":{
            "json_parser":{
              "completion_result":"$.result.text"
            }
          }
        }
      }
    }
  }
}

POST _inference/completion/custom_completion
{
  "input":"",
  "task_settings":{
    "parameters":{
      "messages":[
        {
          "role":"system", 
          "content":"你是一个机器人助手"
        },
        {
          "role":"user", 
          "content":"河南的省会是哪里"
        },
        {
          "role":"assistant", 
          "content":"郑州"
        },
        {
          "role":"user", 
          "content":"那里有什么好玩的"
        }
      ]
    }
  }
}
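
For illustration, after the ${messages} placeholder is replaced with the value configured in task_settings.parameters, the request body sent to the model should look roughly like this (non-string parameters appear to be substituted as JSON):

{"messages":[{"role":"system","content":"You are a robot assistant"},{"role":"user","content":"What is the capital of Henan?"},{"role":"assistant","content":"Zhengzhou"},{"role":"user","content":"What is there to do there?"}]}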

custom

We use the query-analyze service as an example.

PUT _inference/custom/query_analyze
{
  "service":"custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":<<your api_key>>
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "path":{
      "/v3/openapi/workspaces/default/query-analyze/ops-query-analyze-001":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}"
          },
          "request":{
            "format":"string",
            "content":"""
              {
                "query":"${query}"
              }
              """
          }
        }
      }
    }
  }
}

POST _inference/custom/query_analyze
{
  "input": "",    // Due to input validation, input cannot be null.
  "query": "what is elasticsearch?"
}

@Huaixinww Huaixinww requested a review from a team as a code owner March 7, 2025 09:54
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.1.0 labels Mar 7, 2025
@elasticsearchmachine elasticsearchmachine added the external-contributor Pull request authored by a developer outside the Elasticsearch team label Mar 7, 2025
@davidkyle davidkyle added :ml Machine learning and removed needs:triage Requires assignment of a team area label labels Mar 7, 2025
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Mar 7, 2025
@Huaixinww Huaixinww changed the title [WIP][Inference API] Add Custom Model support to Inference API [Inference API] Add Custom Model support to Inference API Mar 10, 2025
@davidkyle davidkyle self-assigned this Mar 11, 2025
@davidkyle davidkyle requested review from davidkyle and removed request for a team March 11, 2025 09:52
@davidkyle (Member) commented:

Thank you @Huaixinww I love the idea of custom models and using JsonPath to configure the response parsing. I'd like to start the review by building a few custom models of my own.

@weizijun (Contributor) commented:

> Thank you @Huaixinww I love the idea of custom models and using JsonPath to configure the response parsing. I'd like to start the review by building a few custom models of my own.

@davidkyle This feature is very useful for our Alibaba Cloud users. We will continue to improve it, and we can discuss how to bring this feature into the Elasticsearch community.

@davidkyle (Member) commented:

@Huaixinww and @weizijun the ML team at Elasticsearch loves this idea and considers it a core feature for the Inference API. Going forward we would like to take on this work in a new PR. I've opted to create a new PR because CI does not run automatically against PRs from external contributors, and that would slow down the development process.

I've opened #125679 which contains all the commits from this PR plus some fixes for the build system. We will review the PR and add any missing tests etc.

With your permission we would like to make some minor changes:

  1. Rename the service from custom-model to custom
  2. Merge the url and path fields into a single field
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "path":{
      "/v3/openapi/workspaces/default/text-generation/deepseek-r1":{

Would become

    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com/v3/openapi/workspaces/default/text-generation/deepseek-r1"
  3. Add a method field for the HTTP method

With these changes the embedding example from the PR description would look like this:

PUT _inference/text_embedding/custom_embeddings
{
  "service":"custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":"<<your api_key>>"
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com/v3/openapi/workspaces/default/text-embedding/ops-text-embedding-001",
    "method": "POST",
    "headers":{
      "Authorization": "Bearer ${api_key}",
      "Content-Type": "application/json;charset=utf-8"
    },
    "request":{
      "format":"string",
      "content":"""
        {
          "input":${input}
        }
        """
    },
    "response":{
      "json_parser":{
        "text_embeddings":"$.result.embeddings[*].embedding"
      }
    }
  }
}

@weizijun (Contributor) commented:

> With your permission we would like to make some minor changes:

@davidkyle Our design pattern comes from OpenAPI. I think your three changes are okay; both approaches work for us, no problem.

@jonathan-buttner (Contributor) commented:

I messed up Dave's PR, so I opened a new one here: #127939
