
[Inference API] Add Custom Model support to Inference API #124299

Open · wants to merge 9 commits into main

Conversation

@Huaixinww (Contributor) commented Mar 7, 2025

Add Custom Model support to Inference API.

You can use this service to invoke any model that exposes an HTTP API.

Inference Model Creation:

PUT _inference/{task_type}/{inference_id}
{
  "service": "custom-model",
  "service_settings": {
    "secret_parameters": {
      ...
    },
    "url": "<<url>>",
    "path": {
      "<<path>>": {
        "<<method>>": {
          "query_string": "<<query_string_parameters>>",
          "headers": {
            <<header parameters>>
          },
          "request": {
            "format": "string",
            "content": "<<content>>"
          },
          "response": {
            "json_parser":{
              ...
            }
          }
        }
      }
    }
  },
  "task_settings": {
    "parameters":{
      ...
    }
  }
}

Supported task types

  • text_embedding
  • sparse_embedding
  • rerank
  • completion
  • custom (use this type for any task type that Elasticsearch does not currently support)

Parameter Description

  • secret_parameters: sensitive parameters such as api_key can be defined here.
"secret_parameters":{
  "api_key":"xxx"
}
  • query_string (optional): the HTTP query string parameters
"query_string": "?key=value"
  • headers (optional): the HTTP request headers
"headers":{
  "Authorization": "Bearer ${api_key}",    //Replace the placeholders when constructing the request.
  "Content-Type": "application/json;charset=utf-8"
}
  • request.format: only string is supported for now
  • request.content: the body of the HTTP request, passed as a string-escaped JSON template; placeholders are replaced with the configured values when the request is constructed.
"request":{
  "format":"string",
  "content":"{\"input\":${input}}"
}

# when using the Kibana console, triple-quoted strings avoid the escaping
"request":{
  "format":"string",
  "content":"""
    {
      "input":${input}   //Replace the placeholders when constructing the request.
    }
    """
}
  • response.json_parser: the response must be parsed into an object that Elasticsearch recognizes (TextEmbeddingFloatResults, SparseEmbeddingResults, RankedDocsResults, ChatCompletionResults).
    Therefore, we use JsonPath syntax to extract the necessary content from the response.
    (For the text_embedding type, the parser must produce a List<List<Float>>; the same principle applies to the other types.)
    Different task types take different json_parser parameters; a worked example follows this list.
# text_embedding
"response":{
  "json_parser":{
    "text_embeddings":"$.result.embeddings[*].embedding"
  }
}

# sparse_embedding
"response":{
  "json_parser":{
    "sparse_result":{
      "path":"$.result.sparse_embeddings[*]",
      "value":{
        "sparse_token":"$.embedding[*].token_id",
        "sparse_weight":"$.embedding[*].weight"   
      }
    }
  }
}

# rerank
"response":{
  "json_parser":{
    "reranked_index":"$.result.scores[*].index",    // optional
    "relevance_score":"$.result.scores[*].score",
    "document_text":"xxx"    // optional
  }
}

# completion
"response":{
  "json_parser":{
    "completion_result":"$.result.text"
  }
}
  • task_settings.parameters: because of limitations in the inference framework, any additional parameters the model requires can be configured in task_settings.parameters. These parameters can be referenced as placeholders in request.content and are replaced with the configured values when the request is constructed.
"task_settings":{
  "parameters":{
    "input_type":"query",
    "return_token":true
  }
}
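
To make the json_parser configuration concrete, here is a small worked example for the text_embedding parser shown above. The response body and its values are hypothetical; only the JsonPath expression comes from the configuration.

# hypothetical provider response
{
  "result": {
    "embeddings": [
      { "embedding": [0.12, -0.03, 0.98] },
      { "embedding": [0.07, 0.44, -0.21] }
    ]
  }
}

Evaluating "$.result.embeddings[*].embedding" against this body yields [[0.12, -0.03, 0.98], [0.07, 0.44, -0.21]], i.e. the List<List<Float>> that the text_embedding task type requires.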

Testing

The examples below use the Alibaba Cloud AI Search models.
Please replace the value of secret_parameters.api_key with your own api_key.

text_embedding

PUT _inference/text_embedding/custom_embeddings
{
  "service":"custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":"<<your api_key>>"
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "path":{
      "/v3/openapi/workspaces/default/text-embedding/ops-text-embedding-001":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
          },
          "request":{
            "format":"string",
            "content":"""
              {
                "input":${input}
              }
              """
          },
          "response":{
            "json_parser":{
              "text_embeddings":"$.result.embeddings[*].embedding"
            }
          }
        }
      }
    }
  }
}

POST _inference/text_embedding/custom_embeddings
{
  "input":"test"
}
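
For illustration, with the configuration above the service should end up sending roughly the following HTTP request: the api_key is substituted into the Authorization header and the input into the body. The exact serialization of the input (for example, whether a single string is wrapped in a list) is an assumption here.

POST http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com/v3/openapi/workspaces/default/text-embedding/ops-text-embedding-001
Authorization: Bearer <<your api_key>>
Content-Type: application/json;charset=utf-8

{"input":["test"]}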

sparse_embedding

PUT _inference/sparse_embedding/custom_sparse_embedding
{
  "service":"custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":<<your api_key>>
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "path":{
      "/v3/openapi/workspaces/default/text-sparse-embedding/ops-text-sparse-embedding-001":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
          },
          "request":{
            "format":"string",
            "content":"""
              {
                "input": ${input},
                "input_type": "${input_type}",
                "return_token": ${return_token}
              }
              """
          },
          "response":{
            "json_parser":{
              "sparse_result":{
                "path":"$.result.sparse_embeddings[*]",
                "value":{
                  "sparse_token":"$.embedding[*].token_id",
                  "sparse_weight":"$.embedding[*].weight"   
                }
              }
            }
          }
        }
      }
    }
  },
  "task_settings":{
    "parameters":{
      "input_type":"query",
      "return_token":true
    }
  }
}

POST _inference/sparse_embedding/custom_sparse_embedding?error_trace
{
  "input":["hello", "world"]
}
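
For reference, the nested sparse_result parser above assumes a response shaped roughly like the following (hypothetical values). The outer path selects each element of result.sparse_embeddings, and the value expressions are presumably evaluated against each selected element to produce the token/weight pairs.

# hypothetical provider response
{
  "result": {
    "sparse_embeddings": [
      {
        "embedding": [
          { "token_id": 101, "weight": 0.31 },
          { "token_id": 2047, "weight": 0.12 }
        ]
      }
    ]
  }
}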

rerank

PUT _inference/rerank/custom_rerank
{
  "service":"custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":<<your api_key>>
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "path":{
      "/v3/openapi/workspaces/default/ranker/ops-bge-reranker-larger":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
          },
          "request":{
            "format":"string",
            "content":"""
              {
                "query": "${query}",
                "docs": ${input}
              }
              """
          },
          "response":{
            "json_parser":{
              "reranked_index":"$.result.scores[*].index",
              "relevance_score":"$.result.scores[*].score"
            }
          }
        }
      }
    }
  }
}

POST _inference/rerank/custom_rerank
{
  "input": ["luke", "like", "leia", "chewy","r2d2", "star", "wars"],
  "query": "star wars main character"
}

completion

The completion example demonstrates how to use task_settings.parameters for more flexible parameter configuration.
For the interface definition of the Alibaba Cloud AI Search completion API, please refer to the official documentation: alibaba cloud ai search completion api doc

PUT _inference/completion/custom_completion
{
  "service":"custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":<<your api_key>>
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "path":{
      "/v3/openapi/workspaces/default/text-generation/deepseek-r1":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}"
          },
          "request":{
            "format":"string",
            "content":"{\"messages\":${messages}}"
          },
          "response":{
            "json_parser":{
              "completion_result":"$.result.text"
            }
          }
        }
      }
    }
  }
}

POST _inference/completion/custom_completion
{
  "input":"",
  "task_settings":{
    "parameters":{
      "messages":[
        {
          "role":"system", 
          "content":"你是一个机器人助手"
        },
        {
          "role":"user", 
          "content":"河南的省会是哪里"
        },
        {
          "role":"assistant", 
          "content":"郑州"
        },
        {
          "role":"user", 
          "content":"那里有什么好玩的"
        }
      ]
    }
  }
}
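
For illustration, after the ${messages} placeholder is replaced with the value configured in task_settings.parameters, the request body sent to the model should look roughly like this (non-string parameters appear to be substituted as JSON):

{"messages":[{"role":"system","content":"You are a robot assistant"},{"role":"user","content":"What is the capital of Henan?"},{"role":"assistant","content":"Zhengzhou"},{"role":"user","content":"What is there to do there?"}]}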

custom

We use the query-analyze service as an example.

PUT _inference/custom/query_analyze
{
  "service":"custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":<<your api_key>>
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "path":{
      "/v3/openapi/workspaces/default/query-analyze/ops-query-analyze-001":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}"
          },
          "request":{
            "format":"string",
            "content":"""
              {
                "query":"${query}"
              }
              """
          }
        }
      }
    }
  }
}

POST _inference/custom/query_analyze
{
  "input": "",    // Due to input validation, input cannot be null.
  "query": "what is elasticsearch?"
}

@Huaixinww Huaixinww requested a review from a team as a code owner March 7, 2025 09:54
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.1.0 labels Mar 7, 2025
@elasticsearchmachine elasticsearchmachine added the external-contributor Pull request authored by a developer outside the Elasticsearch team label Mar 7, 2025
@davidkyle davidkyle added :ml Machine learning and removed needs:triage Requires assignment of a team area label labels Mar 7, 2025
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Mar 7, 2025
@Huaixinww Huaixinww changed the title [WIP][Inference API] Add Custom Model support to Inference API [Inference API] Add Custom Model support to Inference API Mar 10, 2025
@davidkyle davidkyle self-assigned this Mar 11, 2025
@davidkyle davidkyle requested review from davidkyle and removed request for a team March 11, 2025 09:52
@davidkyle (Member) commented:

Thank you @Huaixinww I love the idea of custom models and using JsonPath to configure the response parsing. I'd like to start the review by building a few custom models of my own.

@weizijun (Contributor) commented:

> Thank you @Huaixinww I love the idea of custom models and using JsonPath to configure the response parsing. I'd like to start the review by building a few custom models of my own.

@davidkyle This feature is very useful for our Alibaba Cloud users. We will continue to improve it, and we can discuss how to bring this feature into the Elasticsearch community.

@davidkyle (Member) commented:

@Huaixinww and @weizijun the ML team at Elasticsearch loves this idea and considers it a core feature for the Inference API. Going forward we would like to take on this work in a new PR. I've opted to create a new PR because CI does not run automatically against PRs from external contributors, and that would slow down the development process.

I've opened #125679 which contains all the commits from this PR plus some fixes for the build system. We will review the PR and add any missing tests etc.

With your permission we would like to make some minor changes:

  1. Rename the service from custom-model to custom
  2. Merge the url and path fields into a single field
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "path":{
      "/v3/openapi/workspaces/default/text-generation/deepseek-r1":{

Would become

    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com/v3/openapi/workspaces/default/text-generation/deepseek-r1"
  3. Add a method field for the HTTP method

With these changes the embedding example from the PR description would look like this:

PUT _inference/text_embedding/custom_embeddings
{
  "service":"custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":"<<your api_key>>"
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com/v3/openapi/workspaces/default/text-embedding/ops-text-embedding-001",
    "method": "POST",
    "headers":{
      "Authorization": "Bearer ${api_key}",
      "Content-Type": "application/json;charset=utf-8"
    },
    "request":{
      "format":"string",
      "content":"""
        {
          "input":${input}
        }
        """
    },
    "response":{
      "json_parser":{
        "text_embeddings":"$.result.embeddings[*].embedding"
      }
    }
  }
}

@weizijun (Contributor) commented:

> With your permission we would like to make some minor changes:

@davidkyle Our design pattern comes from OpenAPI. I think your three changes are okay; both approaches work for us, no problem.

@jonathan-buttner (Contributor) commented:

I messed up Dave's PR, so I opened a new one here: #127939
