
[ML] Improve how the inference API determines the elser model to use for endpoints #127284

Open

Description

@jonathan-buttner

When creating an inference endpoint that leverages ELSER, the inference API determines which model variant to use. To do this, it retrieves information about the ML nodes, checking whether they are all on the same hardware and which CPU architecture they use. Based on that information, it selects either the x86_64 variant or the platform-agnostic variant.
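For reference, this selection happens implicitly at endpoint creation time. A minimal sketch using the documented `elser` service, with an illustrative endpoint ID:

```console
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
```

Note that the request body never names a model variant; the API inspects the ML nodes and picks `.elser_model_2_linux-x86_64` or `.elser_model_2` on the caller's behalf.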

There are a couple of shortcomings with this approach:

  • If no ML nodes have started yet, we can't determine the appropriate architecture.
  • If the architecture changes after the endpoint is created, the previously chosen model variant will crash (you can check which variant an endpoint uses, as shown below).
  • Ideally the inference API would also handle choosing the right iteration version of the model (currently we use v2).
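To see which variant an existing endpoint resolved to, retrieve the endpoint; the returned configuration should include the resolved `model_id` in its `service_settings` (endpoint ID illustrative):

```console
GET _inference/sparse_embedding/my-elser-endpoint
```

If the resolved `model_id` is `.elser_model_2_linux-x86_64` but the cluster's ML nodes are no longer all x86_64, the endpoint is affected by the second shortcoming above.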

If the wrong model variant is chosen and needs to be re-evaluated, the workaround is to delete the inference endpoint and recreate it. This is safe for default inference endpoints too; a deleted default endpoint is recreated automatically.
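Concretely, the workaround looks like the following (endpoint ID illustrative; for a default endpoint such as `.elser-2-elasticsearch`, the DELETE alone suffices since it is recreated automatically):

```console
DELETE _inference/sparse_embedding/my-elser-endpoint

PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
```

Recreating the endpoint re-runs the architecture check against the current ML nodes, so the appropriate variant is chosen again.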

Labels: :ml, Feature:GenAI, Team:ML
