
[ML] Improve how the inference API determines the elser model to use for endpoints #127284

Open

Description

@jonathan-buttner

When creating an inference endpoint that leverages ELSER, the inference API determines which model variant to use. To do this, it retrieves information about the ML nodes, checking whether they are all on the same hardware and which CPU architecture they use. Based on that information, it selects either the x86_64 variant or the platform-agnostic variant.
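For reference, this selection happens implicitly at endpoint creation time. A minimal sketch using the documented `elser` service, with an illustrative endpoint ID:

```console
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
```

Note that the request body never names a model variant; the API inspects the ML nodes and picks `.elser_model_2_linux-x86_64` or `.elser_model_2` on the caller's behalf.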

There are a couple of shortcomings with this approach:

  • If no ML nodes have started yet, we can't determine the appropriate architecture.
  • If the architecture changes after the endpoint is created, the previously chosen model variant will crash (you can check which variant an endpoint uses, as shown below).
  • Ideally the inference API would also handle choosing the right iteration version of the model (currently we use v2).
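To see which variant an existing endpoint resolved to, retrieve the endpoint; the returned configuration should include the resolved `model_id` in its `service_settings` (endpoint ID illustrative):

```console
GET _inference/sparse_embedding/my-elser-endpoint
```

If the resolved `model_id` is `.elser_model_2_linux-x86_64` but the cluster's ML nodes are no longer all x86_64, the endpoint is affected by the second shortcoming above.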

If the wrong model variant is chosen and needs to be re-evaluated, the workaround is to delete the inference endpoint and recreate it. This is safe for default inference endpoints too; a deleted default endpoint is recreated automatically.
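Concretely, the workaround looks like the following (endpoint ID illustrative; for a default endpoint such as `.elser-2-elasticsearch`, the DELETE alone suffices since it is recreated automatically):

```console
DELETE _inference/sparse_embedding/my-elser-endpoint

PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
```

Recreating the endpoint re-runs the architecture check against the current ML nodes, so the appropriate variant is chosen again.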

Labels: :ml, Feature:GenAI, Team:ML
