@jonathan-buttner jonathan-buttner commented Oct 16, 2025

This PR is based on #136569 (already merged).

This PR moves the EIS authorization polling logic to a persistent task on a single node.

Notable changes:

  • Polling no longer runs on every node
  • A cluster state listener is registered that checks whether the task exists and creates it if it doesn't
  • If the node running the task shuts down, the persistent task framework moves the task to a new node
  • If the EIS URL is empty or null, the persistent task is not created
  • If a cluster is no longer authorized to access certain preconfigured endpoints, those endpoints remain in place instead of being removed
  • The polling logic compares the received authorized models with the preconfigured inference endpoints already stored in cluster state to determine whether any are new. Only new preconfigured inference endpoints are stored
  • The polling logic uses a new action to send the new inference endpoints to the master node to be stored. The master node must handle this step because it updates the cluster state
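The comparison step in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `NewEndpointFilter`, `findNewEndpointIds`, and the use of plain string IDs are all assumptions made for the example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not a class from this PR.
final class NewEndpointFilter {

    private NewEndpointFilter() {}

    /**
     * Returns the authorized endpoint IDs that are not already stored,
     * preserving the order in which they were received.
     * Stored endpoints that are no longer authorized are left untouched.
     */
    static Set<String> findNewEndpointIds(List<String> authorizedIds, Set<String> storedIds) {
        Set<String> newIds = new LinkedHashSet<>(authorizedIds);
        newIds.removeAll(storedIds);
        return newIds;
    }

    public static void main(String[] args) {
        // Assumed sample data: "endpoint-a" is stored but no longer authorized.
        Set<String> stored = Set.of("endpoint-a", "endpoint-b");
        List<String> authorized = List.of("endpoint-b", "endpoint-c");

        // Only "endpoint-c" would be sent to the master node to be stored.
        System.out.println(findNewEndpointIds(authorized, stored));
    }
}
```

Because the result is computed only from what was received minus what is stored, an endpoint that drops out of the authorized list is never touched, which matches the "endpoints will remain" behavior described above.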

Testing

Start EIS

cd eis-gateway
make TLS_VERIFY_CLIENT_CERTS=false run

Start ES pointing at EIS

run-es -Dtests.es.xpack.inference.elastic.url=https://localhost:8443 -Dtests.es.xpack.inference.elastic.http.ssl.verification_mode=none -Dtests.es.xpack.inference.elastic.authorization_request_interval="5s" -Dtests.es.xpack.inference.elastic.max_authorization_request_jitter="1s"

Retrieving all endpoints from the inference API should now return some EIS endpoints

GET _inference/_all

The task list should include a task named eis-authorization-poller[c]

GET _tasks

@jonathan-buttner jonathan-buttner added >enhancement :ml Machine learning Team:ML Meta label for the ML team v9.3.0 labels Oct 16, 2025
@elasticsearchmachine
Collaborator

Hi @jonathan-buttner, I've created a changelog YAML for you.

Contributor

@DaveCTurner DaveCTurner left a comment

Please don't use the master for admin tasks that don't actually need to run on the master. If you need a task to run approximately once in the cluster, use a persistent task instead.
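As a rough illustration of the "create the task if it doesn't exist" check from the PR description, here is a self-contained sketch. It does not use the real Elasticsearch cluster state listener or persistent task APIs: `EnsureTaskListener`, `onClusterStateChanged`, and the `Consumer<String>` task starter are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch; the real code would rely on Elasticsearch's cluster state
// listener and persistent task framework rather than these stand-in types.
final class EnsureTaskListener {
    private final String taskName;
    private final String eisUrl;
    private final Consumer<String> taskStarter;

    EnsureTaskListener(String taskName, String eisUrl, Consumer<String> taskStarter) {
        this.taskName = taskName;
        this.eisUrl = eisUrl;
        this.taskStarter = taskStarter;
    }

    /** Called with the names of the persistent tasks present in the latest cluster state. */
    void onClusterStateChanged(Set<String> existingTaskNames) {
        if (eisUrl == null || eisUrl.isEmpty()) {
            return; // No EIS URL configured: the task is never created.
        }
        if (existingTaskNames.contains(taskName)) {
            return; // The task already exists somewhere in the cluster.
        }
        taskStarter.accept(taskName); // Request creation of the missing task.
    }

    public static void main(String[] args) {
        List<String> started = new ArrayList<>();
        EnsureTaskListener listener =
            new EnsureTaskListener("eis-authorization-poller", "https://localhost:8443", started::add);

        listener.onClusterStateChanged(Set.of());                           // task missing: start requested
        listener.onClusterStateChanged(Set.of("eis-authorization-poller")); // task present: no-op
        System.out.println(started.size());
    }
}
```

In a real cluster several nodes could observe the missing task at the same time, so the creation request would presumably need to tolerate a "task already exists" response.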

@jonathan-buttner jonathan-buttner changed the title from "[ML] Transition EIS auth polling to master node" to "[ML] Transition EIS auth polling to persistent task on a single node" Oct 30, 2025
*/
public class AuthorizationTaskExecutorMultipleNodesIT extends ESIntegTestCase {

private static final String AUTH_TASK_ACTION = AuthorizationPoller.TASK_NAME + "[c]";
Contributor Author

I couldn't find exactly where [c] was being added in the code to include in this concatenation, but it always seems to be there.

import java.util.Objects;

- public class Model {
+ public class Model implements Writeable {
Contributor Author

Model needs to be serialized now because it's going to be sent to the master node to be stored in the inference index by the ModelRegistry.
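A minimal sketch of the write/read symmetry this implies is shown below. It is only an illustration: Elasticsearch's `Writeable` works with `StreamOutput` and `StreamInput`, whereas this example substitutes `java.io` data streams so it can run on its own, and `SimpleWriteable`, `SketchModel`, and its two fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Simplified stand-in for Elasticsearch's Writeable (which uses StreamOutput).
interface SimpleWriteable {
    void writeTo(DataOutputStream out) throws IOException;
}

// Hypothetical model with two illustrative fields; not the real Model class.
final class SketchModel implements SimpleWriteable {
    final String inferenceEntityId;
    final String taskType;

    SketchModel(String inferenceEntityId, String taskType) {
        this.inferenceEntityId = inferenceEntityId;
        this.taskType = taskType;
    }

    // Reading constructor: fields must be read in the same order writeTo emits them.
    SketchModel(DataInputStream in) throws IOException {
        this(in.readUTF(), in.readUTF());
    }

    @Override
    public void writeTo(DataOutputStream out) throws IOException {
        out.writeUTF(inferenceEntityId);
        out.writeUTF(taskType);
    }

    // Serializes to bytes and reads back, as would happen when sending to another node.
    static SketchModel roundTrip(SketchModel original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            original.writeTo(out);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return new SketchModel(in);
        }
    }
}
```

The point of the sketch is the pairing of a `writeTo` method with a stream-reading constructor, so an instance created on the polling node can be rebuilt on the node that receives it.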

@@ -1,357 +0,0 @@
/*
Contributor Author

I removed this because we are no longer revoking endpoints.


public void testStoreModels_ReturnsEmptyList_WhenGivenNoModelsToStore() {
- PlainActionFuture<List<ModelRegistry.ModelStoreResponse>> storeListener = new PlainActionFuture<>();
+ PlainActionFuture<List<ModelStoreResponse>> storeListener = new PlainActionFuture<>();
Contributor Author

These changes result from moving ModelStoreResponse to core.

);
}

private static Map<String, DefaultModelConfig> initDefaultEndpoints(
Contributor Author

No longer needed because authorization is moved out of this class.

import java.util.Objects;

- public abstract class ElasticInferenceServiceModel extends RateLimitGroupingModel {
+ public class ElasticInferenceServiceModel extends RateLimitGroupingModel {
Contributor Author

I'm using this to create a generic instance of the model so we can store the models using the ModelRegistry.

* Represents the preconfigured endpoints that are included in Elasticsearch. EIS will support dynamic preconfigured endpoints which means
* it can provide new preconfigured endpoints that do not exist in the source here.
*/
public class InternalPreconfiguredEndpoints {
Contributor Author

Mostly moved from the ElasticInferenceService.

@@ -1,283 +0,0 @@
/*
Contributor Author

Class under test was removed so we don't need it anymore. Most of the functionality was moved to AuthorizationPoller and its test files.

Labels

cloud-deploy Publish cloud docker image for Cloud-First-Testing >enhancement :ml Machine learning Team:ML Meta label for the ML team v9.3.0
