[E&A] Drafts initial conceptual docs for EIS #733

Draft · wants to merge 15 commits into `main` · showing changes from 8 commits
80 changes: 78 additions & 2 deletions explore-analyze/elastic-inference/eis.md
@@ -5,6 +5,82 @@ applies_to:
navigation_title: Elastic Inference Service (EIS)
---

# Elastic {{infer-cap}} Service [elastic-inference-service-eis]

The Elastic {{infer-cap}} Service (EIS) enables you to leverage AI-powered search as a service without deploying a model in your cluster.
With EIS, you don't need to add, configure, and scale {{ml}} nodes to provide the infrastructure and resources that large language models (LLMs) require.
Instead, you can use {{ml}} models in high-throughput, low-latency scenarios independently of your {{es}} infrastructure.

% TO DO: Link to the EIS inference endpoint reference docs when it's added to the OpenAPI spec. (Coming soon) %

## Available task types

EIS offers the following {{infer}} task types:

* Chat completion

## How to use EIS [using-eis]

Your Elastic deployment comes with default endpoints for EIS that you can use to perform {{infer}} tasks.
You can either call the {{infer}} API directly or use the default `Elastic LLM` model in the AI Assistant, Attack Discovery UI, and Search Playground.
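
For example, a minimal streaming request to the default endpoint looks like the following sketch (the message content is illustrative; a complete request and response are shown in the Examples section below):

```json
POST /_inference/chat_completion/.rainbow-sprinkles-elastic/_stream
{
  "messages": [
    {
      "role": "user",
      "content": "Hello"
    }
  ]
}
```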

% TO DO: Link to the EIS inference endpoint reference docs when it's added to the OpenAPI spec. (Coming soon) %

## Default EIS endpoints [default-eis-inference-endpoints]

Your {{es}} deployment includes a preconfigured EIS endpoint, making it easier to use chat completion via the {{infer}} API:

* `.rainbow-sprinkles-elastic`

::::{note}

* The model appears as `Elastic LLM` in the AI Assistant, Attack Discovery UI, preconfigured connectors list, and the Search Playground.

::::
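
To verify that the preconfigured endpoint is available in your deployment, you can retrieve its configuration through the {{infer}} API (a minimal sketch; the response shape may vary by version):

```json
GET /_inference/chat_completion/.rainbow-sprinkles-elastic
```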

% TO DO: Link to the AI assistant documentation in the different solutions and possibly connector docs. %

## Regions [eis-regions]

EIS runs on AWS in the following regions:

* `us-east-1`
* `us-west-2`

For more details on AWS regions, refer to the [AWS Global Infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/) and the [supported cross-region {{infer}} profiles](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html) documentation.

## Examples

The following example demonstrates how to perform a `chat_completion` task through EIS by using the `.rainbow-sprinkles-elastic` default {{infer}} endpoint.

```json
POST /_inference/chat_completion/.rainbow-sprinkles-elastic/_stream
{
  "messages": [
    {
      "role": "user",
      "content": "Say yes if it works."
    }
  ],
  "temperature": 0.7,
  "max_completion_tokens": 300
}
```

The request returns the following response (an excerpt from the stream):

```json
(...)
{
  "role": "assistant",
  "content": "Yes",
  "model": "rainbow-sprinkles",
  "object": "chat.completion.chunk",
  "usage": {
    "completion_tokens": 4,
    "prompt_tokens": 13,
    "total_tokens": 17
  }
}
(...)
```
@@ -15,12 +15,10 @@ Refer to the [{{infer-cap}} APIs](https://www.elastic.co/docs/api/doc/elasticsea

Creates an {{infer}} endpoint to perform an {{infer}} task with the `elastic` service.


## {{api-request-title}} [infer-service-elastic-api-request]

`PUT /_inference/<task_type>/<inference_id>`
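
For example, the following sketch creates a `sparse_embedding` endpoint with the `elastic` service (both `my-elser-endpoint` and the `model_id` value are placeholders for illustration, not fixed names; use a model that is available for your deployment):

```json
PUT /_inference/sparse_embedding/my-elser-endpoint
{
  "service": "elastic",
  "service_settings": {
    "model_id": "my-model-id"
  }
}
```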


## {{api-path-parms-title}} [infer-service-elastic-api-path-params]

`<inference_id>`
@@ -34,16 +32,13 @@ Creates an {{infer}} endpoint to perform an {{infer}} task with the `elastic` se
* `chat_completion`,
* `sparse_embedding`.


::::{note}
The `chat_completion` task type only supports streaming and only through the `_stream` API.

For more information on how to use the `chat_completion` task type, refer to the [chat completion documentation](chat-completion-inference-api.md).

::::
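
Because streaming is required, a `chat_completion` request always goes through the `_stream` route, as in this minimal sketch (the endpoint name is a placeholder and the message content is illustrative):

```json
POST /_inference/chat_completion/chat-completion-endpoint/_stream
{
  "messages": [
    {
      "role": "user",
      "content": "Hello"
    }
  ]
}
```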



## {{api-request-body-title}} [infer-service-elastic-api-request-body]

`max_chunk_size`
@@ -64,7 +59,6 @@ For more information on how to use the `chat_completion` task type, please refer
`service_settings`
: (Required, object) Settings used to install the {{infer}} model.


`model_id`
: (Required, string) The name of the model to use for the {{infer}} task.

@@ -77,9 +71,7 @@ For more information on how to use the `chat_completion` task type, please refer
}
```



## Elastic {{infer-cap}} Service example [inference-example-elastic]

The following example shows how to create an {{infer}} endpoint called `elser-model-eis` to perform a `sparse_embedding` task type.

@@ -104,4 +96,3 @@ PUT /_inference/chat_completion/chat-completion-endpoint
}
}
```
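
For reference, a request that creates the `chat-completion-endpoint` shown above would follow this shape, a sketch based on the required `service` and `service_settings.model_id` fields described earlier (the `model_id` value is an assumption, borrowed from the model name in the EIS response example):

```json
PUT /_inference/chat_completion/chat-completion-endpoint
{
  "service": "elastic",
  "service_settings": {
    "model_id": "rainbow-sprinkles"
  }
}
```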