2 changes: 1 addition & 1 deletion website/docs/blueprints/inference/inference-charts.md
@@ -28,7 +28,7 @@ Both GPU and AWS Neuron (Inferentia/Trainium) accelerators are supported across
Before deploying the inference charts, ensure you have:

- Amazon EKS cluster with GPU or AWS Neuron
nodes ([inference-ready cluster](../../infra/inference-ready-cluster.md) for a quick start)
nodes ([inference-ready cluster](../../infra/solutions/inference-ready-cluster.md) for a quick start)
- Helm 3.0+
- For GPU deployments: NVIDIA device plugin installed
- For Neuron deployments: AWS Neuron device plugin installed
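A quick way to sanity-check these prerequisites before installing the charts (a sketch; the DaemonSet names assume the upstream NVIDIA and Neuron device plugin manifests and may differ in your cluster):

```bash
# Helm 3.x available
helm version --short

# Device plugins registered (names assume the upstream manifests)
kubectl get daemonset -A | grep -Ei 'nvidia-device-plugin|neuron-device-plugin'

# Accelerator resources advertised by the nodes
kubectl describe nodes | grep -E 'nvidia.com/gpu|aws.amazon.com/neuron'
```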
Binary file added website/docs/infra/img/architecture.jpg
7 changes: 7 additions & 0 deletions website/docs/infra/solutions/_category_.json
@@ -0,0 +1,7 @@
{
  "label": "Solutions",
  "position": 2,
  "link": {
    "type": "generated-index"
  }
}
@@ -6,7 +6,7 @@ sidebar_label: Inference-Ready Cluster

The Inference-Ready EKS Cluster is a pre-configured infrastructure solution designed specifically for AI/ML inference
workloads. This solution provides a Kubernetes cluster with all the necessary components to deploy and run inference
services using the AI on EKS [inference charts](../blueprints/inference/inference-charts.md) or your own deployments
services using the AI on EKS [inference charts](../../blueprints/inference/inference-charts.md) or your own deployments
and models.

An expanded [Readme](https://github.com/awslabs/ai-on-eks/tree/main/infra/solutions/inference-ready-cluster/README.md)
@@ -21,7 +21,7 @@ requirements, different tools are needed to properly deploy, run, and scale the
models that aren't just LLMs, such as text -> image diffusion models or more traditional Machine Learning models.

This infrastructure is meant to be the first layer of support. Alongside
the [Inference Charts](../blueprints/inference/inference-charts.md) and the Guidance you'll also find in this
the [Inference Charts](../../blueprints/inference/inference-charts.md) and the Guidance you'll also find in this
repository, AI on EKS aims to equip you with all the tools and knowledge you need to be able to run the inference you
want.

@@ -49,7 +49,7 @@ features:
- **Autoscaling**: Karpenter-based node autoscaling for cost optimization

The cluster is specifically designed to work seamlessly with the AI on
EKS [Inference Charts](../blueprints/inference/inference-charts.md), providing a complete end-to-end solution for
EKS [Inference Charts](../../blueprints/inference/inference-charts.md), providing a complete end-to-end solution for
deploying inference workloads.

## Resources
@@ -102,7 +102,7 @@ This infrastructure deploys the following AWS resources:

### Architecture Diagram

![architecture](../../../infra/solutions/inference-ready-cluster/image/architecture.jpg)
![architecture](../img/architecture.jpg)

### Prerequisites

@@ -177,7 +177,7 @@ dashboards are configured to automatically visualize the metrics and logs side b
### Step 7: Cluster Ready

Users can access EKS API and can deploy containerized AI/ML inference workloads via Kubernetes CLI using the AI on EKS
[inference charts](../blueprints/inference/inference-charts.md) or other repositories by interacting with AWS Network
[inference charts](../../blueprints/inference/inference-charts.md) or other repositories by interacting with AWS Network
Load Balancer (NLB) endpoint.
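A sketch of what that interaction can look like (the cluster name and region are placeholders; the NLB endpoint appears once a Service of type LoadBalancer has been deployed):

```bash
# Point kubectl at the new cluster
aws eks update-kubeconfig --name <cluster-name> --region <region>

# After deploying a workload, look up the NLB endpoint to send requests to
kubectl get svc -A | grep -i loadbalancer
```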

### Step 8: Verify Deployment
@@ -358,7 +358,7 @@ You should see the following output (expand the section to see the output)
## Inference on EKS

EKS is a powerful platform for running AI/ML inference. For a deep dive on many of the inference possibilities on EKS,
please check the [inference](../blueprints/inference/index.md) section.
please check the [inference](../../blueprints/inference/index.md) section.

### Inference Charts Integration

@@ -417,7 +417,7 @@ helm install neuron-inference . \
--values values-llama-31-8b-vllm-neuron.yaml
```
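After the install completes, a quick check that the release and its pods came up (a sketch; the label selector assumes the common Helm chart convention and may differ for these charts):

```bash
# Release status
helm status neuron-inference

# Pods created by the release
kubectl get pods -l app.kubernetes.io/instance=neuron-inference
```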

Please check the [inference charts](../blueprints/inference/inference-charts.md) section for a deeper look at what is
Please check the [inference charts](../../blueprints/inference/inference-charts.md) section for a deeper look at what is
available.

### Observability Integration
@@ -435,7 +435,7 @@ Access Grafana dashboard:
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
```
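If you need the Grafana admin credentials, a sketch assuming the default kube-prometheus-stack secret name and key:

```bash
kubectl get secret -n monitoring kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 --decode; echo
```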

Please see the [observability](../guidance/observability.md) for an in-depth look at using the observability
Please see the [observability](../../guidance/observability.md) for an in-depth look at using the observability
features.

### Cost Optimization
@@ -2,7 +2,7 @@
sidebar_label: JARK on EKS
sidebar_position: 1
---
import CollapsibleContent from '../../src/components/CollapsibleContent';
import CollapsibleContent from '../../../src/components/CollapsibleContent';

# JARK on EKS

@@ -32,14 +32,14 @@ JARK comes enabled with [AI/ML Observability](https://github.com/awslabs/ai-ml-o
The JARK stack is ideal for teams and organizations looking to simplify the complex process of deploying and managing AI models. Whether you're working on cutting-edge generative models or scaling existing AI workloads, JARK on Amazon EKS offers the flexibility, scalability, and control you need to succeed.


![alt text](img/jark.png)
![alt text](../img/jark.png)


### Ray on Kubernetes

[Ray](https://www.ray.io/) is an open-source framework for building scalable and distributed applications. It is designed to make it easy to write parallel and distributed Python applications by providing a simple and intuitive API for distributed computing. It has a growing community of users and contributors, and is actively maintained and developed by the Ray team at Anyscale, Inc.

![RayCluster](img/ray-cluster.svg)
![RayCluster](../img/ray-cluster.svg)

*Source: https://docs.ray.io/en/latest/cluster/key-concepts.html*

@@ -63,15 +63,15 @@ Overall, deploying Ray on Kubernetes can simplify the deployment and management

Before moving forward with the deployment please make sure you have read the pertinent sections of the official [documentation](https://docs.ray.io/en/latest/cluster/kubernetes/index.html).

![RayonK8s](img/ray_on_kubernetes.webp)
![RayonK8s](../img/ray_on_kubernetes.webp)

*Source: https://docs.ray.io/en/latest/cluster/kubernetes/index.html*

<CollapsibleContent header={<h2><span>Deploying the Solution</span></h2>}>

In this [example](https://github.com/awslabs/ai-on-eks/tree/main/infra/jark-stack/terraform), you will provision JARK Cluster on Amazon EKS.

![JARK](img/jark-stack.png)
![JARK](../img/jark-stack.png)


### Prerequisites
7 changes: 7 additions & 0 deletions website/docs/infra/stacks/_category_.json
@@ -0,0 +1,7 @@
{
  "label": "Stacks",
  "position": 2,
  "link": {
    "type": "generated-index"
  }
}
@@ -1,7 +1,7 @@
---
sidebar_label: AIBrix on EKS
---
import CollapsibleContent from '../../src/components/CollapsibleContent';
import CollapsibleContent from '../../../src/components/CollapsibleContent';

# AIBrix on EKS

@@ -1,7 +1,7 @@
---
sidebar_label: EMR NVIDIA Spark-RAPIDS
---
import CollapsibleContent from '../../src/components/CollapsibleContent';
import CollapsibleContent from '../../../src/components/CollapsibleContent';


# EMR on EKS NVIDIA RAPIDS Accelerator for Apache Spark
@@ -11,7 +11,7 @@ The NVIDIA RAPIDS Accelerator for Apache Spark is a powerful tool that builds on
With the invention of the RAPIDS Accelerator for Spark 3, NVIDIA has successfully revolutionized extract, transform, and load pipelines by significantly enhancing the efficiency of Spark SQL and DataFrame operations. By merging the capabilities of the RAPIDS cuDF library and the extensive reach of the Spark distributed computing ecosystem, the RAPIDS Accelerator for Apache Spark provides a robust solution to handle large-scale computations.
Moreover, the RAPIDS Accelerator library incorporates an advanced shuffle optimized by UCX, which can be configured to support GPU-to-GPU communication and RDMA capabilities, hence further boosting its performance.
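In practice the accelerator is enabled through Spark configuration rather than code changes; a minimal sketch of the relevant settings (the application name is a placeholder, and EMR on EKS job submissions pass these as Spark submit parameters):

```bash
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=1 \
  your-etl-job.py
```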

![Alt text](img/nvidia.png)
![Alt text](./../img/nvidia.png)

### EMR support for NVIDIA RAPIDS Accelerator for Apache Spark
Integration of Amazon EMR with NVIDIA RAPIDS Accelerator for Apache Spark​ Amazon EMR on EKS now extends its support to include the use of GPU instance types with the NVIDIA RAPIDS Accelerator for Apache Spark. As the use of artificial intelligence (AI) and machine learning (ML) continues to expand in the realm of data analytics, there's an increasing demand for rapid and cost-efficient data processing, which GPUs can provide. The NVIDIA RAPIDS Accelerator for Apache Spark enables users to harness the superior performance of GPUs, leading to substantial infrastructure cost savings.
@@ -218,7 +218,7 @@ chmod +x execute_spark_rapids_xgboost.sh

Verify the pod status

![Alt text](img/spark-rapids-pod-status.png)
![Alt text](../img/spark-rapids-pod-status.png)
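A command-line equivalent of the check above (a sketch; adjust the filter to your job's namespace or name):

```bash
kubectl get pods -A | grep -i spark
```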


:::info
@@ -277,7 +277,7 @@ The following is a sample output from the above log file:
**Step7**: Finally, with a trained and validated XGBoost model, you can use it to make predictions on new, unseen loan data. These predictions can help in identifying potential risks associated with loan default or evaluating loan performance.


![Alt text](img/emr-spark-rapids-fannie-mae.png)
![Alt text](../img/emr-spark-rapids-fannie-mae.png)

### GPU Monitoring with DCGM Exporter, Prometheus and Grafana

@@ -313,7 +313,7 @@ aws secretsmanager get-secret-value --secret-id emr-spark-rapids-grafana --regio

Once logged in, add the AMP datasource to Grafana and import the Open Source GPU monitoring dashboard. You can then explore the metrics and visualize them using the Grafana dashboard, as shown in the screenshot below.

![Alt text](img/gpu-dashboard.png)
![Alt text](../img/gpu-dashboard.png)

<CollapsibleContent header={<h2><span>Cleanup</span></h2>}>

@@ -1,7 +1,7 @@
---
sidebar_label: JupyterHub on EKS
---
import CollapsibleContent from '../../src/components/CollapsibleContent';
import CollapsibleContent from '../../../src/components/CollapsibleContent';

:::warning
Deployment of ML models on EKS requires access to GPUs or Neuron instances. If your deployment isn't working, it’s often due to missing access to these resources. Also, some deployment patterns rely on Karpenter autoscaling and static node groups; if nodes aren't initializing, check the logs for Karpenter or Node groups to resolve the issue.
@@ -64,18 +64,18 @@ When creating the certificate use a wildcard, so that it can secure a domain and
The service generates the private key and self-signed certificate.
Sample prompts to generate a certificate:

![](img/Cert_Install.png)
![certificate](../img/Cert_Install.png)
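If you prefer a non-interactive command over the prompts shown above, a sketch using openssl (the wildcard domain is a placeholder; the output file names match those referenced below):

```bash
# Requires OpenSSL 1.1.1+ for -addext
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout key.pem -out certificate.pem \
  -subj "/CN=*.example.com" \
  -addext "subjectAltName=DNS:*.example.com,DNS:example.com"
```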


6. Import the certificate into AWS Certificate Manager

Open the private key(`key.pem`) in a text editor and copy the contents into the private key section of ACM. Similarly, copy the contents of the `certificate.pem` file into the certificate body section and submit.

![](img/ACM.png)
![acm](../img/ACM.png)
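The same import can also be done from the CLI instead of pasting into the console (a sketch; the region is a placeholder):

```bash
aws acm import-certificate \
  --certificate fileb://certificate.pem \
  --private-key fileb://key.pem \
  --region <region>
```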

Verify certificate is installed correctly in the console in ACM.

![](img/Cert_List.png)
![cert_list](../img/Cert_List.png)

</CollapsibleContent>

@@ -177,13 +177,13 @@ kubectl port-forward svc/proxy-public 8080:80 -n jupyterhub
```

**Sign-in:** Navigate to [http://localhost:8080/](http://localhost:8080/) in your web browser. Input `user-1` as the username and choose any password.
![alt text](img/image.png)
![alt text](../img/image.png)

Select server options: Upon sign-in, you’ll be presented with a variety of Notebook instance profiles to choose from. The `Data Engineering (CPU)` server is for traditional, CPU based notebook work. The `Elyra` server provides [Elyra](https://github.com/elyra-ai/elyra) functionality, allowing you to quickly develop pipelines: ![workflow](img/elyra-workflow.png). `Trainium` and `Inferentia` servers will deploy the notebook server onto Trainium and Inferentia nodes, allowing accelerated workloads. `Time Slicing` and `MIG` are two different strategies for GPU sharing. Finally, the `Data Science (GPU)` server is a traditional server running on an NVIDIA GPU.
Select server options: Upon sign-in, you’ll be presented with a variety of Notebook instance profiles to choose from. The `Data Engineering (CPU)` server is for traditional, CPU based notebook work. The `Elyra` server provides [Elyra](https://github.com/elyra-ai/elyra) functionality, allowing you to quickly develop pipelines: ![workflow](../img/elyra-workflow.png). `Trainium` and `Inferentia` servers will deploy the notebook server onto Trainium and Inferentia nodes, allowing accelerated workloads. `Time Slicing` and `MIG` are two different strategies for GPU sharing. Finally, the `Data Science (GPU)` server is a traditional server running on an NVIDIA GPU.

For this time-slicing feature demonstration, we’ll be using the **Data Science (GPU + Time-Slicing – G5)** profile. Go ahead and select this option and choose the Start button.

![alt text](img/notebook-server-list.png)
![alt text](../img/notebook-server-list.png)

The new node created by Karpenter with the `g5.2xlarge` instance type has been configured to leverage the timeslicing feature provided by the [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin). This feature allows for efficient GPU utilization by dividing a single GPU into multiple allocatable units. In this case, we have defined `4` allocatable GPUs in the NVIDIA device plugin Helm chart config map. Below is the status of the node:

@@ -218,7 +218,7 @@ Open JupyterHub in an Incognito browser window: Navigate to http://localhost:808

Choose server options: After logging in, you’ll see the server options page. Ensure that you select the **Data Science (GPU + Time-Slicing – G5)** radio button and select Start.

![alt text](img/image-2.png)
![alt text](../img/image-2.png)

Verify pod placement: Notice that this pod placement takes only a few seconds, unlike `user-1`'s. This is because the Kubernetes scheduler can place the pod on the existing `g5.2xlarge` node created by the `user-1` pod. `user-2` also uses the same Docker image, so there is no delay pulling the image; it is served from the local cache.
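To see both user pods scheduled on the same node, a quick sketch (the namespace matches the port-forward command used earlier):

```bash
# Both jupyter-user pods should show the same g5.2xlarge node
kubectl get pods -n jupyterhub -o wide | grep jupyter-user
```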

@@ -241,29 +241,29 @@ Checkout the [AWS blog: Building multi-tenant JupyterHub Platforms on Amazon EKS

Add the `CNAME` DNS record in ChangeIP for the JupyterHub domain with the load balancer DNS name.

![](img/CNAME.png)
![cname](../img/CNAME.png)

:::info
When adding the load balancer DNS name in the value field of CNAME in ChangeIP make sure to add a dot(`.`) at the end of the load-balancer DNS name.
:::
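To get the load balancer DNS name to paste into the CNAME value, a sketch that assumes the JupyterHub proxy is exposed through a Service of type LoadBalancer (an Ingress-based setup would use `kubectl get ingress` instead):

```bash
kubectl get svc proxy-public -n jupyterhub \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```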

Now typing the domain url in the browser should redirect to the Jupyterhub login page.

![](img/Cognito-Sign-in.png)
![cognito-sign-in](../img/Cognito-Sign-in.png)


Follow the Cognito sign-up and sign-in process to login.

![](img/Cognito-Sign-up.png)
![cognito-sign-up](../img/Cognito-Sign-up.png)

Successful sign-in will open up the JupyterHub environment for the logged in user.

![](img/jupyter_launcher.png)
![jupyter_launcher](../img/jupyter_launcher.png)

To test the setup of the shared and personal directories in JupyterHub, you can follow these steps:
1. Open a terminal window from the launcher dashboard.

![](img/jupyter_env.png)
![jupyter_env](../img/jupyter_env.png)

2. execute the command

Expand All @@ -278,23 +278,23 @@ Note: This will look a little different depending on your OAuth provider.

Add the `CNAME` DNS record in ChangeIP for the JupyterHub domain with the load balancer DNS name.

![](img/CNAME.png)
![](../img/CNAME.png)

:::info
When adding the load balancer DNS name in the value field of CNAME in ChangeIP make sure to add a dot(`.`) at the end of the load-balancer DNS name.
:::

Now typing the domain url in the browser should redirect to the Jupyterhub login page.

![](img/oauth.png)
![oauth](../img/oauth.png)

Follow the Keycloak sign-up and sign-in process to login.

![](img/keycloak-login.png)
![oauth-login](../img/keycloak-login.png)

Successful sign-in will open up the JupyterHub environment for the logged in user.

![](img/jupyter_launcher.png)
![jupyter_launcher](../img/jupyter_launcher.png)


<CollapsibleContent header={<h3><span>Cleanup</span></h3>}>
@@ -1,7 +1,7 @@
---
sidebar_label: Trainium on EKS
---
import CollapsibleContent from '../../src/components/CollapsibleContent';
import CollapsibleContent from '../../../src/components/CollapsibleContent';

:::warning
Deployment of ML models on EKS requires access to GPUs or Neuron instances. If your deployment isn't working, it’s often due to missing access to these resources. Also, some deployment patterns rely on Karpenter autoscaling and static node groups; if nodes aren't initializing, check the logs for Karpenter or Node groups to resolve the issue.
@@ -18,7 +18,7 @@ In this blueprint, we will learn how to securely deploy an [Amazon EKS Cluster](
#### Trainium Device Architecture
Each Trainium device (chip) comprises two neuron cores. In the case of `Trn1.32xlarge` instances, `16 Trainium devices` are combined, resulting in a total of `32 Neuron cores`. The diagram below provides a visual representation of the Neuron device's architecture:

![Trainium Device](img/neuron-device.png)
![Trainium Device](../img/neuron-device.png)
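Once a Trainium node is up, the same device and core layout can be seen with the Neuron tooling (a sketch; run on the instance or in a pod that has the Neuron runtime installed):

```bash
# On trn1.32xlarge: 16 devices x 2 NeuronCores = 32 cores
neuron-ls
```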

#### AWS Neuron Drivers
Neuron Drivers are a set of essential software components installed on the host operating system of AWS Inferentia-based accelerators, such as Trainium/Inferentia instances. Their primary function is to optimize the interaction between the accelerator hardware and the underlying operating system, ensuring seamless communication and efficient utilization of the accelerator's computational capabilities.
@@ -46,7 +46,7 @@ TorchX can seamlessly integrate with Airflow and Kubeflow Pipelines. In this blu

### Solution Architecture

![Alt text](img/trainium-on-eks-arch.png)
![Alt text](../img/trainium-on-eks-arch.png)


<CollapsibleContent header={<h2><span>Deploying the Solution</span></h2>}>
@@ -199,7 +199,7 @@ chmod +x 2-bert-pretrain-precompile.sh

You can verify the pods status by running `kubectl get pods` or `kubectl get vcjob`. Successful output looks like below.

![Alt text](img/pre-compile-pod-status.png)
![Alt text](../img/pre-compile-pod-status.png)

You can also verify the logs for the Pod once they are `Succeeded`. The precompilation job will run for `~15 minutes`. Once complete, you will see the following in the output:

@@ -215,7 +215,7 @@ Compiler status PASS

New pre-training cache files are stored under FSx for Lustre.

![Alt text](img/cache.png)
![Alt text](../img/cache.png)


#### Step4: Launch BERT pretraining job using 64 Neuron cores with two trn1.32xlarge instances
@@ -268,7 +268,7 @@ instance-id: i-04b476a6a0e686980

You can also run `neuron-top` which provides the live usage of neuron cores. The below shows the usage of all 32 neuron cores.

![Alt text](img/neuron-top.png)
![Alt text](../img/neuron-top.png)
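If you prefer not to log in to the node, the same view is available from inside one of the training pods (a sketch; the pod name is a placeholder and assumes the training image ships the Neuron tools):

```bash
kubectl exec -it <bert-pretraining-pod> -- neuron-top
```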


If you wish to terminate the job, you can execute the following commands: