2 changes: 1 addition & 1 deletion website/docs/blueprints/inference/inference-charts.md
@@ -28,7 +28,7 @@ Both GPU and AWS Neuron (Inferentia/Trainium) accelerators are supported across
Before deploying the inference charts, ensure you have:

- Amazon EKS cluster with GPU or AWS Neuron
nodes ([inference-ready cluster](../../infra/inference-ready-cluster.md) for a quick start)
nodes ([inference-ready cluster](../../infra/solutions/inference-ready-cluster.md) for a quick start)
- Helm 3.0+
- For GPU deployments: NVIDIA device plugin installed
- For Neuron deployments: AWS Neuron device plugin installed
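A quick way to sanity-check these prerequisites before installing the charts (a sketch; the DaemonSet names assume the upstream NVIDIA and Neuron device plugin manifests and may differ in your cluster):

```bash
# Helm 3.x available
helm version --short

# Device plugins registered (names assume the upstream manifests)
kubectl get daemonset -A | grep -Ei 'nvidia-device-plugin|neuron-device-plugin'

# Accelerator resources advertised by the nodes
kubectl describe nodes | grep -E 'nvidia.com/gpu|aws.amazon.com/neuron'
```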
Binary file added website/docs/infra/img/architecture.jpg
7 changes: 7 additions & 0 deletions website/docs/infra/solutions/_category_.json
@@ -0,0 +1,7 @@
{
  "label": "Solutions",
  "position": 2,
  "link": {
    "type": "generated-index"
  }
}
@@ -6,7 +6,7 @@ sidebar_label: Inference-Ready Cluster

The Inference-Ready EKS Cluster is a pre-configured infrastructure solution designed specifically for AI/ML inference
workloads. This solution provides a Kubernetes cluster with all the necessary components to deploy and run inference
services using the AI on EKS [inference charts](../blueprints/inference/inference-charts.md) or your own deployments
services using the AI on EKS [inference charts](../../blueprints/inference/inference-charts.md) or your own deployments
and models.

An expanded [Readme](https://github.com/awslabs/ai-on-eks/tree/main/infra/solutions/inference-ready-cluster/README.md)
@@ -21,7 +21,7 @@ requirements, different tools are needed to properly deploy, run, and scale the
models that aren't just LLMs, such as text -> image diffusion models or more traditional Machine Learning models.

This infrastructure is meant to be the first layer of support. Alongside
the [Inference Charts](../blueprints/inference/inference-charts.md) and the Guidance you'll also find in this
the [Inference Charts](../../blueprints/inference/inference-charts.md) and the Guidance you'll also find in this
repository, AI on EKS aims to equip you with all the tools and knowledge you need to be able to run the inference you
want.

@@ -49,7 +49,7 @@ features:
- **Autoscaling**: Karpenter-based node autoscaling for cost optimization

The cluster is specifically designed to work seamlessly with the AI on
EKS [Inference Charts](../blueprints/inference/inference-charts.md), providing a complete end-to-end solution for
EKS [Inference Charts](../../blueprints/inference/inference-charts.md), providing a complete end-to-end solution for
deploying inference workloads.

## Resources
@@ -102,7 +102,7 @@ This infrastructure deploys the following AWS resources:

### Architecture Diagram

![architecture](../../../infra/solutions/inference-ready-cluster/image/architecture.jpg)
![architecture](../img/architecture.jpg)

### Prerequisites

@@ -177,7 +177,7 @@ dashboards are configured to automatically visualize the metrics and logs side b
### Step 7: Cluster Ready

Users can access EKS API and can deploy containerized AI/ML inference workloads via Kubernetes CLI using the AI on EKS
[inference charts](../blueprints/inference/inference-charts.md) or other repositories by interacting with AWS Network
[inference charts](../../blueprints/inference/inference-charts.md) or other repositories by interacting with AWS Network
Load Balancer (NLB) endpoint.
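A sketch of what that interaction can look like (the cluster name and region are placeholders; the NLB endpoint appears once a Service of type LoadBalancer has been deployed):

```bash
# Point kubectl at the new cluster
aws eks update-kubeconfig --name <cluster-name> --region <region>

# After deploying a workload, look up the NLB endpoint to send requests to
kubectl get svc -A | grep -i loadbalancer
```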

### Step 8: Verify Deployment
@@ -358,7 +358,7 @@ You should see the following output (expand the section to see the output)
## Inference on EKS

EKS is a powerful platform for running AI/ML inference. For a deep dive on many of the inference possibilities on EKS,
please check the [inference](../blueprints/inference/index.md) section.
please check the [inference](../../blueprints/inference/index.md) section.

### Inference Charts Integration

@@ -417,7 +417,7 @@ helm install neuron-inference . \
--values values-llama-31-8b-vllm-neuron.yaml
```
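After the install completes, a quick check that the release and its pods came up (a sketch; the label selector assumes the common Helm chart convention and may differ for these charts):

```bash
# Release status
helm status neuron-inference

# Pods created by the release
kubectl get pods -l app.kubernetes.io/instance=neuron-inference
```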

Please check the [inference charts](../blueprints/inference/inference-charts.md) section for a deeper look at what is
Please check the [inference charts](../../blueprints/inference/inference-charts.md) section for a deeper look at what is
available.

### Observability Integration
@@ -435,7 +435,7 @@ Access Grafana dashboard:
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
```
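If you need the Grafana admin credentials, a sketch assuming the default kube-prometheus-stack secret name and key:

```bash
kubectl get secret -n monitoring kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 --decode; echo
```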

Please see the [observability](../guidance/observability.md) for an in-depth look at using the observability
Please see the [observability](../../guidance/observability.md) for an in-depth look at using the observability
features.

### Cost Optimization
@@ -2,7 +2,7 @@
sidebar_label: JARK on EKS
sidebar_position: 1
---
import CollapsibleContent from '../../src/components/CollapsibleContent';
import CollapsibleContent from '../../../src/components/CollapsibleContent';

# JARK on EKS

@@ -32,14 +32,14 @@ JARK comes enabled with [AI/ML Observability](https://github.com/awslabs/ai-ml-o
The JARK stack is ideal for teams and organizations looking to simplify the complex process of deploying and managing AI models. Whether you're working on cutting-edge generative models or scaling existing AI workloads, JARK on Amazon EKS offers the flexibility, scalability, and control you need to succeed.


![alt text](img/jark.png)
![alt text](../img/jark.png)


### Ray on Kubernetes

[Ray](https://www.ray.io/) is an open-source framework for building scalable and distributed applications. It is designed to make it easy to write parallel and distributed Python applications by providing a simple and intuitive API for distributed computing. It has a growing community of users and contributors, and is actively maintained and developed by the Ray team at Anyscale, Inc.

![RayCluster](img/ray-cluster.svg)
![RayCluster](../img/ray-cluster.svg)

*Source: https://docs.ray.io/en/latest/cluster/key-concepts.html*

@@ -63,15 +63,15 @@ Overall, deploying Ray on Kubernetes can simplify the deployment and management

Before moving forward with the deployment please make sure you have read the pertinent sections of the official [documentation](https://docs.ray.io/en/latest/cluster/kubernetes/index.html).

![RayonK8s](img/ray_on_kubernetes.webp)
![RayonK8s](../img/ray_on_kubernetes.webp)

*Source: https://docs.ray.io/en/latest/cluster/kubernetes/index.html*

<CollapsibleContent header={<h2><span>Deploying the Solution</span></h2>}>

In this [example](https://github.com/awslabs/ai-on-eks/tree/main/infra/jark-stack/terraform), you will provision JARK Cluster on Amazon EKS.

![JARK](img/jark-stack.png)
![JARK](../img/jark-stack.png)


### Prerequisites
7 changes: 7 additions & 0 deletions website/docs/infra/stacks/_category_.json
@@ -0,0 +1,7 @@
{
  "label": "Stacks",
  "position": 2,
  "link": {
    "type": "generated-index"
  }
}
@@ -1,7 +1,7 @@
---
sidebar_label: AIBrix on EKS
---
import CollapsibleContent from '../../src/components/CollapsibleContent';
import CollapsibleContent from '../../../src/components/CollapsibleContent';

# AIBrix on EKS

@@ -1,7 +1,7 @@
---
sidebar_label: EMR NVIDIA Spark-RAPIDS
---
import CollapsibleContent from '../../src/components/CollapsibleContent';
import CollapsibleContent from '../../../src/components/CollapsibleContent';


# EMR on EKS NVIDIA RAPIDS Accelerator for Apache Spark
@@ -11,7 +11,7 @@ The NVIDIA RAPIDS Accelerator for Apache Spark is a powerful tool that builds on
With the invention of the RAPIDS Accelerator for Spark 3, NVIDIA has successfully revolutionized extract, transform, and load pipelines by significantly enhancing the efficiency of Spark SQL and DataFrame operations. By merging the capabilities of the RAPIDS cuDF library and the extensive reach of the Spark distributed computing ecosystem, the RAPIDS Accelerator for Apache Spark provides a robust solution to handle large-scale computations.
Moreover, the RAPIDS Accelerator library incorporates an advanced shuffle optimized by UCX, which can be configured to support GPU-to-GPU communication and RDMA capabilities, hence further boosting its performance.
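In practice the accelerator is enabled through Spark configuration rather than code changes; a minimal sketch of the relevant settings (the application name is a placeholder, and EMR on EKS job submissions pass these as Spark submit parameters):

```bash
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=1 \
  your-etl-job.py
```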

![Alt text](img/nvidia.png)
![Alt text](./../img/nvidia.png)

### EMR support for NVIDIA RAPIDS Accelerator for Apache Spark
Integration of Amazon EMR with NVIDIA RAPIDS Accelerator for Apache Spark​ Amazon EMR on EKS now extends its support to include the use of GPU instance types with the NVIDIA RAPIDS Accelerator for Apache Spark. As the use of artificial intelligence (AI) and machine learning (ML) continues to expand in the realm of data analytics, there's an increasing demand for rapid and cost-efficient data processing, which GPUs can provide. The NVIDIA RAPIDS Accelerator for Apache Spark enables users to harness the superior performance of GPUs, leading to substantial infrastructure cost savings.
@@ -218,7 +218,7 @@ chmod +x execute_spark_rapids_xgboost.sh

Verify the pod status

![Alt text](img/spark-rapids-pod-status.png)
![Alt text](../img/spark-rapids-pod-status.png)
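A command-line equivalent of the check above (a sketch; adjust the filter to your job's namespace or name):

```bash
kubectl get pods -A | grep -i spark
```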


:::info
@@ -277,7 +277,7 @@ The following is a sample output from the above log file:
**Step7**: Finally, with a trained and validated XGBoost model, you can use it to make predictions on new, unseen loan data. These predictions can help in identifying potential risks associated with loan default or evaluating loan performance.


![Alt text](img/emr-spark-rapids-fannie-mae.png)
![Alt text](../img/emr-spark-rapids-fannie-mae.png)

### GPU Monitoring with DCGM Exporter, Prometheus and Grafana

@@ -313,7 +313,7 @@ aws secretsmanager get-secret-value --secret-id emr-spark-rapids-grafana --regio

Once logged in, add the AMP datasource to Grafana and import the Open Source GPU monitoring dashboard. You can then explore the metrics and visualize them using the Grafana dashboard, as shown in the screenshot below.

![Alt text](img/gpu-dashboard.png)
![Alt text](../img/gpu-dashboard.png)

<CollapsibleContent header={<h2><span>Cleanup</span></h2>}>

@@ -1,7 +1,7 @@
---
sidebar_label: JupyterHub on EKS
---
import CollapsibleContent from '../../src/components/CollapsibleContent';
import CollapsibleContent from '../../../src/components/CollapsibleContent';

:::warning
Deployment of ML models on EKS requires access to GPUs or Neuron instances. If your deployment isn't working, it’s often due to missing access to these resources. Also, some deployment patterns rely on Karpenter autoscaling and static node groups; if nodes aren't initializing, check the logs for Karpenter or Node groups to resolve the issue.
@@ -64,18 +64,18 @@ When creating the certificate use a wildcard, so that it can secure a domain and
The service generates the private key and self-signed certificate.
Sample prompts to generate a certificate:

![](img/Cert_Install.png)
![certificate](../img/Cert_Install.png)
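If you prefer a non-interactive command over the prompts shown above, a sketch using openssl (the wildcard domain is a placeholder; the output file names match those referenced below):

```bash
# Requires OpenSSL 1.1.1+ for -addext
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout key.pem -out certificate.pem \
  -subj "/CN=*.example.com" \
  -addext "subjectAltName=DNS:*.example.com,DNS:example.com"
```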


6. Import the certificate into AWS Certificate Manager

Open the private key(`key.pem`) in a text editor and copy the contents into the private key section of ACM. Similarly, copy the contents of the `certificate.pem` file into the certificate body section and submit.

![](img/ACM.png)
![acm](../img/ACM.png)
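The same import can also be done from the CLI instead of pasting into the console (a sketch; the region is a placeholder):

```bash
aws acm import-certificate \
  --certificate fileb://certificate.pem \
  --private-key fileb://key.pem \
  --region <region>
```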

Verify certificate is installed correctly in the console in ACM.

![](img/Cert_List.png)
![cert_list](../img/Cert_List.png)

</CollapsibleContent>

@@ -177,13 +177,13 @@ kubectl port-forward svc/proxy-public 8080:80 -n jupyterhub
```

**Sign-in:** Navigate to [http://localhost:8080/](http://localhost:8080/) in your web browser. Input `user-1` as the username and choose any password.
![alt text](img/image.png)
![alt text](../img/image.png)

Select server options: Upon sign-in, you’ll be presented with a variety of Notebook instance profiles to choose from. The `Data Engineering (CPU)` server is for traditional, CPU based notebook work. The `Elyra` server provides [Elyra](https://github.com/elyra-ai/elyra) functionality, allowing you to quickly develop pipelines: ![workflow](img/elyra-workflow.png). `Trainium` and `Inferentia` servers will deploy the notebook server onto Trainium and Inferentia nodes, allowing accelerated workloads. `Time Slicing` and `MIG` are two different strategies for GPU sharing. Finally, the `Data Science (GPU)` server is a traditional server running on an NVIDIA GPU.
Select server options: Upon sign-in, you’ll be presented with a variety of Notebook instance profiles to choose from. The `Data Engineering (CPU)` server is for traditional, CPU based notebook work. The `Elyra` server provides [Elyra](https://github.com/elyra-ai/elyra) functionality, allowing you to quickly develop pipelines: ![workflow](../img/elyra-workflow.png). `Trainium` and `Inferentia` servers will deploy the notebook server onto Trainium and Inferentia nodes, allowing accelerated workloads. `Time Slicing` and `MIG` are two different strategies for GPU sharing. Finally, the `Data Science (GPU)` server is a traditional server running on an NVIDIA GPU.

For this time-slicing feature demonstration, we’ll be using the **Data Science (GPU + Time-Slicing – G5)** profile. Go ahead and select this option and choose the Start button.

![alt text](img/notebook-server-list.png)
![alt text](../img/notebook-server-list.png)

The new node created by Karpenter with the `g5.2xlarge` instance type has been configured to leverage the timeslicing feature provided by the [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin). This feature allows for efficient GPU utilization by dividing a single GPU into multiple allocatable units. In this case, we have defined `4` allocatable GPUs in the NVIDIA device plugin Helm chart config map. Below is the status of the node:

@@ -218,7 +218,7 @@ Open JupyterHub in an Incognito browser window: Navigate to http://localhost:808

Choose server options: After logging in, you’ll see the server options page. Ensure that you select the **Data Science (GPU + Time-Slicing – G5)** radio button and select Start.

![alt text](img/image-2.png)
![alt text](../img/image-2.png)

Verify pod placement: Notice that this pod placement takes only a few seconds, unlike `user-1`'s. This is because the Kubernetes scheduler can place the pod on the existing `g5.2xlarge` node created by the `user-1` pod. `user-2` also uses the same Docker image, so there is no delay pulling the image; it is served from the local cache.
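To see both user pods scheduled on the same node, a quick sketch (the namespace matches the port-forward command used earlier):

```bash
# Both jupyter-user pods should show the same g5.2xlarge node
kubectl get pods -n jupyterhub -o wide | grep jupyter-user
```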

@@ -241,29 +241,29 @@ Checkout the [AWS blog: Building multi-tenant JupyterHub Platforms on Amazon EKS

Add the `CNAME` DNS record in ChangeIP for the JupyterHub domain with the load balancer DNS name.

![](img/CNAME.png)
![cname](../img/CNAME.png)

:::info
When adding the load balancer DNS name in the value field of CNAME in ChangeIP make sure to add a dot(`.`) at the end of the load-balancer DNS name.
:::
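To get the load balancer DNS name to paste into the CNAME value, a sketch that assumes the JupyterHub proxy is exposed through a Service of type LoadBalancer (an Ingress-based setup would use `kubectl get ingress` instead):

```bash
kubectl get svc proxy-public -n jupyterhub \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```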

Now typing the domain url in the browser should redirect to the Jupyterhub login page.

![](img/Cognito-Sign-in.png)
![cognito-sign-in](../img/Cognito-Sign-in.png)


Follow the Cognito sign-up and sign-in process to login.

![](img/Cognito-Sign-up.png)
![cognito-sign-up](../img/Cognito-Sign-up.png)

Successful sign-in will open up the JupyterHub environment for the logged in user.

![](img/jupyter_launcher.png)
![jupyter_launcher](../img/jupyter_launcher.png)

To test the setup of the shared and personal directories in JupyterHub, you can follow these steps:
1. Open a terminal window from the launcher dashboard.

![](img/jupyter_env.png)
![jupyter_env](../img/jupyter_env.png)

2. execute the command

Expand All @@ -278,23 +278,23 @@ Note: This will look a little different depending on your OAuth provider.

Add the `CNAME` DNS record in ChangeIP for the JupyterHub domain with the load balancer DNS name.

![](img/CNAME.png)
![](../img/CNAME.png)

:::info
When adding the load balancer DNS name in the value field of CNAME in ChangeIP make sure to add a dot(`.`) at the end of the load-balancer DNS name.
:::

Now typing the domain url in the browser should redirect to the Jupyterhub login page.

![](img/oauth.png)
![oauth](../img/oauth.png)

Follow the Keycloak sign-up and sign-in process to login.

![](img/keycloak-login.png)
![oauth-login](../img/keycloak-login.png)

Successful sign-in will open up the JupyterHub environment for the logged in user.

![](img/jupyter_launcher.png)
![jupyter_launcher](../img/jupyter_launcher.png)


<CollapsibleContent header={<h3><span>Cleanup</span></h3>}>
@@ -1,7 +1,7 @@
---
sidebar_label: Trainium on EKS
---
import CollapsibleContent from '../../src/components/CollapsibleContent';
import CollapsibleContent from '../../../src/components/CollapsibleContent';

:::warning
Deployment of ML models on EKS requires access to GPUs or Neuron instances. If your deployment isn't working, it’s often due to missing access to these resources. Also, some deployment patterns rely on Karpenter autoscaling and static node groups; if nodes aren't initializing, check the logs for Karpenter or Node groups to resolve the issue.
@@ -18,7 +18,7 @@ In this blueprint, we will learn how to securely deploy an [Amazon EKS Cluster](
#### Trainium Device Architecture
Each Trainium device (chip) comprises two neuron cores. In the case of `Trn1.32xlarge` instances, `16 Trainium devices` are combined, resulting in a total of `32 Neuron cores`. The diagram below provides a visual representation of the Neuron device's architecture:

![Trainium Device](img/neuron-device.png)
![Trainium Device](../img/neuron-device.png)
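Once a Trainium node is up, the same device and core layout can be seen with the Neuron tooling (a sketch; run on the instance or in a pod that has the Neuron runtime installed):

```bash
# On trn1.32xlarge: 16 devices x 2 NeuronCores = 32 cores
neuron-ls
```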

#### AWS Neuron Drivers
Neuron Drivers are a set of essential software components installed on the host operating system of AWS Inferentia-based accelerators, such as Trainium/Inferentia instances. Their primary function is to optimize the interaction between the accelerator hardware and the underlying operating system, ensuring seamless communication and efficient utilization of the accelerator's computational capabilities.
@@ -46,7 +46,7 @@ TorchX can seamlessly integrate with Airflow and Kubeflow Pipelines. In this blu

### Solution Architecture

![Alt text](img/trainium-on-eks-arch.png)
![Alt text](../img/trainium-on-eks-arch.png)


<CollapsibleContent header={<h2><span>Deploying the Solution</span></h2>}>
@@ -199,7 +199,7 @@ chmod +x 2-bert-pretrain-precompile.sh

You can verify the pods status by running `kubectl get pods` or `kubectl get vcjob`. Successful output looks like below.

![Alt text](img/pre-compile-pod-status.png)
![Alt text](../img/pre-compile-pod-status.png)

You can also verify the logs for the Pod once they are `Succeeded`. The precompilation job will run for `~15 minutes`. Once complete, you will see the following in the output:

@@ -215,7 +215,7 @@ Compiler status PASS

New pre-training cache files are stored under FSx for Lustre.

![Alt text](img/cache.png)
![Alt text](../img/cache.png)


#### Step4: Launch BERT pretraining job using 64 Neuron cores with two trn1.32xlarge instances
@@ -268,7 +268,7 @@ instance-id: i-04b476a6a0e686980

You can also run `neuron-top` which provides the live usage of neuron cores. The below shows the usage of all 32 neuron cores.

![Alt text](img/neuron-top.png)
![Alt text](../img/neuron-top.png)
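If you prefer not to log in to the node, the same view is available from inside one of the training pods (a sketch; the pod name is a placeholder and assumes the training image ships the Neuron tools):

```bash
kubectl exec -it <bert-pretraining-pod> -- neuron-top
```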


If you wish to terminate the job, you can execute the following commands: