From 6a85c84f0443a389e0e153dbaa0dc4780481a6d5 Mon Sep 17 00:00:00 2001
From: Yuan Chen <yuanchen97@gmail.com>
Date: Wed, 11 Mar 2026 16:11:27 -0700
Subject: [PATCH] docs: rename cuj1.md to cuj1-eks.md and add cuj2-eks.md for
 inference
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Rename demos/cuj1.md → demos/cuj1-eks.md to distinguish from GKE variant
- Add demos/cuj2-eks.md: EKS inference CUJ with Dynamo platform, including
  recipe generation, bundle deployment, CNCF conformance evidence collection,
  vLLM workload deployment, and chat UI
---
 demos/{cuj1.md => cuj1-eks.md}                |   2 +-
 demos/cuj2-eks.md                             | 111 ++++++++++++++++++
 .../cli/cuj1-training/chainsaw-test.yaml      |   2 +-
 .../tests/cuj1-training/chainsaw-test.yaml    |   2 +-
 4 files changed, 114 insertions(+), 3 deletions(-)
 rename demos/{cuj1.md => cuj1-eks.md} (94%)
 create mode 100644 demos/cuj2-eks.md

diff --git a/demos/cuj1.md b/demos/cuj1-eks.md
similarity index 94%
rename from demos/cuj1.md
rename to demos/cuj1-eks.md
index d858bc80c..eda8f9481 100644
--- a/demos/cuj1.md
+++ b/demos/cuj1-eks.md
@@ -53,7 +53,7 @@ aicr bundle \
   --output bundle
 ```
 
-> Both options allow for comma delamination to supply multiple values. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.
+> Both options accept multiple values as a comma-separated list. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.
 
 ## Install Bundle into the Cluster
 
diff --git a/demos/cuj2-eks.md b/demos/cuj2-eks.md
new file mode 100644
index 000000000..e36b4bb3a
--- /dev/null
+++ b/demos/cuj2-eks.md
@@ -0,0 +1,111 @@
+# AICR - Critical User Journey (CUJ) 2 — EKS Inference
+
+## Assumptions
+
+* The user is already authenticated to an EKS cluster with 2+ H100 (p5.48xlarge) nodes.
+* The values passed to the `--accelerated-node-selector`, `--accelerated-node-toleration`, and `--system-node-toleration` flags are examples only; update them to match your cluster.
+
+## Snapshot
+
+```shell
+aicr snapshot \
+  --namespace aicr-validation \
+  --node-selector nodeGroup=gpu-worker \
+  --toleration dedicated=worker-workload:NoSchedule \
+  --toleration dedicated=worker-workload:NoExecute \
+  --output snapshot.yaml
+```
+
+## Generate Recipe
+
+```shell
+aicr recipe \
+  --service eks \
+  --accelerator h100 \
+  --intent inference \
+  --os ubuntu \
+  --platform dynamo \
+  --output recipe.yaml
+```
+
+## Validate Recipe Constraints
+
+```shell
+aicr validate \
+  --recipe recipe.yaml \
+  --snapshot snapshot.yaml \
+  --no-cluster \
+  --phase deployment \
+  --output dry-run.json
+```
+
+## Generate Bundle
+
+```shell
+aicr bundle \
+  --recipe recipe.yaml \
+  --accelerated-node-selector nodeGroup=gpu-worker \
+  --accelerated-node-toleration dedicated=worker-workload:NoSchedule \
+  --accelerated-node-toleration dedicated=worker-workload:NoExecute \
+  --system-node-selector nodeGroup=system-worker \
+  --system-node-toleration dedicated=system-workload:NoSchedule \
+  --system-node-toleration dedicated=system-workload:NoExecute \
+  --output bundle
+```
+
+> Both options accept multiple values as a comma-separated list. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.
+
+## Install Bundle into the Cluster
+
+```shell
+cd ./bundle && chmod +x deploy.sh && ./deploy.sh
+```
+
+## Validate Cluster
+
+```shell
+aicr validate \
+  --recipe recipe.yaml \
+  --toleration dedicated=worker-workload:NoSchedule \
+  --toleration dedicated=worker-workload:NoExecute \
+  --phase all \
+  --output report.json
+```
+
+## Deploy Inference Workload
+
+Deploy an inference serving graph using the Dynamo platform:
+
+```shell
+# Deploy the aggregated vLLM workload (includes KAI queue + DynamoGraphDeployment)
+kubectl apply -f demos/workloads/inference/vllm-agg.yaml
+
+# Monitor the deployment
+kubectl get dynamographdeployments -n dynamo-workload
+kubectl get pods -n dynamo-workload -o wide -w
+
+# Verify the inference gateway routes to the workload
+kubectl get gateway inference-gateway -n kgateway-system
+kubectl get inferencepool -n dynamo-workload
+```
+
+## Chat with the Model
+
+Once the workload is running, start a local chat server:
+
+```shell
+# Start the chat server (port-forwards to the inference gateway)
+bash demos/workloads/inference/chat-server.sh
+
+# Open the chat UI in your browser (`open` is macOS-only; use `xdg-open` on Linux)
+open demos/workloads/inference/chat.html
+```
+
+## Success
+
+* Bundle deploys with 16 components (inference recipe)
+* CNCF conformance: 9/9 requirements pass
+  * DRA Support, Gang Scheduling, Secure GPU Access, Accelerator Metrics,
+    AI Service Metrics, Inference Gateway, Robust Controller (Dynamo),
+    Pod Autoscaling (HPA), Cluster Autoscaling
+* Dynamo inference workload serves requests via the inference gateway
diff --git a/tests/chainsaw/cli/cuj1-training/chainsaw-test.yaml b/tests/chainsaw/cli/cuj1-training/chainsaw-test.yaml
index 31255af1c..b36be4d80 100644
--- a/tests/chainsaw/cli/cuj1-training/chainsaw-test.yaml
+++ b/tests/chainsaw/cli/cuj1-training/chainsaw-test.yaml
@@ -20,7 +20,7 @@ metadata:
 spec:
   description: |
     CUJ1: Critical User Journey - Training Workload.
-    Tests the complete aicr workflow from demos/cuj1.md:
+    Tests the complete aicr workflow from demos/cuj1-eks.md:
       Step 1: recipe (with --platform kubeflow)
       Step 2: validate (deployment phase)
       Step 3: bundle (with node scheduling)
diff --git a/tests/uat/aws/tests/cuj1-training/chainsaw-test.yaml b/tests/uat/aws/tests/cuj1-training/chainsaw-test.yaml
index 9f45e2fe7..4642bdb11 100644
--- a/tests/uat/aws/tests/cuj1-training/chainsaw-test.yaml
+++ b/tests/uat/aws/tests/cuj1-training/chainsaw-test.yaml
@@ -20,7 +20,7 @@ metadata:
 spec:
   description: |
     UAT CUJ1: Training workload on live EKS cluster with GPU nodes.
-    Tests the aicr workflow from demos/cuj1.md against a real cluster:
+    Tests the aicr workflow from demos/cuj1-eks.md against a real cluster:
       Step 1: Snapshot the live cluster
       Step 2: Generate recipe (EKS/H100/training/kubeflow)
       Step 3: Validate deployment against live snapshot