From 6a85c84f0443a389e0e153dbaa0dc4780481a6d5 Mon Sep 17 00:00:00 2001
From: Yuan Chen
Date: Wed, 11 Mar 2026 16:11:27 -0700
Subject: [PATCH] docs: rename cuj1.md to cuj1-eks.md and add cuj2-eks.md for inference
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Rename demos/cuj1.md → demos/cuj1-eks.md to distinguish from GKE variant
- Add demos/cuj2-eks.md: EKS inference CUJ with Dynamo platform, including recipe generation, bundle deployment, CNCF conformance evidence collection, vLLM workload deployment, and chat UI
---
 demos/{cuj1.md => cuj1-eks.md}                |   2 +-
 demos/cuj2-eks.md                             | 111 ++++++++++++++++++
 .../cli/cuj1-training/chainsaw-test.yaml      |   2 +-
 .../tests/cuj1-training/chainsaw-test.yaml    |   2 +-
 4 files changed, 114 insertions(+), 3 deletions(-)
 rename demos/{cuj1.md => cuj1-eks.md} (94%)
 create mode 100644 demos/cuj2-eks.md

diff --git a/demos/cuj1.md b/demos/cuj1-eks.md
similarity index 94%
rename from demos/cuj1.md
rename to demos/cuj1-eks.md
index d858bc80c..eda8f9481 100644
--- a/demos/cuj1.md
+++ b/demos/cuj1-eks.md
@@ -53,7 +53,7 @@ aicr bundle \
   --output bundle
 ```
 
-> Both options allow for comma delamination to supply multiple values. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.
+> Both options allow for comma-separated values to supply multiple values. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.
 
 ## Install Bundle into the Cluster
 
diff --git a/demos/cuj2-eks.md b/demos/cuj2-eks.md
new file mode 100644
index 000000000..e36b4bb3a
--- /dev/null
+++ b/demos/cuj2-eks.md
@@ -0,0 +1,111 @@
+# AICR - Critical User Journey (CUJ) 2 — EKS Inference
+
+## Assumptions
+
+* Assuming user is already authenticated to an EKS cluster with 2+ H100 (p5.48xlarge) nodes.
+* Values used in `--accelerated-node-selector`, `--accelerated-node-toleration`, `--system-node-toleration` flags are only for example purposes. Assuming user will update these to match their cluster.
+
+## Snapshot
+
+```shell
+aicr snapshot \
+  --namespace aicr-validation \
+  --node-selector nodeGroup=gpu-worker \
+  --toleration dedicated=worker-workload:NoSchedule \
+  --toleration dedicated=worker-workload:NoExecute \
+  --output snapshot.yaml
+```
+
+## Gen Recipe
+
+```shell
+aicr recipe \
+  --service eks \
+  --accelerator h100 \
+  --intent inference \
+  --os ubuntu \
+  --platform dynamo \
+  --output recipe.yaml
+```
+
+## Validate Recipe Constraints
+
+```shell
+aicr validate \
+  --recipe recipe.yaml \
+  --snapshot snapshot.yaml \
+  --no-cluster \
+  --phase deployment \
+  --output dry-run.json
+```
+
+## Generate Bundle
+
+```shell
+aicr bundle \
+  --recipe recipe.yaml \
+  --accelerated-node-selector nodeGroup=gpu-worker \
+  --accelerated-node-toleration dedicated=worker-workload:NoSchedule \
+  --accelerated-node-toleration dedicated=worker-workload:NoExecute \
+  --system-node-selector nodeGroup=system-worker \
+  --system-node-toleration dedicated=system-workload:NoSchedule \
+  --system-node-toleration dedicated=system-workload:NoExecute \
+  --output bundle
+```
+
+> Both options allow for comma-separated values to supply multiple values. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.
+
+## Install Bundle into the Cluster
+
+```shell
+cd ./bundle && chmod +x deploy.sh && ./deploy.sh
+```
+
+## Validate Cluster
+
+```shell
+aicr validate \
+  --recipe recipe.yaml \
+  --toleration dedicated=worker-workload:NoSchedule \
+  --toleration dedicated=worker-workload:NoExecute \
+  --phase all \
+  --output report.json
+```
+
+## Deploy Inference Workload
+
+Deploy an inference serving graph using the Dynamo platform:
+
+```shell
+# Deploy the vLLM aggregation workload (includes KAI queue + DynamoGraphDeployment)
+kubectl apply -f demos/workloads/inference/vllm-agg.yaml
+
+# Monitor the deployment
+kubectl get dynamographdeployments -n dynamo-workload
+kubectl get pods -n dynamo-workload -o wide -w
+
+# Verify the inference gateway routes to the workload
+kubectl get gateway inference-gateway -n kgateway-system
+kubectl get inferencepool -n dynamo-workload
+```
+
+## Chat with the Model
+
+Once the workload is running, start a local chat server:
+
+```shell
+# Start the chat server (port-forwards to the inference gateway)
+bash demos/workloads/inference/chat-server.sh
+
+# Open the chat UI in your browser
+open demos/workloads/inference/chat.html
+```
+
+## Success
+
+* Bundle deployed with 16 components (inference recipe)
+* CNCF conformance: 9/9 requirements pass
+  * DRA Support, Gang Scheduling, Secure GPU Access, Accelerator Metrics,
+    AI Service Metrics, Inference Gateway, Robust Controller (Dynamo),
+    Pod Autoscaling (HPA), Cluster Autoscaling
+* Dynamo inference workload serving requests via inference gateway
diff --git a/tests/chainsaw/cli/cuj1-training/chainsaw-test.yaml b/tests/chainsaw/cli/cuj1-training/chainsaw-test.yaml
index 31255af1c..b36be4d80 100644
--- a/tests/chainsaw/cli/cuj1-training/chainsaw-test.yaml
+++ b/tests/chainsaw/cli/cuj1-training/chainsaw-test.yaml
@@ -20,7 +20,7 @@ metadata:
 spec:
   description: |
     CUJ1: Critical User Journey - Training Workload.
-    Tests the complete aicr workflow from demos/cuj1.md:
+    Tests the complete aicr workflow from demos/cuj1-eks.md:
     Step 1: recipe (with --platform kubeflow)
     Step 2: validate (deployment phase)
     Step 3: bundle (with node scheduling)
diff --git a/tests/uat/aws/tests/cuj1-training/chainsaw-test.yaml b/tests/uat/aws/tests/cuj1-training/chainsaw-test.yaml
index 9f45e2fe7..4642bdb11 100644
--- a/tests/uat/aws/tests/cuj1-training/chainsaw-test.yaml
+++ b/tests/uat/aws/tests/cuj1-training/chainsaw-test.yaml
@@ -20,7 +20,7 @@ metadata:
 spec:
   description: |
     UAT CUJ1: Training workload on live EKS cluster with GPU nodes.
-    Tests the aicr workflow from demos/cuj1.md against a real cluster:
+    Tests the aicr workflow from demos/cuj1-eks.md against a real cluster:
     Step 1: Snapshot the live cluster
     Step 2: Generate recipe (EKS/H100/training/kubeflow)
     Step 3: Validate deployment against live snapshot