
Commit afd515f

docs: rename cuj1.md to cuj1-eks.md and add cuj2-eks.md for inference (#358)
1 parent ca0551d commit afd515f

File tree: 4 files changed (+114, -3 lines)


demos/cuj1.md renamed to demos/cuj1-eks.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -53,7 +53,7 @@ aicr bundle \
   --output bundle
 ```
 
-> Both options allow for comma delamination to supply multiple values. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.
+> Both options allow for comma-separated values to supply multiple values. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.
 
 
 ## Install Bundle into the Cluster
````

demos/cuj2-eks.md

Lines changed: 111 additions & 0 deletions
# AICR - Critical User Journey (CUJ) 2 — EKS Inference

## Assumptions

* The user is already authenticated to an EKS cluster with 2+ H100 (p5.48xlarge) nodes.
* The values used in the `--accelerated-node-selector`, `--accelerated-node-toleration`, and `--system-node-toleration` flags are examples only; update them to match your cluster.
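Before taking a snapshot, it can help to confirm that the example scheduling values actually exist on your cluster. A minimal sketch, reusing the example label and taint from this walkthrough (both are placeholders to adjust):

```shell
# Example values from this walkthrough; replace with your cluster's own
NODE_SELECTOR="nodeGroup=gpu-worker"
TOLERATION="dedicated=worker-workload:NoSchedule"

# List nodes matching the selector and show their taint keys
kubectl get nodes -l "${NODE_SELECTOR}" \
  -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key' 2>/dev/null

# Split the toleration into its key=value pair and effect, the shape
# the --toleration flags below use
TOL_PAIR="${TOLERATION%%:*}"    # dedicated=worker-workload
TOL_EFFECT="${TOLERATION##*:}"  # NoSchedule
echo "toleration: ${TOL_PAIR} with effect ${TOL_EFFECT}"
```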

## Snapshot

```shell
aicr snapshot \
  --namespace aicr-validation \
  --node-selector nodeGroup=gpu-worker \
  --toleration dedicated=worker-workload:NoSchedule \
  --toleration dedicated=worker-workload:NoExecute \
  --output snapshot.yaml
```
## Gen Recipe

```shell
aicr recipe \
  --service eks \
  --accelerator h100 \
  --intent inference \
  --os ubuntu \
  --platform dynamo \
  --output recipe.yaml
```

## Validate Recipe Constraints

```shell
aicr validate \
  --recipe recipe.yaml \
  --snapshot snapshot.yaml \
  --no-cluster \
  --phase deployment \
  --output dry-run.json
```
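The dry-run report lands in `dry-run.json`. Its field layout is not shown here, so the sketch below only pretty-prints the JSON for a quick review rather than parsing specific keys:

```shell
# Pretty-print the dry-run validation report, if the step above produced one
[ -f dry-run.json ] && python3 -m json.tool dry-run.json
```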

## Generate Bundle

```shell
aicr bundle \
  --recipe recipe.yaml \
  --accelerated-node-selector nodeGroup=gpu-worker \
  --accelerated-node-toleration dedicated=worker-workload:NoSchedule \
  --accelerated-node-toleration dedicated=worker-workload:NoExecute \
  --system-node-selector nodeGroup=system-worker \
  --system-node-toleration dedicated=system-workload:NoSchedule \
  --system-node-toleration dedicated=system-workload:NoExecute \
  --output bundle
```

> Both options accept comma-separated values, so multiple entries can be supplied at once. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.
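For illustration, here is the same invocation with the repeated toleration flags collapsed into comma-separated values (a sketch using the same example values as above, not an additional required step):

```shell
# Comma-separated form of the repeated toleration flags (example values)
ACCEL_TOLS="dedicated=worker-workload:NoSchedule,dedicated=worker-workload:NoExecute"
SYSTEM_TOLS="dedicated=system-workload:NoSchedule,dedicated=system-workload:NoExecute"

aicr bundle \
  --recipe recipe.yaml \
  --accelerated-node-selector nodeGroup=gpu-worker \
  --accelerated-node-toleration "${ACCEL_TOLS}" \
  --system-node-selector nodeGroup=system-worker \
  --system-node-toleration "${SYSTEM_TOLS}" \
  --output bundle
```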

## Install Bundle into the Cluster

```shell
cd ./bundle && chmod +x deploy.sh && ./deploy.sh
```

## Validate Cluster

```shell
aicr validate \
  --recipe recipe.yaml \
  --toleration dedicated=worker-workload:NoSchedule \
  --toleration dedicated=worker-workload:NoExecute \
  --phase all \
  --output report.json
```

## Deploy Inference Workload

Deploy an inference serving graph using the Dynamo platform:

```shell
# Deploy the vLLM aggregation workload (includes KAI queue + DynamoGraphDeployment)
kubectl apply -f demos/workloads/inference/vllm-agg.yaml

# Monitor the deployment
kubectl get dynamographdeployments -n dynamo-workload
kubectl get pods -n dynamo-workload -o wide -w

# Verify the inference gateway routes to the workload
kubectl get gateway inference-gateway -n kgateway-system
kubectl get inferencepool -n dynamo-workload
```
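For scripted runs, instead of watching pods interactively, you can block until the workload reports Ready (namespace from the manifest above; the timeout is an example value):

```shell
# Wait for all Dynamo workload pods to become Ready before validating
NAMESPACE="dynamo-workload"
kubectl wait --for=condition=Ready pods --all -n "${NAMESPACE}" --timeout=300s
```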

## Chat with the Model

Once the workload is running, start a local chat server:

```shell
# Start the chat server (port-forwards to the inference gateway)
bash demos/workloads/inference/chat-server.sh

# Open the chat UI in your browser
open demos/workloads/inference/chat.html
```
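If you prefer the command line to the chat UI, the port-forwarded gateway can be queried directly. This is a sketch that assumes `chat-server.sh` forwards the gateway to `localhost:8000` and that the workload serves an OpenAI-compatible endpoint; the port, path, and model name are all placeholders:

```shell
# Build an OpenAI-style chat request (model name is a placeholder)
PAYLOAD='{"model": "example-model", "messages": [{"role": "user", "content": "Hello!"}]}'

# Sanity-check the payload before sending it
echo "${PAYLOAD}" | python3 -m json.tool > /dev/null && echo "payload ok"

# Send the request through the port-forwarded gateway (adjust host/port/path)
curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d "${PAYLOAD}"
```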

## Success

* Bundle deployed with 16 components (inference recipe)
* CNCF conformance: 9/9 requirements pass
  * DRA Support, Gang Scheduling, Secure GPU Access, Accelerator Metrics, AI Service Metrics, Inference Gateway, Robust Controller (Dynamo), Pod Autoscaling (HPA), Cluster Autoscaling
* Dynamo inference workload serving requests via inference gateway

tests/chainsaw/cli/cuj1-training/chainsaw-test.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -20,7 +20,7 @@ metadata:
 spec:
   description: |
     CUJ1: Critical User Journey - Training Workload.
-    Tests the complete aicr workflow from demos/cuj1.md:
+    Tests the complete aicr workflow from demos/cuj1-eks.md:
     Step 1: recipe (with --platform kubeflow)
     Step 2: validate (deployment phase)
     Step 3: bundle (with node scheduling)
```

tests/uat/aws/tests/cuj1-training/chainsaw-test.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -20,7 +20,7 @@ metadata:
 spec:
   description: |
     UAT CUJ1: Training workload on live EKS cluster with GPU nodes.
-    Tests the aicr workflow from demos/cuj1.md against a real cluster:
+    Tests the aicr workflow from demos/cuj1-eks.md against a real cluster:
     Step 1: Snapshot the live cluster
     Step 2: Generate recipe (EKS/H100/training/kubeflow)
     Step 3: Validate deployment against live snapshot
```
