2 changes: 1 addition & 1 deletion demos/cuj1.md → demos/cuj1-eks.md
@@ -53,7 +53,7 @@ aicr bundle \
--output bundle
```

-> Both options allow for comma delamination to supply multiple values. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.
+> Both options allow for comma-separated values to supply multiple values. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.

## Install Bundle into the Cluster

111 changes: 111 additions & 0 deletions demos/cuj2-eks.md
@@ -0,0 +1,111 @@
# AICR - Critical User Journey (CUJ) 2 — EKS Inference

## Assumptions

* Assuming user is already authenticated to an EKS cluster with 2+ H100 (p5.48xlarge) nodes.
* Values used in `--accelerated-node-selector`, `--accelerated-node-toleration`, `--system-node-toleration` flags are only for example purposes. Assuming user will update these to match their cluster.
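
The node-count assumption above can be checked before taking a snapshot. The following is a minimal sketch and not part of AICR: `has_min_nodes` is a hypothetical helper, and `nodeGroup=gpu-worker` is only the example label used in this document, so substitute whatever label your GPU nodes carry.

```shell
# Hypothetical helper (not part of aicr): reads one node per line on stdin
# and succeeds only if at least MIN non-empty lines were seen.
has_min_nodes() {
  min_nodes=$1
  # grep -c exits non-zero when the count is 0, so tolerate that exit status.
  node_count=$(grep -c . || true)
  [ "$node_count" -ge "$min_nodes" ]
}

# Example usage (needs cluster access, so it is left commented out):
# kubectl get nodes -l nodeGroup=gpu-worker --no-headers | has_min_nodes 2 \
#   || echo "fewer than 2 GPU nodes match the selector" >&2
```

The helper only counts matching nodes; it does not confirm the instance type or GPU model.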

## Snapshot

```shell
aicr snapshot \
--namespace aicr-validation \
--node-selector nodeGroup=gpu-worker \
--toleration dedicated=worker-workload:NoSchedule \
--toleration dedicated=worker-workload:NoExecute \
--output snapshot.yaml
```

## Gen Recipe

```shell
aicr recipe \
--service eks \
--accelerator h100 \
--intent inference \
--os ubuntu \
--platform dynamo \
--output recipe.yaml
```

## Validate Recipe Constraints

```shell
aicr validate \
--recipe recipe.yaml \
--snapshot snapshot.yaml \
--no-cluster \
--phase deployment \
--output dry-run.json
```

## Generate Bundle

```shell
aicr bundle \
--recipe recipe.yaml \
--accelerated-node-selector nodeGroup=gpu-worker \
--accelerated-node-toleration dedicated=worker-workload:NoSchedule \
--accelerated-node-toleration dedicated=worker-workload:NoExecute \
--system-node-selector nodeGroup=system-worker \
--system-node-toleration dedicated=system-workload:NoSchedule \
--system-node-toleration dedicated=system-workload:NoExecute \
--output bundle
```

> Both options allow for comma-separated values to supply multiple values. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.
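
As an illustration of the comma-separated form mentioned in the note above, the sketch below joins the two example toleration values into a single argument and prints the resulting command rather than running it. `join_with_commas` is a hypothetical helper introduced here, and the exact syntax accepted by each flag should be confirmed against the linked CLI reference.

```shell
# Hypothetical helper: joins its arguments with commas.
# The function body runs in a subshell so the IFS change does not leak out.
join_with_commas() (
  IFS=,
  printf '%s\n' "$*"
)

# Example toleration values from the bundle command above; adjust for your cluster.
joined_tolerations=$(join_with_commas \
  "dedicated=worker-workload:NoSchedule" \
  "dedicated=worker-workload:NoExecute")

# Print the command for review instead of executing it.
echo "aicr bundle --recipe recipe.yaml --accelerated-node-toleration ${joined_tolerations} --output bundle"
```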

## Install Bundle into the Cluster

```shell
cd ./bundle && chmod +x deploy.sh && ./deploy.sh
```

## Validate Cluster

```shell
aicr validate \
--recipe recipe.yaml \
--toleration dedicated=worker-workload:NoSchedule \
--toleration dedicated=worker-workload:NoExecute \
--phase all \
--output report.json
```

## Deploy Inference Workload

Deploy an inference serving graph using the Dynamo platform:

```shell
# Deploy the vLLM aggregation workload (includes KAI queue + DynamoGraphDeployment)
kubectl apply -f demos/workloads/inference/vllm-agg.yaml

# Monitor the deployment
kubectl get dynamographdeployments -n dynamo-workload
kubectl get pods -n dynamo-workload -o wide -w

# Verify the inference gateway routes to the workload
kubectl get gateway inference-gateway -n kgateway-system
kubectl get inferencepool -n dynamo-workload
```
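
For scripted runs, where watching with `-w` is not practical, one option is to poll until a check command succeeds. The sketch below assumes a hypothetical `retry` helper; it is not provided by AICR or kubectl, and the attempt count and delay are arbitrary example values.

```shell
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it exits 0, giving up after MAX_ATTEMPTS tries.
retry() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
}

# Example usage (needs cluster access, so it is left commented out):
# retry 30 10 kubectl get gateway inference-gateway -n kgateway-system
```

Getting a named resource fails while the resource does not exist, so this usage only confirms that the gateway object has been created, not that it is ready to serve traffic.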

## Chat with the Model

Once the workload is running, start a local chat server:

```shell
# Start the chat server (port-forwards to the inference gateway)
bash demos/workloads/inference/chat-server.sh

# Open the chat UI in your browser
open demos/workloads/inference/chat.html
```

## Success

* Bundle deployed with 16 components (inference recipe)
* CNCF conformance: 9/9 requirements pass
  * DRA Support, Gang Scheduling, Secure GPU Access, Accelerator Metrics,
    AI Service Metrics, Inference Gateway, Robust Controller (Dynamo),
    Pod Autoscaling (HPA), Cluster Autoscaling
* Dynamo inference workload serving requests via inference gateway
2 changes: 1 addition & 1 deletion tests/chainsaw/cli/cuj1-training/chainsaw-test.yaml
@@ -20,7 +20,7 @@ metadata:
spec:
description: |
CUJ1: Critical User Journey - Training Workload.
-Tests the complete aicr workflow from demos/cuj1.md:
+Tests the complete aicr workflow from demos/cuj1-eks.md:
Step 1: recipe (with --platform kubeflow)
Step 2: validate (deployment phase)
Step 3: bundle (with node scheduling)
2 changes: 1 addition & 1 deletion tests/uat/aws/tests/cuj1-training/chainsaw-test.yaml
@@ -20,7 +20,7 @@ metadata:
spec:
description: |
UAT CUJ1: Training workload on live EKS cluster with GPU nodes.
-Tests the aicr workflow from demos/cuj1.md against a real cluster:
+Tests the aicr workflow from demos/cuj1-eks.md against a real cluster:
Step 1: Snapshot the live cluster
Step 2: Generate recipe (EKS/H100/training/kubeflow)
Step 3: Validate deployment against live snapshot