# AICR Critical User Journey (CUJ) 2: EKS Inference

## Assumptions

* The user is already authenticated to an EKS cluster with 2+ H100 (p5.48xlarge) nodes.
* The values used in the `--accelerated-node-selector`, `--accelerated-node-toleration`, and `--system-node-toleration` flags are for example purposes only; update them to match your cluster.

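Before starting, you can confirm that the example label and taint values actually exist on your nodes. The `nodeGroup=gpu-worker` label and `dedicated=worker-workload` taint below come from the examples in this document; `<gpu-node-name>` is a placeholder for one of your node names:

```shell
# List nodes carrying the GPU node-group label used throughout the examples
kubectl get nodes -l nodeGroup=gpu-worker

# Inspect taints on one of those nodes to confirm the toleration values
kubectl describe node <gpu-node-name> | grep -A3 Taints
```
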
## Snapshot

```shell
aicr snapshot \
  --namespace aicr-validation \
  --node-selector nodeGroup=gpu-worker \
  --toleration dedicated=worker-workload:NoSchedule \
  --toleration dedicated=worker-workload:NoExecute \
  --output snapshot.yaml
```

## Generate Recipe

```shell
aicr recipe \
  --service eks \
  --accelerator h100 \
  --intent inference \
  --os ubuntu \
  --platform dynamo \
  --output recipe.yaml
```

## Validate Recipe Constraints

```shell
aicr validate \
  --recipe recipe.yaml \
  --snapshot snapshot.yaml \
  --no-cluster \
  --phase deployment \
  --output dry-run.json
```

## Generate Bundle

```shell
aicr bundle \
  --recipe recipe.yaml \
  --accelerated-node-selector nodeGroup=gpu-worker \
  --accelerated-node-toleration dedicated=worker-workload:NoSchedule \
  --accelerated-node-toleration dedicated=worker-workload:NoExecute \
  --system-node-selector nodeGroup=system-worker \
  --system-node-toleration dedicated=system-workload:NoSchedule \
  --system-node-toleration dedicated=system-workload:NoExecute \
  --output bundle
```

> Both options accept comma-separated values, so multiple values can be supplied in a single flag. See the [bundle](../docs/user/cli-reference.md#aicr-bundle) section for more information.

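For example, each pair of toleration flags above can be collapsed into a single comma-separated flag. This is a sketch with the same values as before, assuming the comma-splitting behavior described in the CLI reference:

```shell
aicr bundle \
  --recipe recipe.yaml \
  --accelerated-node-selector nodeGroup=gpu-worker \
  --accelerated-node-toleration dedicated=worker-workload:NoSchedule,dedicated=worker-workload:NoExecute \
  --system-node-selector nodeGroup=system-worker \
  --system-node-toleration dedicated=system-workload:NoSchedule,dedicated=system-workload:NoExecute \
  --output bundle
```
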
## Install Bundle into the Cluster

```shell
cd ./bundle && chmod +x deploy.sh && ./deploy.sh
```

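Once `deploy.sh` completes, one way to sanity-check the rollout is to watch pods converge (this makes no assumption about which namespaces the bundle creates):

```shell
# Watch pods across all namespaces until the bundle's components are Running
kubectl get pods -A -o wide -w
```
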
## Validate Cluster

```shell
aicr validate \
  --recipe recipe.yaml \
  --toleration dedicated=worker-workload:NoSchedule \
  --toleration dedicated=worker-workload:NoExecute \
  --phase all \
  --output report.json
```
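
The report's schema isn't documented here, so as a first pass you can simply pretty-print it and scan the results:

```shell
# Pretty-print the validation report (assumes only that it is valid JSON)
jq . report.json
```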

## Deploy Inference Workload

Deploy an inference serving graph using the Dynamo platform:

```shell
# Deploy the vLLM aggregation workload (includes KAI queue + DynamoGraphDeployment)
kubectl apply -f demos/workloads/inference/vllm-agg.yaml

# Monitor the deployment
kubectl get dynamographdeployments -n dynamo-workload
kubectl get pods -n dynamo-workload -o wide -w

# Verify the inference gateway routes to the workload
kubectl get gateway inference-gateway -n kgateway-system
kubectl get inferencepool -n dynamo-workload
```

## Chat with the Model

Once the workload is running, start a local chat server:

```shell
# Start the chat server (port-forwards to the inference gateway)
bash demos/workloads/inference/chat-server.sh

# Open the chat UI in your browser
open demos/workloads/inference/chat.html
```
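
You can also exercise the endpoint directly from the command line. The port, path, and model name below are assumptions (an OpenAI-compatible endpoint on the port that `chat-server.sh` forwards); adjust them to your setup:

```shell
# Hypothetical direct request to the forwarded inference gateway;
# port 8000, the /v1 path, and <model-name> are assumptions.
curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "<model-name>", "messages": [{"role": "user", "content": "Hello"}]}'
```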

## Success

* Bundle deployed with 16 components (inference recipe)
* CNCF conformance: 9/9 requirements pass
  * DRA Support, Gang Scheduling, Secure GPU Access, Accelerator Metrics,
    AI Service Metrics, Inference Gateway, Robust Controller (Dynamo),
    Pod Autoscaling (HPA), Cluster Autoscaling
* Dynamo inference workload serving requests via inference gateway