Deployment of Red Hat OpenShift AI (RHOAI) and its required infrastructure stack using Helm and ArgoCD.
| Sync Wave | Description | Resources |
|---|---|---|
| 0 | Namespaces | Namespaces for all operators |
| 5 | RHOAI Dependencies & Utilities | job-set-operator, cma-operator, cert-manager, leader-worker-set, Kueue, SR-IOV, OpenTelemetry, Tempo, ClusterObservability, kmm |
| 7 | Configs | cluster-job-set, cma-controller |
| Checkpoint | | |
| 10 | GPU Dependencies & Hardware Operators | nfd-operator |
| 15 | Configs | nfd-instance |
| 20 | NVIDIA GPU Operator | gpu-operator |
| 25 | GPU Cluster Policy | gpu-clusterpolicy |
| Checkpoint | | |
| 30 | RHOAI Operator Group + Subscription | rhoai-operator |
| 32 | RHOAI Deployment | operator-deployment |
| 33 | DataScienceCluster configuration | datasciencecluster |
| 35 | RHOAI dashboard configuration | odhdashboardconfig |
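The ordering in the table above is driven by the standard Argo CD sync-wave annotation on each manifest. A minimal sketch of how a wave is declared (the channel, source, and namespace values below are illustrative, not copied from this repo):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator
  annotations:
    # Lower waves sync first; "20" matches the NVIDIA GPU Operator row above.
    argocd.argoproj.io/sync-wave: "20"
spec:
  channel: stable                        # illustrative values
  name: gpu-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
```

The numeric prefixes on the template filenames under `helm/` (e.g. `20-gpu-operator.yaml`) mirror these wave numbers.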
To initialize RHOAI, install OpenShift GitOps, configure permissions, and trigger the App-of-Apps deployment.
```shell
git clone https://github.com/redhat-ai-americas/rhoai-argo.git
cd rhoai-argo
```

The following checks whether the OpenShift GitOps Subscription exists. If not, it applies the Subscription, then configures permissions for the Service Account once the operator is fully ready.
```shell
# 1. Install Operator (only if missing) & Permissions
oc get subscription openshift-gitops-operator -n openshift-operators &>/dev/null || oc apply -f gitops-config/openshift-gitops-subscription.yaml
oc apply -f gitops-config/gitops-permission.yaml

echo "⏳ Finalizing OpenShift GitOps environment..."

# 2. Silently wait until the Deployment is Available
until oc wait deployment/openshift-gitops-server -n openshift-gitops --for=condition=Available --timeout=10s &>/dev/null; do sleep 5; done

# 3. Apply custom health checks and enable the sidebar GitOps tab
oc apply --server-side --force-conflicts -f gitops-config/argocd-instance.yaml
```

Apply the manifest for our App-of-Apps pattern, controlled via sync waves.
- Operators will require manual approval for any version upgrades in the OpenShift Console.
```shell
oc apply -f app-of-apps.yaml
```

Note
You must approve the InstallPlan requests as they attempt to install; this prevents automatic updates from disrupting your AI workloads. To use automatic updates instead, first push this repository to an empty remote and update the app templates to point to that Git URL. Then change `Manual` to `Automatic` at the end of the first line in `argocd-applications/values.yaml`.
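If you do opt into automatic updates, the values-file edit can be scripted. The helper below is a sketch: the `Manual`/`Automatic` substitution assumes the approval policy is the only place those words appear on that line, so inspect your copy of `argocd-applications/values.yaml` before running it.

```shell
# enable_auto_approval FILE — hypothetical helper that switches the
# InstallPlan approval policy from Manual to Automatic in the given
# Helm values file. Verify the field name in your values.yaml first.
enable_auto_approval() {
  sed -i 's/Manual/Automatic/g' "$1"
}
```

Invoke it from the repo root, e.g. `enable_auto_approval argocd-applications/values.yaml`, then commit and push the change so Argo CD picks it up.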
- In the OpenShift Console, navigate to Home > Search.
- In the search bar, where it says "Resources", type "InstallPlan" and select the resource type.
- Click the InstallPlan name in the first column (Installxxxxx) > Preview InstallPlan > Approve. (Or use the tip below.)
- To get back to the list, click InstallPlans in the breadcrumb at the top left. Make sure All Projects is selected in the namespace dropdown at the top of the screen so that all InstallPlans are shown.
Tip
Bulk Approval: To approve all currently waiting InstallPlans at once, run:

```shell
oc get installplan -A --no-headers | grep "false" | awk '{print $1, $2}' | xargs -L1 sh -c 'oc patch installplan $1 -n $0 --type merge -p "{\"spec\":{\"approved\":true}}"'
```

Rolling Approval: To approve all pending and future InstallPlans, run:
```shell
oc get installplan -A -w -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase --no-headers | while read -r namespace name phase; do
  if [ "$phase" = "RequiresApproval" ]; then
    echo "Detected ready InstallPlan: $name in $namespace. Approving..."
    oc patch installplan "$name" -n "$namespace" --type merge -p '{"spec":{"approved":true}}'
  fi
done
```

The ArgoCD dashboard is available via the Waffle Menu in the OpenShift Console header. Alternatively, retrieve the URL directly:
```shell
# Get the ArgoCD URL
oc get route openshift-gitops-server -n openshift-gitops -o jsonpath='{.spec.host}'
```

Wait for the rhoai-deployment ArgoCD application to reach a Healthy state, and... Enjoy using RHOAI!
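Rather than watching the dashboard, the wait can also be scripted. The helper below is a sketch: it assumes the Application object is named `rhoai-deployment` in the `openshift-gitops` namespace and reads the standard Argo CD `.status.health.status` field.

```shell
# wait_for_app_healthy [APP] [NAMESPACE] — poll an Argo CD Application
# until its health status reports Healthy. Defaults assume the
# rhoai-deployment app in openshift-gitops created by app-of-apps.yaml.
wait_for_app_healthy() {
  local app="${1:-rhoai-deployment}"
  local ns="${2:-openshift-gitops}"
  local status=""
  until [ "$status" = "Healthy" ]; do
    status=$(oc get application "$app" -n "$ns" \
      -o jsonpath='{.status.health.status}' 2>/dev/null)
    echo "Current health: ${status:-unknown}"
    [ "$status" = "Healthy" ] || sleep 10
  done
}
```

Call it as `wait_for_app_healthy` after applying `app-of-apps.yaml`; remember that InstallPlans still need approval (see the tip above) or the app will never converge.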
If your workloads require GPU acceleration, navigate to the hardware-profile/ directory. This folder contains automation scripts and documentation to streamline your GPU setup rather than relying on manual cluster configurations.
Please see the Hardware Profile Guide for step-by-step instructions on how to do any of the following:
- Provision GPU Nodes: Automatically create a GPU MachineSet with the correct AWS instance types, labels, and taints.
- Install GPU Dashboards: Deploy the NVIDIA DCGM Exporter dashboard directly to your OpenShift web console.
- Enable Time-Slicing: Dynamically partition physical GPUs into virtual replicas so multiple workloads can share resources.
- Register the Hardware Profile: The final RHOAI dashboard steps to make the GPUs selectable for deployments and workbenches.
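For orientation, time-slicing is configured through a ConfigMap consumed by the NVIDIA GPU Operator's device plugin. The sketch below shows the general shape only; the ConfigMap name, namespace, and replica count are illustrative, and the hardware-profile scripts generate the real values for you:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config          # illustrative name
  namespace: nvidia-gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4              # one physical GPU advertised as 4
```

The GPU ClusterPolicy is then pointed at this ConfigMap so each node advertises `replicas` schedulable `nvidia.com/gpu` resources per physical device.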
| Operator | Description |
|---|---|
| Job-set | Kubernetes-native API for managing groups of jobs as a unit |
| openshift-custom-metrics-autoscaler | Custom Metrics Autoscaler Operator |
| cert-manager | Cert-manager Operator for Red Hat OpenShift |
| Leader Worker Set | LWS-rhel9 Operator |
| Red Hat Connectivity Link | Unified framework for multicloud connectivity and API management |
| Kueue (RHBOK) | Red Hat Build of Kueue |
| SR-IOV | Single Root I/O Virtualization (SR-IOV) Network Operator |
| GPU Operator | NVIDIA GPU provisioning and management |
| OpenTelemetry | Vendor-neutral observability framework for metrics, logs, and traces |
| Tempo | High-scale distributed tracing backend |
| Cluster Observability | Standalone monitoring stacks for independent service configuration |
rhoai-argo/
├── app-of-apps.yaml
├── README.md
├── gitops-config/
│ ├── gitops-permission.yaml
│ └── openshift-gitops-subscription.yaml
├── argocd-applications/
│ ├── Chart.yaml
│ └── templates/
│ ├── gpu-operator.yaml
│ ├── infrastrcuture-operators.yaml
│ ├── observability-operators.yaml
│ ├── rhoai-application.yaml
│ └── scaling-operators.yaml
└── helm/
├── rhoai-stack/
│ ├── Chart.yaml
│ ├── values.yaml
│ └── templates/
│ ├── configs/
│ │ ├── 32-operator-deployment.yaml
│ │ ├── 33-datasciencecluster.yaml
│ │ ├── 35-dashboard-deployment.yaml
│ │ └── 35-odhdashboardconfig.yaml
│ └── operators/
│ └── 30-rhoai-operator.yaml
├── workload-scaling/
│ ├── Chart.yaml
│ └── templates/
│ ├── configs/
│ │ ├── 07-cluster-job-set.yaml
│ │ └── 07-cma-controller.yaml
│ └── operators/
│ ├── 05-cma-operator.yaml
│ ├── 05-job-set.yaml
│ ├── 05-leader-worker-set.yaml
│ └── 05-rhbok.yaml
├── gpu-operator-installation/
│ ├── Chart.yaml
│ ├── values.yaml
│ └── templates/
│ ├── configs/
│ │ ├── 15-nfd-instance.yaml
│ │ └── 25-gpu-clusterpolicy.yaml
│ └── operators/
│ ├── 10-nfd-operator.yaml
│ └── 20-gpu-operator.yaml
├── infrastructure-utilities/
│ ├── Chart.yaml
│ └── templates/
│ ├── configs/
│ └── operators/
│ ├── 05-kmm.yaml
│ ├── 05-rhcl.yaml
│ └── 05-sriov.yaml
└── observability-stack/
├── Chart.yaml
└── templates/
├── configs/
└── operators/
├── 05-cluster-observability.yaml
├── 05-open-telemetry.yaml
└── 05-tempo.yaml