Deployment of Red Hat OpenShift AI (RHOAI) and its required infrastructure stack using Helm and ArgoCD.
| Sync Wave | Description | Resources |
|---|---|---|
| 0 | Namespaces | Namespaces for all operators |
| 5 | RHOAI Dependencies & Utilities | job-set-operator, cma-operator, cert-manager, leader-worker-set, Kueue, SR-IOV, OpenTelemetry, Tempo, ClusterObservability, kmm |
| 7 | Configs | cluster-job-set, cma-controller |
| Checkpoint | | |
| 10 | GPU Dependencies & Hardware Operators | nfd-operator |
| 15 | Configs | nfd-instance |
| 20 | NVIDIA GPU Operator | gpu-operator |
| 25 | GPU Cluster Policy | gpu-clusterpolicy |
| Checkpoint | | |
| 30 | RHOAI Operator Group + Subscription | rhoai-operator |
| 32 | RHOAI Deployment | operator-deployment |
| 33 | DataScienceCluster configuration | datasciencecluster |
| 35 | RHOAI dashboard configuration | odhdashboardconfig |
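The ordering in the table above is driven by the standard Argo CD sync-wave annotation on each manifest. A minimal sketch of how a wave is declared (the channel, source, and namespace values below are illustrative, not copied from this repo):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator
  annotations:
    # Lower waves sync first; "20" matches the NVIDIA GPU Operator row above.
    argocd.argoproj.io/sync-wave: "20"
spec:
  channel: stable                        # illustrative values
  name: gpu-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
```

The numeric prefixes on the template filenames under `helm/` (e.g. `20-gpu-operator.yaml`) mirror these wave numbers.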
To initialize RHOAI, install OpenShift GitOps, configure permissions, and trigger the App-of-Apps deployment.
```shell
git clone https://github.com/redhat-ai-americas/rhoai-argo.git
cd rhoai-argo
```

The following checks whether the OpenShift GitOps Subscription exists. If not, it applies the Subscription, then configures permissions for the Service Account once the operator is fully ready.
```shell
# 1. Install Operator (only if missing) & Permissions
oc get subscription openshift-gitops-operator -n openshift-operators &>/dev/null || oc apply -f gitops-config/openshift-gitops-subscription.yaml
oc apply -f gitops-config/gitops-permission.yaml

echo "⏳ Finalizing OpenShift GitOps environment..."

# 2. Silently wait until the Deployment is Available
until oc wait deployment/openshift-gitops-server -n openshift-gitops --for=condition=Available --timeout=10s &>/dev/null; do sleep 5; done

# 3. Apply custom health checks and enable the sidebar GitOps tab
oc apply --server-side --force-conflicts -f gitops-config/argocd-instance.yaml
```

Apply the manifest for our App-of-Apps pattern, controlled via sync waves.
- Operators will require manual approval for any version upgrades in the OpenShift Console.
```shell
oc apply -f app-of-apps.yaml
```

Note
You must approve the InstallPlan requests as they attempt to install; this prevents automatic updates from disrupting your AI workloads. To use automatic updates instead, first push this repository to an empty remote and update the app templates to point to that Git URL. Then change `Manual` to `Automatic` at the end of the first line in `argocd-applications/values.yaml`.
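If you do opt into automatic updates, the values-file edit can be scripted. The helper below is a sketch: the `Manual`/`Automatic` substitution assumes the approval policy is the only place those words appear on that line, so inspect your copy of `argocd-applications/values.yaml` before running it.

```shell
# enable_auto_approval FILE — hypothetical helper that switches the
# InstallPlan approval policy from Manual to Automatic in the given
# Helm values file. Verify the field name in your values.yaml first.
enable_auto_approval() {
  sed -i 's/Manual/Automatic/g' "$1"
}
```

Invoke it from the repo root, e.g. `enable_auto_approval argocd-applications/values.yaml`, then commit and push the change so Argo CD picks it up.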
- In the OpenShift Console, navigate to Home > Search.
- In the search bar, where it says "Resources", type "InstallPlan" and select the resource type.
- Click the InstallPlan name in the first column (Installxxxxx) > Preview InstallPlan > Approve. (Or use the tip below.)
- To get back to the list, click InstallPlans in the breadcrumb at the top left. Make sure All Projects is selected in the namespace dropdown at the top of the screen so that all InstallPlans are shown.
Tip
Bulk Approval: To approve all currently waiting InstallPlans at once, run:

```shell
oc get installplan -A --no-headers | grep "false" | awk '{print $1, $2}' | xargs -L1 sh -c 'oc patch installplan $1 -n $0 --type merge -p "{\"spec\":{\"approved\":true}}"'
```

Rolling Approval: To approve all pending and future InstallPlans, run:
```shell
oc get installplan -A -w -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase --no-headers | while read -r namespace name phase; do
  if [ "$phase" = "RequiresApproval" ]; then
    echo "Detected ready InstallPlan: $name in $namespace. Approving..."
    oc patch installplan "$name" -n "$namespace" --type merge -p '{"spec":{"approved":true}}'
  fi
done
```

The ArgoCD dashboard is available via the Waffle Menu in the OpenShift Console header. Alternatively, retrieve the URL directly:
```shell
# Get the ArgoCD URL
oc get route openshift-gitops-server -n openshift-gitops -o jsonpath='{.spec.host}'
```

Wait for the rhoai-deployment ArgoCD application to reach a Healthy state, and... Enjoy using RHOAI!
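Rather than watching the dashboard, the wait can also be scripted. The helper below is a sketch: it assumes the Application object is named `rhoai-deployment` in the `openshift-gitops` namespace and reads the standard Argo CD `.status.health.status` field.

```shell
# wait_for_app_healthy [APP] [NAMESPACE] — poll an Argo CD Application
# until its health status reports Healthy. Defaults assume the
# rhoai-deployment app in openshift-gitops created by app-of-apps.yaml.
wait_for_app_healthy() {
  local app="${1:-rhoai-deployment}"
  local ns="${2:-openshift-gitops}"
  local status=""
  until [ "$status" = "Healthy" ]; do
    status=$(oc get application "$app" -n "$ns" \
      -o jsonpath='{.status.health.status}' 2>/dev/null)
    echo "Current health: ${status:-unknown}"
    [ "$status" = "Healthy" ] || sleep 10
  done
}
```

Call it as `wait_for_app_healthy` after applying `app-of-apps.yaml`; remember that InstallPlans still need approval (see the tip above) or the app will never converge.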
If your workloads require GPU acceleration, navigate to the hardware-profile/ directory. This folder contains automation scripts and documentation to streamline your GPU setup rather than relying on manual cluster configurations.
Please see the Hardware Profile Guide for step-by-step instructions on how to do any of the following:
- Provision GPU Nodes: Automatically create a GPU MachineSet with the correct AWS instance types, labels, and taints.
- Install GPU Dashboards: Deploy the NVIDIA DCGM Exporter dashboard directly to your OpenShift web console.
- Enable Time-Slicing: Dynamically partition physical GPUs into virtual replicas so multiple workloads can share resources.
- Register the Hardware Profile: The final RHOAI dashboard steps to make the GPUs selectable for deployments and workbenches.
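For orientation, time-slicing is configured through a ConfigMap consumed by the NVIDIA GPU Operator's device plugin. The sketch below shows the general shape only; the ConfigMap name, namespace, and replica count are illustrative, and the hardware-profile scripts generate the real values for you:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config          # illustrative name
  namespace: nvidia-gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4              # one physical GPU advertised as 4
```

The GPU ClusterPolicy is then pointed at this ConfigMap so each node advertises `replicas` schedulable `nvidia.com/gpu` resources per physical device.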
| Operator | Description |
|---|---|
| Job-set | Kubernetes-native API for managing groups of jobs as a unit |
| openshift-custom-metrics-autoscaler | Custom Metrics Autoscaler Operator |
| cert-manager | Cert-manager Operator for Red Hat OpenShift |
| Leader Worker Set | LWS-rhel9 Operator |
| Red Hat Connectivity Link | Unified framework for multicloud connectivity and API management |
| Kueue (RHBOK) | Red Hat Build of Kueue |
| SR-IOV | Single Root I/O Virtualization (SR-IOV) Network Operator |
| GPU Operator | NVIDIA GPU provisioning and management |
| OpenTelemetry | Vendor-neutral observability framework for metrics, logs, and traces |
| Tempo | High-scale distributed tracing backend |
| Cluster Observability | Standalone monitoring stacks for independent service configuration |
rhoai-argo/
├── app-of-apps.yaml
├── README.md
├── gitops-config/
│ ├── gitops-permission.yaml
│ └── openshift-gitops-subscription.yaml
├── argocd-applications/
│ ├── Chart.yaml
│ └── templates/
│ ├── gpu-operator.yaml
│ ├── infrastrcuture-operators.yaml
│ ├── observability-operators.yaml
│ ├── rhoai-application.yaml
│ └── scaling-operators.yaml
└── helm/
├── rhoai-stack/
│ ├── Chart.yaml
│ ├── values.yaml
│ └── templates/
│ ├── configs/
│ │ ├── 32-operator-deployment.yaml
│ │ ├── 33-datasciencecluster.yaml
│ │ ├── 35-dashboard-deployment.yaml
│ │ └── 35-odhdashboardconfig.yaml
│ └── operators/
│ └── 30-rhoai-operator.yaml
├── workload-scaling/
│ ├── Chart.yaml
│ └── templates/
│ ├── configs/
│ │ ├── 07-cluster-job-set.yaml
│ │ └── 07-cma-controller.yaml
│ └── operators/
│ ├── 05-cma-operator.yaml
│ ├── 05-job-set.yaml
│ ├── 05-leader-worker-set.yaml
│ └── 05-rhbok.yaml
├── gpu-operator-installation/
│ ├── Chart.yaml
│ ├── values.yaml
│ └── templates/
│ ├── configs/
│ │ ├── 15-nfd-instance.yaml
│ │ └── 25-gpu-clusterpolicy.yaml
│ └── operators/
│ ├── 10-nfd-operator.yaml
│ └── 20-gpu-operator.yaml
├── infrastructure-utilities/
│ ├── Chart.yaml
│ └── templates/
│ ├── configs/
│ └── operators/
│ ├── 05-kmm.yaml
│ ├── 05-rhcl.yaml
│ └── 05-sriov.yaml
└── observability-stack/
├── Chart.yaml
└── templates/
├── configs/
└── operators/
├── 05-cluster-observability.yaml
├── 05-open-telemetry.yaml
└── 05-tempo.yaml