This guide provides instructions for deploying the MaaS Platform infrastructure and applications.
- ODH/RHOAI requirements:
  - KServe enabled in the DataScienceCluster
  - Service Mesh installed (installed automatically with ODH/RHOAI)
- This project assumes OpenDataHub (ODH) or Red Hat OpenShift AI (RHOAI) as the base platform
- KServe components are expected to be provided by ODH/RHOAI, not installed separately
- For non-ODH/RHOAI deployments, KServe can optionally be installed from `deployment/components/kserve`
For OpenShift clusters, use the automated deployment script:
```bash
./deployment/scripts/deploy-openshift.sh
```

This script handles all steps, including feature gates, dependencies, and OpenShift-specific configuration.
On newer OpenShift versions (4.19.9+), Gateway API is enabled by creating the GatewayClass resource. Skip to Step 1.
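As a sketch, creating the GatewayClass on those versions could look like the following. The class name `openshift-default` is an assumption; the controller name matches the one used by the Kuadrant patches later in this guide.

```bash
# Hypothetical GatewayClass; adjust metadata.name to your cluster's convention
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: openshift-default
spec:
  controllerName: openshift.io/gateway-controller/v1
EOF
```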
Enable Gateway API features manually:
```bash
oc patch featuregate/cluster --type='merge' \
  -p '{"spec":{"featureSet":"CustomNoUpgrade","customNoUpgrade":{"enabled":["GatewayAPI","GatewayAPIController"]}}}'
```

Wait for the cluster operators to reconcile (this may take a few minutes).
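One way to watch the reconciliation, assuming a recent `oc` release that ships `adm wait-for-stable-cluster`:

```bash
# Block until all cluster operators have been stable for at least one minute
oc adm wait-for-stable-cluster --minimum-stable-period=1m

# Or inspect manually: operators should be Available and not Progressing/Degraded
oc get clusteroperators
```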
Note
The kserve namespace is managed by ODH/RHOAI and should not be created manually.
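To confirm that ODH/RHOAI has created it before proceeding:

```bash
kubectl get namespace kserve
```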
```bash
for ns in kuadrant-system llm maas-api; do
  kubectl create namespace $ns || true
done
```

Install the required operators and CRDs. Note that KServe is provided by ODH/RHOAI on OpenShift.
```bash
./deployment/scripts/install-dependencies.sh \
  --cert-manager \
  --kuadrant
```

Choose your platform:
For OpenShift:

```bash
export CLUSTER_DOMAIN="apps.your-openshift-cluster.com"
kustomize build deployment/overlays/openshift | envsubst | kubectl apply -f -
```

For Kubernetes:

```bash
export CLUSTER_DOMAIN="your-kubernetes-domain.com"
kustomize build deployment/overlays/kubernetes | envsubst | kubectl apply -f -
```

Note
These models use KServe's LLMInferenceService custom resource, which requires ODH/RHOAI with KServe enabled.
Deploy the simulator model:

```bash
PROJECT_DIR=$(git rev-parse --show-toplevel)
kustomize build ${PROJECT_DIR}/docs/samples/models/simulator/ | kubectl apply -f -
```

Deploy the Facebook OPT-125M CPU model:

```bash
PROJECT_DIR=$(git rev-parse --show-toplevel)
kustomize build ${PROJECT_DIR}/docs/samples/models/facebook-opt-125m-cpu/ | kubectl apply -f -
```

Warning
This model requires GPU nodes with nvidia.com/gpu resources available in your cluster.
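Before deploying, it may help to verify that at least one node advertises GPU capacity; a hedged check using kubectl's jsonpath output:

```bash
# List each node with its allocatable nvidia.com/gpu count (blank = no GPUs)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
```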
```bash
PROJECT_DIR=$(git rev-parse --show-toplevel)
kustomize build ${PROJECT_DIR}/docs/samples/models/qwen3/ | kubectl apply -f -
```

Check deployment status:

```bash
# Check LLMInferenceService status
kubectl get llminferenceservices -n llm

# Check pods
kubectl get pods -n llm
```

If Kuadrant was installed via Helm:
```bash
kubectl -n kuadrant-system patch deployment kuadrant-operator-controller-manager \
  --type='json' \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ISTIO_GATEWAY_CONTROLLER_NAMES","value":"openshift.io/gateway-controller/v1"}}]'
```

Important
After the Gateway becomes ready, restart the Kuadrant operators to ensure policies are properly enforced.
Wait for Gateway to be ready:
```bash
kubectl wait --for=condition=Programmed gateway openshift-ai-inference -n openshift-ingress --timeout=300s
```

Then restart the Kuadrant operators:

```bash
kubectl rollout restart deployment/kuadrant-operator-controller-manager -n kuadrant-system
kubectl rollout restart deployment/authorino-operator -n kuadrant-system
kubectl rollout restart deployment/limitador-operator-controller-manager -n kuadrant-system
```

If Kuadrant was installed via OLM:
```bash
kubectl patch csv kuadrant-operator.v0.0.0 -n kuadrant-system --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/env/-",
    "value": {
      "name": "ISTIO_GATEWAY_CONTROLLER_NAMES",
      "value": "openshift.io/gateway-controller/v1"
    }
  }
]'
```

Point KServe at the Gateway API ingress:

```bash
kubectl -n kserve patch configmap inferenceservice-config \
  --type='json' \
  -p="[{
    \"op\": \"replace\",
    \"path\": \"/data/ingress\",
    \"value\": \"{\\\"enableGatewayApi\\\": true, \\\"kserveIngressGateway\\\": \\\"openshift-ingress/openshift-ai-inference\\\", \\\"ingressGateway\\\": \\\"istio-system/istio-ingressgateway\\\", \\\"ingressDomain\\\": \\\"$(kubectl get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')\\\"}\"
  }]"
```

Update Limitador to expose metrics properly:
```bash
kubectl -n kuadrant-system patch limitador limitador --type merge \
  -p '{"spec":{"image":"quay.io/kuadrant/limitador:1a28eac1b42c63658a291056a62b5d940596fd4c","version":""}}'
```

Patch the AuthPolicy with the correct audience for OpenShift identities:
```bash
PROJECT_DIR=$(git rev-parse --show-toplevel)
AUD="$(kubectl create token default --duration=10m \
  | jwt decode --json - \
  | jq -r '.payload.aud[0]')"

echo "Patching AuthPolicy with audience: $AUD"

kubectl patch authpolicy maas-api-auth-policy -n maas-api \
  --type='json' \
  -p "$(jq -nc --arg aud "$AUD" '[{
    op:"replace",
    path:"/spec/rules/authentication/openshift-identities/kubernetesTokenReview/audiences/0",
    value:$aud
  }]')"
```
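The `jwt decode` step relies on the `jwt` CLI being installed. Where it isn't, the audience claim can be read with coreutils alone, since a JWT payload is just base64url-encoded JSON. A rough sketch (the token below is synthetic, for illustration only):

```bash
# Extract the first audience claim from a JWT without verifying it.
# A rough stand-in for `jwt decode` + jq, using only cut, tr, base64, and sed.
jwt_aud() {
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # Restore the base64 padding that JWT encoding strips
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
  printf '%s' "$payload" | base64 -d | sed -E 's/.*"aud":\["([^"]+)".*/\1/'
}

# Synthetic, unsigned token (header.payload.signature) for demonstration
token="$(printf '{"alg":"none"}' | base64 | tr -d '=').$(printf '{"aud":["https://kubernetes.default.svc"]}' | base64 | tr -d '=').x"
jwt_aud "$token"   # prints https://kubernetes.default.svc
```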
For OpenShift:
```bash
HOST="$(kubectl get gateway openshift-ai-inference -n openshift-ingress -o jsonpath='{.status.addresses[0].value}')"
```

For Kubernetes with LoadBalancer:

```bash
HOST="$(kubectl get gateway openshift-ai-inference -n openshift-ingress -o jsonpath='{.status.addresses[0].value}')"
```

For OpenShift:
```bash
TOKEN_RESPONSE=$(curl -sSk \
  -H "Authorization: Bearer $(oc whoami -t)" \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{"expiration": "10m"}' \
  "${HOST}/maas-api/v1/tokens")
TOKEN=$(echo "$TOKEN_RESPONSE" | jq -r .token)
```

List the available models:
```bash
MODELS=$(curl ${HOST}/maas-api/v1/models \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" | jq . -r)
echo $MODELS | jq .

MODEL_URL=$(echo $MODELS | jq -r '.data[0].url')
MODEL_NAME=$(echo $MODELS | jq -r '.data[0].id')
echo $MODEL_URL
```

Send multiple requests to trigger the rate limit:
```bash
# Later requests should return 429 once the limit is hit
for i in {1..16}; do
  curl -sSk -o /dev/null -w "%{http_code}\n" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "{
      \"model\": \"${MODEL_NAME}\",
      \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}],
      \"max_tokens\": 40
    }" \
    "${MODEL_URL}/v1/chat/completions"
done
```

Check that all components are running:
```bash
kubectl get pods -n maas-api
kubectl get pods -n kuadrant-system
kubectl get pods -n kserve
kubectl get pods -n llm
```

Check Gateway status:

```bash
kubectl get gateway -n openshift-ingress openshift-ai-inference
```

Check that policies are enforced:

```bash
kubectl get authpolicy -A
kubectl get tokenratelimitpolicy -A
```

Check that LLMInferenceServices are ready:

```bash
kubectl get llminferenceservices -n llm
```

After deployment, the following services are available:
Access models through the gateway route for proper token rate limiting:

- MaaS API: `https://maas-api.${CLUSTER_DOMAIN}`
  - For token generation and management
  - Direct route to the MaaS API service
- Gateway (for models): `https://gateway.${CLUSTER_DOMAIN}`
  - Simulator: `https://gateway.${CLUSTER_DOMAIN}/simulator/v1/chat/completions`
  - Qwen3: `https://gateway.${CLUSTER_DOMAIN}/qwen3/v1/chat/completions`
  - All model access MUST go through the gateway for rate limiting
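A single well-formed request through the gateway might look like the following sketch; the model name `simulator` is an assumption (use an id returned by `/maas-api/v1/models`), and `$TOKEN` is the token minted earlier:

```bash
curl -sSk "https://gateway.${CLUSTER_DOMAIN}/simulator/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "simulator",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 20
  }' | jq .
```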
Check all relevant pods:
```bash
kubectl get pods -A | grep -E "maas-api|kserve|kuadrant|simulator|qwen"
```

Check services:

```bash
kubectl get svc -A | grep -E "maas-api|simulator|qwen"
```

Check HTTPRoutes and Gateways:

```bash
kubectl get httproute -A
kubectl get gateway -A
```

View MaaS API logs:

```bash
kubectl logs -n maas-api -l app=maas-api --tail=50
```

View Kuadrant logs:

```bash
kubectl logs -n kuadrant-system -l app=kuadrant --tail=50
```

View model logs:

```bash
kubectl logs -n llm -l component=predictor --tail=50
```

- OOMKilled during model download: Increase storage initializer memory limits
- GPU models not scheduling: Ensure nodes have `nvidia.com/gpu` resources
- Rate limiting not working: Verify AuthPolicy and TokenRateLimitPolicy are applied
- Routes not accessible: Check Gateway status and HTTPRoute configuration
- Kuadrant installation fails with CRD errors: The deployment script now automatically cleans up leftover CRDs from previous installations
- TokenRateLimitPolicy MissingDependency error:
- Symptom: TokenRateLimitPolicy shows status "token rate limit policy validation has not finished"
- Fix: Run `./scripts/fix-token-rate-limit-policy.sh` or manually restart:

  ```bash
  kubectl rollout restart deployment/kuadrant-operator-controller-manager -n kuadrant-system
  kubectl rollout restart deployment/authorino -n kuadrant-system
  ```
- Note: This is a known Kuadrant issue that may occur after initial deployment
- Gateway stuck in "Waiting for controller" on OpenShift:
- Symptom: Gateway shows "Waiting for controller" indefinitely
- Expected behavior: Creating the GatewayClass should automatically trigger Service Mesh installation
- If automatic installation doesn't work:
- Install Red Hat OpenShift Service Mesh operator from OperatorHub manually
- Create a Service Mesh control plane (Istio instance):
  ```bash
  cat <<EOF | kubectl apply -f -
  apiVersion: sailoperator.io/v1
  kind: Istio
  metadata:
    name: openshift-gateway
  spec:
    version: v1.26.4
    namespace: openshift-ingress
  EOF
  ```
- Note: This is typically only needed on non-RHOAI OpenShift clusters
After deploying the infrastructure:
- Start the development environment: See the main README for frontend/backend setup
- Deploy additional models: Check samples/models for more examples
- Configure monitoring: Enable observability components in overlays