Deployment Guide

This guide provides instructions for deploying the MaaS Platform infrastructure and applications.

Prerequisites

ODH/RHOAI requirements:
- KServe enabled in DataScienceCluster
- Service Mesh installed (automatically installed with ODH/RHOAI)

Important Notes

- This project assumes OpenDataHub (ODH) or Red Hat OpenShift AI (RHOAI) as the base platform.
- KServe components are expected to be provided by ODH/RHOAI, not installed separately.
- For non-ODH/RHOAI deployments, KServe can optionally be installed from deployment/components/kserve.

Quick Start

Automated OpenShift Deployment (Recommended)

For OpenShift clusters, use the automated deployment script:

./deployment/scripts/deploy-openshift.sh

This script handles all steps including feature gates, dependencies, and OpenShift-specific configurations.

Manual Deployment Steps

Step 0: Enable Gateway API Features (OpenShift Only)

For OpenShift 4.19.9+

On OpenShift 4.19.9 and later, Gateway API is enabled when the GatewayClass resource is created, so no feature gate change is needed. Skip to Step 1.

For OpenShift < 4.19.9

Enable Gateway API features manually:

oc patch featuregate/cluster --type='merge' \
  -p '{"spec":{"featureSet":"CustomNoUpgrade","customNoUpgrade":{"enabled":["GatewayAPI","GatewayAPIController"]}}}'

Wait for the cluster operators to reconcile (this may take a few minutes).
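
To decide which of the two branches above applies, the cluster version can be compared against 4.19.9 in a script. The following is a minimal sketch, not part of the project's tooling: `needs_featuregate_patch` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils) and plain `x.y.z` version strings.

```shell
# Hypothetical helper: exits 0 when the given OpenShift version is older than
# 4.19.9, meaning the manual feature gate patch is required.
needs_featuregate_patch() {
  local version="$1"
  local threshold="4.19.9"
  local lowest
  # sort -V orders version strings component by component; the first line is the lower one.
  lowest=$(printf '%s\n%s\n' "$version" "$threshold" | sort -V | head -n 1)
  [ "$version" != "$threshold" ] && [ "$lowest" = "$version" ]
}

# Usage (assumes `oc` is logged in to the cluster):
#   version=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
#   if needs_featuregate_patch "$version"; then echo "feature gate patch required"; fi
```

Pre-release or suffixed version strings are not handled by this sketch and would need extra care.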

Step 1: Create Namespaces

Note

The kserve namespace is managed by ODH/RHOAI and should not be created manually.

for ns in kuadrant-system llm maas-api; do
  kubectl create namespace "$ns" || true
done
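
The loop above uses `|| true` so that an existing namespace does not abort the run, but that also hides every other failure. A possible alternative, shown here as a sketch with a hypothetical helper name `ensure_namespace`, checks for the namespace first and lets a real creation error surface:

```shell
# Hypothetical helper: create a namespace only if it does not exist yet.
ensure_namespace() {
  local ns="$1"
  if kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace $ns already exists"
  else
    # If the lookup failed for another reason (for example, missing permissions),
    # this create call fails too and its error is shown instead of being swallowed.
    kubectl create namespace "$ns"
  fi
}

# Usage:
#   for ns in kuadrant-system llm maas-api; do ensure_namespace "$ns" || exit 1; done
```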

Step 2: Install Dependencies

Install required operators and CRDs. Note that KServe is provided by ODH/RHOAI on OpenShift.

./deployment/scripts/install-dependencies.sh \
  --cert-manager \
  --kuadrant

Step 3: Deploy Core Infrastructure

Choose your platform:

OpenShift Deployment

export CLUSTER_DOMAIN="apps.your-openshift-cluster.com"
kustomize build deployment/overlays/openshift | envsubst | kubectl apply -f -

Kubernetes Deployment

export CLUSTER_DOMAIN="your-kubernetes-domain.com"
kustomize build deployment/overlays/kubernetes | envsubst | kubectl apply -f -
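
Both overlay commands pipe through `envsubst`, which replaces an unset variable with an empty string without any warning. A small guard can catch a missing `CLUSTER_DOMAIN` before anything is applied. This is a sketch under that assumption; `require_env` is a hypothetical helper, not something the repository provides.

```shell
# Hypothetical helper: fail if any of the named environment variables is unset or empty.
require_env() {
  local name value
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      return 1
    fi
  done
}

# Usage:
#   require_env CLUSTER_DOMAIN || exit 1
#   kustomize build deployment/overlays/openshift | envsubst '${CLUSTER_DOMAIN}' | kubectl apply -f -
```

Passing `'${CLUSTER_DOMAIN}'` to `envsubst` limits substitution to that one variable, which avoids rewriting any other `$name` strings the manifests may contain.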

Step 4: Deploy Sample Models (Optional)

Note

These models use KServe's LLMInferenceService custom resource, which requires ODH/RHOAI with KServe enabled.

Simulator Model (CPU)

PROJECT_DIR=$(git rev-parse --show-toplevel)
kustomize build ${PROJECT_DIR}/docs/samples/models/simulator/ | kubectl apply -f -

Facebook OPT-125M Model (CPU)

PROJECT_DIR=$(git rev-parse --show-toplevel)
kustomize build ${PROJECT_DIR}/docs/samples/models/facebook-opt-125m-cpu/ | kubectl apply -f -

Qwen3 Model (GPU Required)

Warning

This model requires GPU nodes with nvidia.com/gpu resources available in your cluster.

PROJECT_DIR=$(git rev-parse --show-toplevel)
kustomize build ${PROJECT_DIR}/docs/samples/models/qwen3/ | kubectl apply -f -

Verify Model Deployment

# Check LLMInferenceService status
kubectl get llminferenceservices -n llm

# Check pods
kubectl get pods -n llm

Platform-Specific Configuration

OpenShift Configuration

Patch Kuadrant for OpenShift Gateway Controller

If installed via Helm:

kubectl -n kuadrant-system patch deployment kuadrant-operator-controller-manager \
  --type='json' \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ISTIO_GATEWAY_CONTROLLER_NAMES","value":"openshift.io/gateway-controller/v1"}}]'

Important

After the Gateway becomes ready, restart the Kuadrant operators to ensure policies are properly enforced.

Wait for Gateway to be ready:

kubectl wait --for=condition=Programmed gateway openshift-ai-inference -n openshift-ingress --timeout=300s

Then restart Kuadrant operators:

kubectl rollout restart deployment/kuadrant-operator-controller-manager -n kuadrant-system
kubectl rollout restart deployment/authorino-operator -n kuadrant-system
kubectl rollout restart deployment/limitador-operator-controller-manager -n kuadrant-system
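
`kubectl rollout restart` returns as soon as the restart is requested, so the operators may still be rolling when the next step runs. One way to block until each restart completes is sketched below; `restart_and_wait` is a hypothetical helper and the 300s timeout is an assumed value mirroring the Gateway wait above.

```shell
# Hypothetical helper: restart each deployment in a namespace and wait for its rollout.
# Arguments: <namespace> <deployment> [<deployment> ...]
restart_and_wait() {
  local ns="$1" deploy
  shift
  for deploy in "$@"; do
    kubectl rollout restart "deployment/$deploy" -n "$ns" || return 1
    # rollout status blocks until the new pods are ready or the timeout expires.
    kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=300s || return 1
  done
}

# Usage:
#   restart_and_wait kuadrant-system \
#     kuadrant-operator-controller-manager \
#     authorino-operator \
#     limitador-operator-controller-manager
```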

If installed via OLM:

kubectl patch csv kuadrant-operator.v0.0.0 -n kuadrant-system --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/env/-",
    "value": {
      "name": "ISTIO_GATEWAY_CONTROLLER_NAMES",
      "value": "openshift.io/gateway-controller/v1"
    }
  }
]'

Update KServe Ingress Domain

kubectl -n kserve patch configmap inferenceservice-config \
  --type='json' \
  -p="[{
    \"op\": \"replace\",
    \"path\": \"/data/ingress\",
    \"value\": \"{\\\"enableGatewayApi\\\": true, \\\"kserveIngressGateway\\\": \\\"openshift-ingress/openshift-ai-inference\\\", \\\"ingressGateway\\\": \\\"istio-system/istio-ingressgateway\\\", \\\"ingressDomain\\\": \\\"$(kubectl get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')\\\"}\"
  }]"

Update Limitador Image for Metrics (Optional but Recommended)

Update Limitador to expose metrics properly:

kubectl -n kuadrant-system patch limitador limitador --type merge \
  -p '{"spec":{"image":"quay.io/kuadrant/limitador:1a28eac1b42c63658a291056a62b5d940596fd4c","version":""}}'

Ensure the correct audience is set for AuthPolicy

Patch AuthPolicy with the correct audience for OpenShift identities:

PROJECT_DIR=$(git rev-parse --show-toplevel)
AUD="$(kubectl create token default --duration=10m \
  | jwt decode --json - \
  | jq -r '.payload.aud[0]')"

echo "Patching AuthPolicy with audience: $AUD"

kubectl patch authpolicy maas-api-auth-policy -n maas-api \
  --type='json' \
  -p "$(jq -nc --arg aud "$AUD" '[{
    op:"replace",
    path:"/spec/rules/authentication/openshift-identities/kubernetesTokenReview/audiences/0",
    value:$aud
  }]')"
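
The audience lookup above depends on a `jwt` command-line tool being installed. If it is not available, the token payload can be decoded with standard tools instead. The sketch below assumes a `base64` that accepts `-d` (GNU coreutils); `jwt_payload` is a hypothetical helper name.

```shell
# Hypothetical helper: print the decoded JSON payload (second segment) of a JWT.
jwt_payload() {
  local payload pad
  # Take the second dot-separated segment and map base64url characters to standard base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so the length is a multiple of 4.
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  printf '%s' "$payload" | base64 -d
}

# Usage (the payload is printed directly, so the jq path is .aud[0] rather than .payload.aud[0]):
#   AUD="$(jwt_payload "$(kubectl create token default --duration=10m)" | jq -r '.aud[0]')"
```

This only decodes the token; it does not verify the signature, which is sufficient for reading the audience claim.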

Testing the Deployment

1. Get Gateway Endpoint

For OpenShift:

HOST="$(kubectl get gateway openshift-ai-inference -n openshift-ingress -o jsonpath='{.status.addresses[0].value}')"

For Kubernetes with LoadBalancer:

HOST="$(kubectl get gateway openshift-ai-inference -n openshift-ingress -o jsonpath='{.status.addresses[0].value}')"

2. Get Authentication Token

For OpenShift:

TOKEN_RESPONSE=$(curl -sSk \
  -H "Authorization: Bearer $(oc whoami -t)" \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{"expiration": "10m"}' \
  "${HOST}/maas-api/v1/tokens")

TOKEN=$(echo "$TOKEN_RESPONSE" | jq -r .token)

3. Test Model Endpoints

List the available models through the MaaS API and capture the URL and ID of the first one:

MODELS=$(curl -sSk "${HOST}/maas-api/v1/models" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $TOKEN" | jq -r .)

echo "$MODELS" | jq .
MODEL_URL=$(echo "$MODELS" | jq -r '.data[0].url')
MODEL_NAME=$(echo "$MODELS" | jq -r '.data[0].id')

echo "$MODEL_URL"

4. Test Rate Limiting

Send multiple requests to trigger the rate limit:

for i in {1..16}
do
curl -sSk -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
        \"model\": \"${MODEL_NAME}\",
        \"messages\": [{\"role\": \"user\", \"content\": \"Not really understood prompt\"}],
        \"max_tokens\": 40
    }" \
  "${MODEL_URL}/v1/chat/completions";
done
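
The loop above prints one HTTP status code per request. To make the result easier to read, the codes can be tallied. The helper below is a small sketch with a hypothetical name, `summarize_codes`; which codes appear depends on the configured limits, with successful responses typically followed by 429 once the limit is exceeded.

```shell
# Hypothetical helper: read one HTTP status code per line on stdin and
# print "<code> <count>" for each distinct code.
summarize_codes() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage: pipe the request loop into the helper, for example
#   for i in {1..16}; do curl ... ; done | summarize_codes
```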

5. Verify Complete Deployment

Check that all components are running:

kubectl get pods -n maas-api
kubectl get pods -n kuadrant-system
kubectl get pods -n kserve
kubectl get pods -n llm

Check Gateway status:

kubectl get gateway -n openshift-ingress openshift-ai-inference

Check that policies are enforced:

kubectl get authpolicy -A
kubectl get tokenratelimitpolicy -A

# Check LLMInferenceServices are ready
kubectl get llminferenceservices -n llm
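
The pod listings above have to be read by eye. For scripted checks, the output can be filtered down to pods that are not fully ready. This is a sketch that assumes the default `kubectl get pods --no-headers` column layout (NAME, READY, STATUS, ...) for a single namespace; `not_ready_pods` is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin and
# print the name, ready count, and status of pods that are not fully ready.
not_ready_pods() {
  awk '{
    split($2, ready, "/")            # READY column looks like "1/2"
    if ($3 == "Completed") next      # finished jobs are not a problem
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage:
#   for ns in maas-api kuadrant-system kserve llm; do
#     kubectl get pods -n "$ns" --no-headers | not_ready_pods
#   done
```

No output means every pod in the listed namespaces is running with all containers ready.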

Services Exposed

After deployment, the following services are available:

OpenShift Access (with Rate Limiting)

Access models through the gateway route for proper token rate limiting:

- MaaS API: https://maas-api.${CLUSTER_DOMAIN}
  - For token generation and management
  - Direct route to MaaS API service
- Gateway (for Models): https://gateway.${CLUSTER_DOMAIN}
  - Simulator: https://gateway.${CLUSTER_DOMAIN}/simulator/v1/chat/completions
  - Qwen3: https://gateway.${CLUSTER_DOMAIN}/qwen3/v1/chat/completions
  - All model access MUST go through the gateway for rate limiting

⚠️ IMPORTANT: Direct routes to models bypass TokenRateLimitPolicy. Always use the gateway route for production.

Troubleshooting

Check Component Status

Check all relevant pods:

kubectl get pods -A | grep -E "maas-api|kserve|kuadrant|simulator|qwen"

Check services:

kubectl get svc -A | grep -E "maas-api|simulator|qwen"

Check HTTPRoutes and Gateway:

kubectl get httproute -A
kubectl get gateway -A

View Logs

View MaaS API logs:

kubectl logs -n maas-api -l app=maas-api --tail=50

View Kuadrant logs:

kubectl logs -n kuadrant-system -l app=kuadrant --tail=50

View Model logs:

kubectl logs -n llm -l component=predictor --tail=50

Common Issues

OOMKilled during model download: Increase storage initializer memory limits
GPU models not scheduling: Ensure nodes have nvidia.com/gpu resources
Rate limiting not working: Verify AuthPolicy and TokenRateLimitPolicy are applied
Routes not accessible: Check Gateway status and HTTPRoute configuration
Kuadrant installation fails with CRD errors: The deployment script now automatically cleans up leftover CRDs from previous installations
TokenRateLimitPolicy MissingDependency error:
- Symptom: TokenRateLimitPolicy shows status "token rate limit policy validation has not finished"
- Fix: Run ./scripts/fix-token-rate-limit-policy.sh or manually restart:
```
kubectl rollout restart deployment kuadrant-operator-controller-manager -n kuadrant-system
kubectl rollout restart deployment/authorino -n kuadrant-system
```
- Note: This is a known Kuadrant issue that may occur after initial deployment
Gateway stuck in "Waiting for controller" on OpenShift:
- Symptom: Gateway shows "Waiting for controller" indefinitely
- Expected behavior: Creating the GatewayClass should automatically trigger Service Mesh installation
- If automatic installation doesn't work:
  1. Install Red Hat OpenShift Service Mesh operator from OperatorHub manually
  2. Create a Service Mesh control plane (Istio instance):
```
cat <<EOF | kubectl apply -f -
apiVersion: sailoperator.io/v1
kind: Istio
metadata:
  name: openshift-gateway
spec:
  version: v1.26.4
  namespace: openshift-ingress
EOF
```
- Note: This is typically only needed on non-RHOAI OpenShift clusters

Next Steps

After deploying the infrastructure:

- Start the development environment: See the main README for frontend/backend setup
- Deploy additional models: Check samples/models for more examples
- Configure monitoring: Enable observability components in overlays
