Resolved conflicts by keeping jr_55467's security improvements:
- has() safety checks in CEL expressions
- userId-based cache keys (collision-resistant)
- Conditional apiKeyValidation with when clause
- string() function for proper JSON group serialization
docs/content/configuration-and-management/maas-controller-overview.md (3 additions, 0 deletions)
This document describes the **MaaS Controller**: what was built, how it fits into the Models-as-a-Service (MaaS) stack, and how the pieces work together. It is intended for presentations, onboarding, and technical deep-dives.
docs/content/configuration-and-management/maas-models.md (5 additions, 1 deletion)
MaaS uses **MaaSModelRef** to identify model servers that live on the cluster. Each MaaSModelRef is a reference to a model server—it holds the information MaaS needs to perform authentication, authorization, and rate limiting.
By using a single unified object (MaaSModelRef) for all model types, MaaS can handle different kinds of model servers—each with its own backend and lifecycle—through one consistent interface. The controller uses a **provider paradigm** to distinguish between types: each model type (for example, LLMInferenceService, external APIs) has a provider that knows how to reconcile and resolve that type.
**Supported LLMs:** Most model families should work; an official validated list is in progress.
**Supported inference services:** vLLM through LLMInferenceService (KServe) is the initial supported release for on-cluster models; additional backends are planned for future releases.
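To make the provider paradigm concrete, here is an illustrative sketch: two MaaSModelRefs exposing different backend kinds through the same interface. The `apiVersion` and spec field names here are assumptions for illustration, not the authoritative CRD schema.

```yaml
# Hypothetical sketch; apiVersion and field names are assumptions,
# not the authoritative MaaSModelRef schema.
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSModelRef
metadata:
  name: my-vllm-model
  namespace: llm
spec:
  modelRef:
    kind: LLMInferenceService   # the initial supported provider (vLLM)
    name: my-vllm-model
---
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSModelRef
metadata:
  name: my-external-model
  namespace: llm
spec:
  modelRef:
    kind: ExternalModel         # hypothetical future provider kind
    name: my-external-model
```

The referenced backend's `kind` is what selects the provider-specific reconcile and resolve logic.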
docs/content/configuration-and-management/model-setup.md (16 additions, 2 deletions)

This guide explains how to configure models so they appear in the MaaS platform.
## Supported model types
MaaS distinguishes between **supported LLMs** (the model weights/architectures) and **supported inference services** (the runtime backends).
### Supported LLMs
Most LLM model families should work (e.g., Llama, Mistral, Qwen, GPT-style models). We are working on an official validated list. If you encounter issues with a specific model, please report them.
### Supported inference services
MaaS uses a **provider paradigm**: each MaaSModelRef references a model backend by `kind` (e.g., `LLMInferenceService`, `ExternalModel`). The controller uses provider-specific logic to reconcile and resolve each type. Supported inference runtimes include:
| Inference service | Status |
|-------------------|--------|
| **vLLM** (via LLMInferenceService / KServe) | Initial supported release. This is the primary supported backend for on-cluster models. |
| **KServe** (LLMInferenceService) | Runtime framework. vLLM workloads run through LLMInferenceService. |
| **Additional backends** | Planned for future releases. |
This guide describes the configuration differences between the default LLMInferenceService and the MaaS-enabled one.
**Setting the namespace:** The script defaults to `opendatahub`. Set the `NAMESPACE` environment variable if your MaaS deployment uses a different namespace; use `NAMESPACE=redhat-ods-applications` for RHOAI. The full `scripts/deploy.sh` script also creates PostgreSQL automatically when deploying MaaS.
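For example, a sketch of the RHOAI invocation (assuming you run from the repository root):

```bash
# Deploy MaaS into the RHOAI applications namespace instead of the
# default `opendatahub` namespace.
NAMESPACE=redhat-ods-applications ./scripts/deploy.sh
```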
!!! note "Restarting maas-api"
If you add or update the Secret after the DataScienceCluster already has modelsAsService in managed state, restart the maas-api deployment to pick up the config:
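A rollout restart is one way to do this (sketch; this assumes the deployment is named `maas-api` and runs in the `opendatahub` namespace, so adjust `-n` to your MaaS namespace):

```bash
# Restart maas-api so it re-reads the updated Secret, then wait
# for the new pods to become ready.
kubectl rollout restart deployment/maas-api -n opendatahub
kubectl rollout status deployment/maas-api -n opendatahub
```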
The Gateway must exist before enabling modelsAsService in your DataScienceCluster.

```bash
./scripts/setup-authorino-tls.sh
```
**Setting the namespace:** The script defaults to `kuadrant-system` (ODH with Kuadrant). Set `AUTHORINO_NAMESPACE` for RHOAI, which uses RHCL:
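For example (invocation sketch; the namespace placeholder is hypothetical, so substitute the namespace your RHCL Authorino instance actually runs in):

```bash
# RHOAI with RHCL: override the default kuadrant-system namespace.
AUTHORINO_NAMESPACE=<your-rhcl-namespace> ./scripts/setup-authorino-tls.sh
```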
The Gateway **must** include these annotations for MaaS to work correctly:
| Annotation | Purpose |
|------------|---------|
| `opendatahub.io/managed: "false"` | Read by **maas-controller**: allows it to manage AuthPolicies and related resources; prevents the ODH Model Controller from overwriting them. |
| `security.opendatahub.io/authorino-tls-bootstrap: "true"` | Used by the ODH platform (not maas-controller) to create the EnvoyFilter for Gateway → Authorino TLS when Authorino uses a TLS listener. Required when Authorino TLS is enabled (see [TLS Configuration](../configuration-and-management/tls-configuration.md)). |
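Applied to a Gateway manifest, the annotations above look like this. This is a sketch only: the Gateway name, namespace, and gateway class here are assumptions for illustration, not values confirmed by this guide.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: maas-default-gateway        # assumed name for illustration
  namespace: opendatahub            # adjust to your deployment
  annotations:
    # Let maas-controller manage AuthPolicies; keep the ODH Model
    # Controller from overwriting them.
    opendatahub.io/managed: "false"
    # Have the ODH platform create the EnvoyFilter for Gateway ->
    # Authorino TLS (required when Authorino TLS is enabled).
    security.opendatahub.io/authorino-tls-bootstrap: "true"
spec:
  gatewayClassName: openshift-default   # assumption; use your class
  # listeners omitted for brevity
```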
```bash
CLUSTER_DOMAIN=$(kubectl get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
# Use default ingress cert for HTTPS, or set CERT_NAME to your TLS secret name
```
docs/content/install/model-setup.md (70 additions, 1 deletion)

Our sample models are packaged as Kustomize overlays that deploy:
For more detail on each resource, see [Access and Quota Overview](../configuration-and-management/subscription-overview.md).
!!! tip "Create llm namespace (optional)"
Our example models deploy to the `llm` namespace. If it does not exist, create it before deploying the samples below (idempotent—safe to run even if it already exists):
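One idempotent way to do this with standard kubectl:

```bash
# Creates the namespace if missing; re-applying is a no-op.
kubectl create namespace llm --dry-run=client -o yaml | kubectl apply -f -
```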
Deploying a model through MaaS follows a specific order. Each resource depends on the previous one. The following walkthrough deploys the **simulator model** step by step so you can see what each resource does.
Set the project root (run from the repository root):
```bash
PROJECT_DIR=$(git rev-parse --show-toplevel)
```
### Step 1: Deploy the LLMInferenceService (Model)
The LLMInferenceService is the actual inference workload. It must exist first and use the `maas-default-gateway` gateway reference so traffic flows through MaaS for authentication and rate limiting.
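A minimal sketch of that gateway reference is shown below. Everything other than the resource name, namespace, and gateway name is an assumption; consult the KServe LLMInferenceService reference and the sample manifests for the real schema.

```yaml
# Hypothetical sketch, not the full sample manifest.
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: facebook-opt-125m-simulated
  namespace: llm
spec:
  router:
    gateway:
      refs:
        - name: maas-default-gateway     # traffic flows through MaaS
          namespace: openshift-ingress   # assumption; adjust as needed
```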
This deploys the simulator workload (a lightweight mock that generates responses without a real LLM). The resource is named `facebook-opt-125m-simulated` in the `llm` namespace. Verify it is ready:
```bash
kubectl get llminferenceservice -n llm
kubectl get pods -n llm
```
### Step 2: Deploy the MaaSModelRef
The MaaSModelRef registers the model with MaaS so it appears in the catalog and the `/v1/models` API. It references the LLMInferenceService by name. The maas-controller watches MaaSModelRefs and populates `status.endpoint` and `status.phase` from the underlying LLMInferenceService.
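Sketched against the walkthrough's names, such a registration might look like the following. The `apiVersion` and spec field names are assumptions, not the authoritative schema.

```yaml
apiVersion: maas.opendatahub.io/v1alpha1   # hypothetical
kind: MaaSModelRef
metadata:
  name: facebook-opt-125m-simulated
  namespace: llm
spec:
  modelRef:
    kind: LLMInferenceService            # provider selected by kind
    name: facebook-opt-125m-simulated    # the Step 1 workload
```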
After a short moment, the controller reconciles. Verify status is populated:
```bash
kubectl get maasmodelref -n llm facebook-opt-125m-simulated -o jsonpath='{.status.phase}' && echo
kubectl get maasmodelref -n llm facebook-opt-125m-simulated -o jsonpath='{.status.endpoint}' && echo
```
**Expected output:** `status.phase` should be `Ready` and `status.endpoint` should be a non-empty URL. If either is missing, wait briefly and retry—the controller may still be reconciling (see [Verify Model Deployment](#verify-model-deployment) below).
### Step 3: Deploy the MaaSSubscription
The MaaSSubscription defines token rate limits (quotas) for groups. It references the MaaSModelRef by name and namespace. This controls how many tokens each group can consume per model.
Create the `models-as-a-service` namespace if it does not exist, then apply:
This sample grants `system:authenticated` (all authenticated users) a limit of 100 tokens per minute for the simulator model.
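In spirit, the sample subscription looks something like this. The field names and resource name are illustrative assumptions; see the actual sample manifest for the real schema.

```yaml
apiVersion: maas.opendatahub.io/v1alpha1   # hypothetical
kind: MaaSSubscription
metadata:
  name: simulator-subscription             # illustrative name
  namespace: models-as-a-service
spec:
  modelRef:
    name: facebook-opt-125m-simulated
    namespace: llm
  limits:
    - group: system:authenticated   # all authenticated users
      tokens: 100
      window: 1m                    # 100 tokens per minute
```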
### Step 4: Deploy the MaaSAuthPolicy
The MaaSAuthPolicy defines who can access the model. It references the MaaSModelRef by name and namespace. Without this, requests to the model are denied even if the user has a subscription.
This sample grants access to `system:authenticated`. The maas-controller creates per-model AuthPolicies and TokenRateLimitPolicies that enforce this.
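A sketch of such a policy, with hypothetical field and resource names (consult the sample manifest for the real schema):

```yaml
apiVersion: maas.opendatahub.io/v1alpha1   # hypothetical
kind: MaaSAuthPolicy
metadata:
  name: simulator-auth                     # illustrative name
  namespace: models-as-a-service
spec:
  modelRef:
    name: facebook-opt-125m-simulated
    namespace: llm
  allowedGroups:
    - system:authenticated   # who may call the model
```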
---
You have now deployed the full simulator stack manually. The sections below deploy all required objects (Model, ModelRef, Subscription, AuthPolicy) together using a single Kustomize command for each sample.