This page documents the standard annotations supported on MaaS custom resources.

## Common annotations (all CRDs)

These annotations are supported on **MaaSModelRef**, **MaaSAuthPolicy**, and **MaaSSubscription**. They follow OpenShift conventions and are recognized by the OpenShift console, `kubectl`, and other tooling.

| Annotation | Description | Example |
| ---------- | ----------- | ------- |
| `openshift.io/display-name` | Human-readable display name | `"Llama 2 7B Chat"` |
| `openshift.io/description` | Free-text description of the resource | `"A general-purpose LLM for chat"` |

## MaaSModelRef annotations

In addition to the common annotations above, the MaaS API reads these annotations from **MaaSModelRef** and returns them in the `modelDetails` field of the `GET /v1/models` response.

| Annotation | Description | Returned in API | Example |
```yaml
    openshift.io/description: "A large language model optimized for chat use cases"
    opendatahub.io/genai-use-case: "chat"
    opendatahub.io/context-window: "4096"
spec:
  modelRef:
    kind: LLMInferenceService
    name: llama-2-7b-chat
```

### API response

When annotations are set, the `GET /v1/models` response includes a `modelDetails` object:

```json
{
  "id": "llama-2-7b-chat",
  "object": "model",
  "created": 1672531200,
  "owned_by": "opendatahub",
  "ready": true,
  "url": "https://...",
  "modelDetails": {
    "displayName": "Llama 2 7B Chat",
    "description": "A large language model optimized for chat use cases",
    "genaiUseCase": "chat",
    "contextWindow": "4096"
  }
}
```

When no annotations are set (or all values are empty), `modelDetails` is omitted from the response.
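
The omission rule can be illustrated with a minimal Python sketch. It is not the MaaS API's actual implementation: the function name `build_model_details` is hypothetical, and the pairing of annotation keys with JSON field names is inferred from the example response in this document. The sketch also assumes that only the empty string counts as "empty".

```python
from typing import Dict, Optional

# Annotation key -> modelDetails field name. The pairing is inferred from the
# example response in this document, not taken from the API source code.
ANNOTATION_TO_FIELD = {
    "openshift.io/display-name": "displayName",
    "openshift.io/description": "description",
    "opendatahub.io/genai-use-case": "genaiUseCase",
    "opendatahub.io/context-window": "contextWindow",
}


def build_model_details(annotations: Optional[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return a modelDetails dict, or None when it should be omitted.

    Hypothetical helper: missing annotations and empty-string values are
    skipped, and None is returned when nothing remains.
    """
    details = {}
    for key, field in ANNOTATION_TO_FIELD.items():
        value = (annotations or {}).get(key, "")
        if value:
            details[field] = value
    return details or None
```

A caller would add the `modelDetails` key to the response entry only when the helper returns a non-`None` value, which reproduces the "omitted when empty" behavior described above.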

## MaaSAuthPolicy and MaaSSubscription annotations

The common annotations (`openshift.io/display-name`, `openshift.io/description`) can be set on MaaSAuthPolicy and MaaSSubscription resources for use by `kubectl`, the OpenShift console, and other tooling. They are **not** returned in the `GET /v1/models` API response.
docs/content/configuration-and-management/maas-controller-overview.md: 3 additions, 3 deletions
@@ -57,7 +57,7 @@ flowchart TB
57
57
58
58
**Summary:** You declare intent with MaaS CRs; the controller turns that into Gateway/Kuadrant resources that attach to the same HTTPRoute and backend (e.g. KServe LLMInferenceService).
59
59
60
-
The **MaaS API** GET /v1/models endpoint uses MaaSModelRef CRs as its primary source: it lists them in the API namespace, then **validates access** by probing each model’s `/v1/models` endpoint with the client’s **Authorization header** (passed through as-is). Only models that return 2xx or 405 are included. So the catalogue returned to the client is the set of MaaSModelRef objects the controller reconciles, filtered to those the client can actually access. No token exchange is performed; the header is forwarded as-is. (Once minting is in place, this may be revisited.)
60
+
The **MaaS API** GET /v1/models endpoint uses MaaSModelRef CRs as its primary source: it reads them cluster-wide (all namespaces), then **validates access** by probing each model’s `/v1/models` endpoint with the client’s **Authorization header** (passed through as-is). Only models that return 2xx or 405 are included. So the catalogue returned to the client is the set of MaaSModelRef objects the controller reconciles, filtered to those the client can actually access. No token exchange is performed; the header is forwarded as-is.
61
61
62
62
---
63
63
@@ -213,7 +213,7 @@ flowchart LR
213
213
Deploy --> Examples
214
214
```
215
215
216
-
- **Namespace**: Controller and default MaaS CRs live in **opendatahub** (configurable).
216
+
- **Namespaces**: MaaS API and controller default to **opendatahub** (configurable). MaaSAuthPolicy and MaaSSubscription default to **models-as-a-service** (configurable). MaaSModelRef must live in the **same namespace** as the model it references (e.g. **llm**).
217
217
- **Install**: `./scripts/deploy.sh` installs the full stack including the controller. Optionally run `./scripts/install-examples.sh` for sample MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription.
218
218
219
219
---
@@ -233,7 +233,7 @@ The Kuadrant AuthPolicy validates API keys via the MaaS API and validates user t
233
233
| Topic | Summary |
234
234
|-------|---------|
235
235
|**What**| MaaS Controller = control plane that reconciles MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription into Gateway API and Kuadrant resources. |
236
-
|**Where**| Single controller in `maas-controller`; CRs and generated resources can live in opendatahub or other namespaces. |
236
+
|**Where**| Single controller in `opendatahub`; MaaSAuthPolicy / MaaSSubscription default to `models-as-a-service`; MaaSModelRef and generated Kuadrant policies target their model’s namespace. |
237
237
|**How**| Three reconcilers watch MaaS CRs (and related resources); each creates/updates HTTPRoutes, AuthPolicies, or TokenRateLimitPolicies. |
238
238
|**Identity bridge**| AuthPolicy exposes all user groups as a comma-separated `groups_str`; TokenRateLimitPolicy uses `groups_str.split(",").exists(...)` for subscription matching (the “string trick”). |
239
239
|**Deploy**| Run `./scripts/deploy.sh`; optionally install examples. |
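
The "string trick" in the **Identity bridge** row can be illustrated with a small Python analogue. The real check is a policy expression (`groups_str.split(",").exists(...)`), whose predicate is elided in this document. The sketch below assumes the predicate tests membership in a subscription's set of allowed groups, and the function names are hypothetical.

```python
from typing import Iterable, Set


def groups_to_str(groups: Iterable[str]) -> str:
    """Flatten a user's groups into one comma-separated string (groups_str)."""
    return ",".join(groups)


def matches_subscription(groups_str: str, allowed_groups: Set[str]) -> bool:
    """Python analogue of groups_str.split(",").exists(g, <predicate>).

    Assumption: the elided predicate is an exact membership test against the
    subscription's groups.
    """
    return any(group in allowed_groups for group in groups_str.split(","))
```

Splitting before comparing matters: a plain substring test on `groups_str` would let a group named `premium-users` match a subscription for `premium`, whereas the split-then-compare form matches whole group names only.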
- The API reads MaaSModelRefs from the informer cache, maps each to an API model (`id`, `url`, `ready`, `kind`, etc.)
66
+
- The API reads MaaSModelRefs cluster-wide, maps each to an API model (`id`, `url`, `ready`, `kind`, etc.)
67
67
- **Access validation**: Probes each model's `/v1/models` endpoint with the request's Authorization header. Only models that return 2xx or 405 are included.
68
68
- **Kind on the wire**: Each model in the GET /v1/models response carries a `kind` field from `spec.modelRef.kind`
69
69
@@ -84,14 +84,14 @@ To support a new backend type (a new **kind** in `spec.modelRef`):
84
84
- **Option B:** Extend the API's access-validation logic to branch on **kind** and use a kind-specific probe (different URL path or client), while keeping the same contract: include a model only if the probe with the user's token succeeds.
85
85
86
86
3. **Enrichment (optional)**
87
-
- Extra metadata (e.g. display name) can be set by the controller in status or annotations and mapped into the model response. For a new kind, add a small branch in the MaaSModelRef → API model conversion if needed.
87
+
- The API reads standard annotations from MaaSModelRef (`openshift.io/display-name`, `openshift.io/description`, `opendatahub.io/genai-use-case`, `opendatahub.io/context-window`) and returns them in the `modelDetails` field of the GET /v1/models response. See [CRD annotations](crd-annotations.md) for the full list. For a new kind, add a small branch in the MaaSModelRef → API model conversion if needed.
88
88
89
89
4. **RBAC**
90
90
- If the new kind’s reconciler or the API needs to read another resource, add the corresponding **list/watch** (and optionally **get**) permissions to the maas-api ClusterRole and/or the controller’s RBAC.
91
91
92
92
## Summary
93
93
94
94
- **modelRef** is the backend reference (kind, name, optional namespace), analogous to [Gateway API BackendRef](https://gateway-api.sigs.k8s.io/reference/spec/#backendref).
95
-
- **Listing:** Always from MaaSModelRef cache; no kind-specific listing logic.
95
+
- **Listing:** Always from MaaSModelRef resources cluster-wide; no kind-specific listing logic.
96
96
- **Access validation:** Same probe (GET endpoint with the request's Authorization header as-is) for all kinds unless kind-specific probes are added later.
97
97
- **New kinds:** Implement in the controller (resolve referent, set status.endpoint and status.phase); extend the API only if the new kind cannot use the same probe path or needs different enrichment.
docs/content/configuration-and-management/model-listing-flow.md: 14 additions, 8 deletions
@@ -2,7 +2,7 @@
2
2
3
3
This document describes how the **GET /v1/models** endpoint discovers and returns the list of available models.
4
4
5
-
The list is **based on MaaSModelRef** custom resources: the API returns models that are registered as MaaSModelRefs in its configured namespace.
5
+
The list is **based on MaaSModelRef** custom resources: the API considers MaaSModelRef objects cluster-wide (all namespaces), then filters by access.
6
6
7
7
## Overview
8
8
@@ -17,15 +17,17 @@ Each entry includes an `id`, **`url`** (the model’s endpoint), a `ready` flag,
17
17
18
18
## MaaSModelRef flow
19
19
20
-
When the [MaaS controller](https://github.com/opendatahub-io/models-as-a-service/tree/main/maas-controller) is installed and the API is configured with a MaaSModelRef lister and namespace, the flow is:
20
+
When the [MaaS controller](https://github.com/opendatahub-io/models-as-a-service/tree/main/maas-controller) is installed and the API is configured with a MaaSModelRef lister, the flow is:
21
21
22
-
1. The MaaS API lists all **MaaSModelRef** custom resources in its configured namespace (e.g. `opendatahub`). It reads them from an **in-memory cache** in the maas-api component (maintained by a Kubernetes informer), so it does not call the Kubernetes API on every request.
22
+
1. The MaaS API discovers **MaaSModelRef** custom resources **cluster-wide** (all namespaces) without calling the Kubernetes API on every request.
23
23
24
24
2. For each MaaSModelRef, it reads **id** (`metadata.name`), **url** (`status.endpoint`), **ready** (`status.phase == "Ready"`), and related metadata. The controller has populated `status.endpoint` and `status.phase` from the underlying LLMInferenceService (for llmisvc) or HTTPRoute/Gateway.
25
25
26
26
3. **Access validation**: The API probes each model’s `/v1/models` endpoint with the **exact Authorization header** the client sent (passed through as-is). Only models that return **2xx**, **3xx** or **405** are included in the response. This ensures the list only shows models the client is authorized to use.
27
27
28
-
4. The filtered list is returned to the client.
28
+
4. For each model, the API reads **annotations** from the MaaSModelRef to populate `modelDetails` in the response (display name, description, use case, context window). See [CRD annotations](crd-annotations.md) for the full list.
29
+
30
+
5. The filtered list is returned to the client.
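
The listing steps above can be sketched in Python. This is an illustration under stated assumptions, not the maas-api code: MaaSModelRef objects are modeled as plain dicts, the function names are hypothetical, and the HTTP probe is injected as a callable so the sketch makes no claim about the real client or the exact probe URL. The accepted status codes follow step 3 of this list.

```python
from typing import Callable, Iterable, List


def is_accessible(status_code: int) -> bool:
    """Step 3: keep a model when the probe returns 2xx, 3xx, or 405."""
    return 200 <= status_code < 400 or status_code == 405


def to_api_model(ref: dict) -> dict:
    """Step 2: map one MaaSModelRef (as a plain dict) to an API model entry."""
    status = ref.get("status", {})
    return {
        "id": ref["metadata"]["name"],
        "url": status.get("endpoint", ""),
        "ready": status.get("phase") == "Ready",
    }


def list_models(
    refs: Iterable[dict],
    probe: Callable[[str, str], int],
    authorization: str,
) -> List[dict]:
    """Steps 1-5: convert every MaaSModelRef, then keep the accessible ones.

    `probe(url, authorization)` is a hypothetical callable that returns the
    HTTP status code of the access check for that model endpoint.
    """
    models = []
    for ref in refs:
        model = to_api_model(ref)
        # The client's Authorization header is passed through unchanged.
        if is_accessible(probe(model["url"], authorization)):
            models.append(model)
    return models
```

Injecting the probe keeps the filtering rule testable without network access; a real implementation would also attach `modelDetails` from annotations (step 4) before returning the list.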
29
31
30
32
```mermaid
31
33
sequenceDiagram
@@ -53,7 +55,7 @@ sequenceDiagram
53
55
54
56
- **Consistent with gateway**: The same model names and routes are used for inference; the list matches what the gateway will accept for that client.
55
57
56
-
If the API is not configured with a MaaSModelRef lister and namespace, or if listing fails (e.g. CRD not installed, no RBAC, or server error), the API returns an empty list or an error.
58
+
If the API is not configured with a MaaSModelRef lister, or if listing fails (e.g. CRD not installed, no RBAC, or server error), the API returns an empty list or an error.
57
59
58
60
## Subscription Filtering and Aggregation
59
61
@@ -172,14 +174,18 @@ To have models appear via the **MaaSModelRef** flow:
172
174
kind: MaaSModelRef
173
175
metadata:
174
176
name: my-model-name # This becomes the model "id" in GET /v1/models
175
-
namespace: opendatahub
177
+
namespace: llm # Same namespace as the LLMInferenceService
178
+
annotations:
179
+
openshift.io/display-name: "My Model" # optional: human-readable name
180
+
openshift.io/description: "A general-purpose LLM" # optional: description
181
+
opendatahub.io/genai-use-case: "chat" # optional: use case
4. The controller reconciles the MaaSModelRef and sets `status.endpoint` and `status.phase`. The MaaS API (in the same namespace) will then include this model in GET /v1/models when it lists MaaSModelRef CRs.
188
+
4. The controller reconciles the MaaSModelRef and sets `status.endpoint` and `status.phase`. The MaaS API will then include this model in GET /v1/models when it lists MaaSModelRef CRs.
183
189
184
190
You can use the [maas-system samples](https://github.com/opendatahub-io/models-as-a-service/tree/main/docs/samples/maas-system) as a template; the install script deploys LLMInferenceService + MaaSModelRef + MaaSAuthPolicy + MaaSSubscription together so dependencies resolve correctly.
Add standard annotations to your **MaaSModelRef** to provide human-readable names and descriptions in the `GET /v1/models` API response:

```yaml
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSModelRef
metadata:
  name: my-production-model
  namespace: llm
  annotations:
    openshift.io/display-name: "My Production Model"
    openshift.io/description: "A fine-tuned model for production workloads"
    opendatahub.io/genai-use-case: "chat"
    opendatahub.io/context-window: "8192"
spec:
  modelRef:
    kind: LLMInferenceService
    name: my-production-model
```

These annotations are returned in the `modelDetails` field of the API response. All are optional. See [CRD annotations](crd-annotations.md) for the full list of supported annotations across all MaaS CRDs.

### What the Annotation Does

This annotation automatically creates the necessary RBAC resources (Roles and RoleBindings) that allow tier-specific service accounts to POST to your `LLMInferenceService`. The ODH Controller handles this automatically when the annotation is present.

Behind the scenes, it creates:

- **Role**: Grants `POST` permission on `llminferenceservices` resource
- **RoleBinding**: Binds tier service account groups (e.g., `system:serviceaccounts:maas-default-gateway-tier-premium`) to the role

### Complete Example

Here's a complete example of an LLMInferenceService configured for MaaS:
docs/content/install/model-setup.md: 3 additions, 3 deletions
@@ -91,11 +91,11 @@ kubectl get pods -n llm
91
91
**Validate MaaSModelRef status** — The MaaS controller populates `status.endpoint` and `status.phase` on each MaaSModelRef from the LLMInferenceService. The MaaSModelRef `status.endpoint` should match the URL exposed by the LLMInferenceService (via the gateway). Verify:
92
92
93
93
```bash
94
-
# Check MaaSModelRef status (use opendatahub for ODH, redhat-ods-applications for RHOAI)
95
-
kubectl get maasmodelref -n opendatahub -o wide
94
+
# Check MaaSModelRef status (same namespace as the LLMInferenceService, e.g. llm)
95
+
kubectl get maasmodelref -n llm -o wide
96
96
97
97
# Verify status.endpoint is populated and phase is Ready
98
-
kubectl get maasmodelref -n opendatahub -o jsonpath='{range .items[*]}{.metadata.name}: phase={.status.phase} endpoint={.status.endpoint}{"\n"}{end}'
98
+
kubectl get maasmodelref -n llm -o jsonpath='{range .items[*]}{.metadata.name}: phase={.status.phase} endpoint={.status.endpoint}{"\n"}{end}'
99
99
100
100
# Compare with LLMInferenceService — status.endpoint should match the URL from LLMIS status.addresses or status.url
101
101
kubectl get llminferenceservice -n llm -o yaml | grep "url:"
docs/content/reference/crds/maas-model-ref.md: 1 addition, 1 deletion
@@ -1,6 +1,6 @@
1
1
# MaaSModelRef
2
2
3
-
Identifies an AI/ML model on the cluster. The MaaS API lists models from MaaSModelRef resources (using `status.endpoint` and `status.phase`).
3
+
Identifies an AI/ML model on the cluster. Create MaaSModelRef in the **same namespace** as the backend (`LLMInferenceService`, `ExternalModel`, etc.). The MaaS API lists models from MaaSModelRef resources cluster-wide (using `status.endpoint` and `status.phase`).