Commit 144c17f

Merge branch 'main' into jr_55467
2 parents f637c75 + fb42db2 commit 144c17f

23 files changed

+291
-73
lines changed

Lines changed: 84 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,84 @@
1+
# CRD Annotations Reference
2+
3+
This page documents the standard annotations supported on MaaS custom resources.
4+
5+
## Common annotations (all CRDs)
6+
7+
These annotations are supported on **MaaSModelRef**, **MaaSAuthPolicy**, and **MaaSSubscription**. They follow OpenShift conventions and are recognized by the OpenShift console, `kubectl`, and other tooling.
8+
9+
| Annotation | Description | Example |
10+
| ---------- | ----------- | ------- |
11+
| `openshift.io/display-name` | Human-readable display name | `"Llama 2 7B Chat"` |
12+
| `openshift.io/description` | Free-text description of the resource | `"A general-purpose LLM for chat"` |
13+
14+
## MaaSModelRef annotations
15+
16+
In addition to the common annotations above, the MaaS API reads these annotations from **MaaSModelRef** and returns them in the `modelDetails` field of the `GET /v1/models` response.
17+
18+
| Annotation | Description | Returned in API | Example |
19+
| ---------- | ----------- | --------------- | ------- |
20+
| `openshift.io/display-name` | Human-readable model name | `modelDetails.displayName` | `"Llama 2 7B Chat"` |
21+
| `openshift.io/description` | Model description | `modelDetails.description` | `"A large language model optimized for chat"` |
22+
| `opendatahub.io/genai-use-case` | GenAI use case category | `modelDetails.genaiUseCase` | `"chat"` |
23+
| `opendatahub.io/context-window` | Context window size | `modelDetails.contextWindow` | `"4096"` |
24+
25+
### Example MaaSModelRef with annotations
26+
27+
```yaml
28+
apiVersion: maas.opendatahub.io/v1alpha1
29+
kind: MaaSModelRef
30+
metadata:
31+
name: llama-2-7b-chat
32+
namespace: opendatahub
33+
annotations:
34+
openshift.io/display-name: "Llama 2 7B Chat"
35+
openshift.io/description: "A large language model optimized for chat use cases"
36+
opendatahub.io/genai-use-case: "chat"
37+
opendatahub.io/context-window: "4096"
38+
spec:
39+
modelRef:
40+
kind: LLMInferenceService
41+
name: llama-2-7b-chat
42+
```
43+
44+
### API response
45+
46+
When annotations are set, the `GET /v1/models` response includes a `modelDetails` object:
47+
48+
```json
49+
{
50+
"id": "llama-2-7b-chat",
51+
"object": "model",
52+
"created": 1672531200,
53+
"owned_by": "opendatahub",
54+
"ready": true,
55+
"url": "https://...",
56+
"modelDetails": {
57+
"displayName": "Llama 2 7B Chat",
58+
"description": "A large language model optimized for chat use cases",
59+
"genaiUseCase": "chat",
60+
"contextWindow": "4096"
61+
}
62+
}
63+
```
64+
65+
When no annotations are set (or all values are empty), `modelDetails` is omitted from the response.
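
The mapping from annotations to `modelDetails` can be pictured with a small sketch. This is an illustration in Python, not the MaaS API's actual code: the function name `build_model_details` is hypothetical, and dropping individual empty fields (rather than only omitting the whole object) is an assumption.

```python
from typing import Optional

# Annotation key -> modelDetails field, as listed in the table above.
ANNOTATION_TO_FIELD = {
    "openshift.io/display-name": "displayName",
    "openshift.io/description": "description",
    "opendatahub.io/genai-use-case": "genaiUseCase",
    "opendatahub.io/context-window": "contextWindow",
}


def build_model_details(annotations: Optional[dict]) -> Optional[dict]:
    """Hypothetical helper: return a modelDetails dict, or None when nothing is set."""
    details = {}
    for key, field in ANNOTATION_TO_FIELD.items():
        value = (annotations or {}).get(key, "")
        if value:  # assumption: missing or empty values are skipped
            details[field] = value
    # None stands for "omit modelDetails from the response".
    return details or None
```

Values stay strings throughout, which matches the example response where `contextWindow` is `"4096"` rather than a number.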
66+
67+
## MaaSAuthPolicy and MaaSSubscription annotations
68+
69+
The common annotations (`openshift.io/display-name`, `openshift.io/description`) can be set on MaaSAuthPolicy and MaaSSubscription resources for use by `kubectl`, the OpenShift console, and other tooling. They are **not** returned in the `GET /v1/models` API response.
70+
71+
### Example
72+
73+
```yaml
74+
apiVersion: maas.opendatahub.io/v1alpha1
75+
kind: MaaSAuthPolicy
76+
metadata:
77+
name: premium-access
78+
namespace: models-as-a-service
79+
annotations:
80+
openshift.io/display-name: "Premium Access Policy"
81+
openshift.io/description: "Grants premium-users group access to premium models"
82+
spec:
83+
# ...
84+
```

docs/content/configuration-and-management/maas-controller-overview.md

Lines changed: 3 additions & 3 deletions
@@ -57,7 +57,7 @@ flowchart TB
5757

5858
**Summary:** You declare intent with MaaS CRs; the controller turns that into Gateway/Kuadrant resources that attach to the same HTTPRoute and backend (e.g. KServe LLMInferenceService).
5959

60-
The **MaaS API** GET /v1/models endpoint uses MaaSModelRef CRs as its primary source: it lists them in the API namespace, then **validates access** by probing each model’s `/v1/models` endpoint with the client’s **Authorization header** (passed through as-is). Only models that return 2xx or 405 are included. So the catalogue returned to the client is the set of MaaSModelRef objects the controller reconciles, filtered to those the client can actually access. No token exchange is performed; the header is forwarded as-is. (Once minting is in place, this may be revisited.)
60+
The **MaaS API** `GET /v1/models` endpoint uses MaaSModelRef CRs as its primary source: it reads them cluster-wide (all namespaces), then **validates access** by probing each model’s `/v1/models` endpoint with the client’s **Authorization header**, forwarded as-is with no token exchange. Only models that return 2xx or 405 are included, so the catalogue returned to the client is the set of MaaSModelRef objects the controller reconciles, filtered to those the client can actually access.
6161

6262
---
6363

@@ -213,7 +213,7 @@ flowchart LR
213213
Deploy --> Examples
214214
```
215215

216-
- **Namespace**: Controller and default MaaS CRs live in **opendatahub** (configurable).
216+
- **Namespaces**: MaaS API and controller default to **opendatahub** (configurable). MaaSAuthPolicy and MaaSSubscription default to **models-as-a-service** (configurable). MaaSModelRef must live in the **same namespace** as the model it references (e.g. **llm**).
217217
- **Install**: `./scripts/deploy.sh` installs the full stack including the controller. Optionally run `./scripts/install-examples.sh` for sample MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription.
218218

219219
---
@@ -233,7 +233,7 @@ The Kuadrant AuthPolicy validates API keys via the MaaS API and validates user t
233233
| Topic | Summary |
234234
|-------|---------|
235235
| **What** | MaaS Controller = control plane that reconciles MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription into Gateway API and Kuadrant resources. |
236-
| **Where** | Single controller in `maas-controller`; CRs and generated resources can live in opendatahub or other namespaces. |
236+
| **Where** | Single controller in `opendatahub`; MaaSAuthPolicy / MaaSSubscription default to `models-as-a-service`; MaaSModelRef and generated Kuadrant policies target their model’s namespace. |
237237
| **How** | Three reconcilers watch MaaS CRs (and related resources); each creates/updates HTTPRoutes, AuthPolicies, or TokenRateLimitPolicies. |
238238
| **Identity bridge** | AuthPolicy exposes all user groups as a comma-separated `groups_str`; TokenRateLimitPolicy uses `groups_str.split(",").exists(...)` for subscription matching (the “string trick”). |
239239
| **Deploy** | Run `./scripts/deploy.sh`; optionally install examples. |
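
The "string trick" in the **Identity bridge** row can be illustrated with a short sketch. The real check is an expression inside the TokenRateLimitPolicy; the Python below only mirrors its shape. The function name `matches_subscription` is hypothetical, and the membership test used as the `exists(...)` predicate is an assumption, since the source elides it.

```python
def matches_subscription(groups_str: str, subscription_groups: set) -> bool:
    """Hypothetical mirror of groups_str.split(",").exists(...).

    groups_str is the comma-separated group list exposed by the AuthPolicy.
    The predicate (membership in the subscription's groups) is assumed.
    """
    return any(group in subscription_groups for group in groups_str.split(","))
```

Splitting before matching means each group is compared as a whole name, so a group such as `premium-users-extra` would not match a subscription for `premium-users` the way a plain substring search would.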

docs/content/configuration-and-management/maas-model-kinds.md

Lines changed: 4 additions & 4 deletions
@@ -31,7 +31,7 @@ apiVersion: maas.opendatahub.io/v1alpha1
3131
kind: MaaSModelRef
3232
metadata:
3333
name: my-model
34-
namespace: opendatahub
34+
namespace: llm
3535
spec:
3636
modelRef:
3737
kind: LLMInferenceService
@@ -63,7 +63,7 @@ The controller:
6363

6464
## API Behavior
6565

66-
- The API reads MaaSModelRefs from the informer cache, maps each to an API model (`id`, `url`, `ready`, `kind`, etc.)
66+
- The API reads MaaSModelRefs cluster-wide, maps each to an API model (`id`, `url`, `ready`, `kind`, etc.)
6767
- **Access validation**: Probes each model's `/v1/models` endpoint with the request's Authorization header. Only models that return 2xx or 405 are included.
6868
- **Kind on the wire**: Each model in the GET /v1/models response carries a `kind` field from `spec.modelRef.kind`
6969

@@ -84,14 +84,14 @@ To support a new backend type (a new **kind** in `spec.modelRef`):
8484
- **Option B:** Extend the API's access-validation logic to branch on **kind** and use a kind-specific probe (different URL path or client), while keeping the same contract: include a model only if the probe with the user's token succeeds.
8585

8686
3. **Enrichment (optional)**
87-
- Extra metadata (e.g. display name) can be set by the controller in status or annotations and mapped into the model response. For a new kind, add a small branch in the MaaSModelRef → API model conversion if needed.
87+
- The API reads standard annotations from MaaSModelRef (`openshift.io/display-name`, `openshift.io/description`, `opendatahub.io/genai-use-case`, `opendatahub.io/context-window`) and returns them in the `modelDetails` field of the GET /v1/models response. See [CRD annotations](crd-annotations.md) for the full list. For a new kind, add a small branch in the MaaSModelRef → API model conversion if needed.
8888

8989
4. **RBAC**
9090
- If the new kind’s reconciler or the API needs to read another resource, add the corresponding **list/watch** (and optionally **get**) permissions to the maas-api ClusterRole and/or the controller’s RBAC.
9191

9292
## Summary
9393

9494
- **modelRef** is the backend reference (kind, name, optional namespace), analogous to [Gateway API BackendRef](https://gateway-api.sigs.k8s.io/reference/spec/#backendref).
95-
- **Listing:** Always from MaaSModelRef cache; no kind-specific listing logic.
95+
- **Listing:** Always from MaaSModelRef resources cluster-wide; no kind-specific listing logic.
9696
- **Access validation:** Same probe (GET endpoint with the request's Authorization header as-is) for all kinds unless kind-specific probes are added later.
9797
- **New kinds:** Implement in the controller (resolve referent, set status.endpoint and status.phase); extend the API only if the new kind cannot use the same probe path or needs different enrichment.

docs/content/configuration-and-management/model-listing-flow.md

Lines changed: 14 additions & 8 deletions
@@ -2,7 +2,7 @@
22

33
This document describes how the **GET /v1/models** endpoint discovers and returns the list of available models.
44

5-
The list is **based on MaaSModelRef** custom resources: the API returns models that are registered as MaaSModelRefs in its configured namespace.
5+
The list is **based on MaaSModelRef** custom resources: the API considers MaaSModelRef objects cluster-wide (all namespaces), then filters by access.
66

77
## Overview
88

@@ -17,15 +17,17 @@ Each entry includes an `id`, **`url`** (the model’s endpoint), a `ready` flag,
1717

1818
## MaaSModelRef flow
1919

20-
When the [MaaS controller](https://github.com/opendatahub-io/models-as-a-service/tree/main/maas-controller) is installed and the API is configured with a MaaSModelRef lister and namespace, the flow is:
20+
When the [MaaS controller](https://github.com/opendatahub-io/models-as-a-service/tree/main/maas-controller) is installed and the API is configured with a MaaSModelRef lister, the flow is:
2121

22-
1. The MaaS API lists all **MaaSModelRef** custom resources in its configured namespace (e.g. `opendatahub`). It reads them from an **in-memory cache** in the maas-api component (maintained by a Kubernetes informer), so it does not call the Kubernetes API on every request.
22+
1. The MaaS API discovers **MaaSModelRef** custom resources **cluster-wide** (all namespaces) without calling the Kubernetes API on every request.
2323

2424
2. For each MaaSModelRef, it reads **id** (`metadata.name`), **url** (`status.endpoint`), **ready** (`status.phase == "Ready"`), and related metadata. The controller has populated `status.endpoint` and `status.phase` from the underlying LLMInferenceService (for llmisvc) or HTTPRoute/Gateway.
2525

2626
3. **Access validation**: The API probes each model’s `/v1/models` endpoint with the **exact Authorization header** the client sent (passed through as-is). Only models that return **2xx**, **3xx** or **405** are included in the response. This ensures the list only shows models the client is authorized to use.
2727

28-
4. The filtered list is returned to the client.
28+
4. For each model, the API reads **annotations** from the MaaSModelRef to populate `modelDetails` in the response (display name, description, use case, context window). See [CRD annotations](crd-annotations.md) for the full list.
29+
30+
5. The filtered list is returned to the client.
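
Steps 2, 3, and 5 above can be sketched as follows. This is a simplified illustration, not the API's implementation: `list_models` and `is_accessible` are hypothetical names, and `probe` is an assumed callable that sends the client's Authorization header to a model's `/v1/models` endpoint and returns the HTTP status code. Whether the real API also probes models that are not yet Ready is not stated here, so the sketch probes every entry.

```python
from typing import Callable, Iterable


def is_accessible(status_code: int) -> bool:
    """Status rule from step 3: 2xx, 3xx, or 405 counts as accessible."""
    return 200 <= status_code < 400 or status_code == 405


def list_models(model_refs: Iterable[dict], probe: Callable[[str], int]) -> list:
    """Map MaaSModelRef objects to API models and keep the accessible ones."""
    models = []
    for ref in model_refs:
        status = ref.get("status", {})
        url = status.get("endpoint", "")
        if not is_accessible(probe(url)):
            continue  # the client cannot use this model, so leave it out
        models.append({
            "id": ref["metadata"]["name"],             # metadata.name
            "url": url,                                # status.endpoint
            "ready": status.get("phase") == "Ready",   # status.phase
        })
    return models
```

Passing `probe` in as a parameter keeps the status rule testable without a cluster: a fake that returns fixed status codes is enough to exercise the filter.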
2931

3032
```mermaid
3133
sequenceDiagram
@@ -53,7 +55,7 @@ sequenceDiagram
5355

5456
- **Consistent with gateway**: The same model names and routes are used for inference; the list matches what the gateway will accept for that client.
5557

56-
If the API is not configured with a MaaSModelRef lister and namespace, or if listing fails (e.g. CRD not installed, no RBAC, or server error), the API returns an empty list or an error.
58+
If the API is not configured with a MaaSModelRef lister, or if listing fails (e.g. CRD not installed, no RBAC, or server error), the API returns an empty list or an error.
5759

5860
## Subscription Filtering and Aggregation
5961

@@ -172,14 +174,18 @@ To have models appear via the **MaaSModelRef** flow:
172174
kind: MaaSModelRef
173175
metadata:
174176
name: my-model-name # This becomes the model "id" in GET /v1/models
175-
namespace: opendatahub
177+
namespace: llm # Same namespace as the LLMInferenceService
178+
annotations:
179+
openshift.io/display-name: "My Model" # optional: human-readable name
180+
openshift.io/description: "A general-purpose LLM" # optional: description
181+
opendatahub.io/genai-use-case: "chat" # optional: use case
182+
opendatahub.io/context-window: "4096" # optional: context window
176183
spec:
177184
modelRef:
178185
kind: LLMInferenceService
179186
name: my-llm-isvc-name
180-
namespace: llm
181187

182-
4. The controller reconciles the MaaSModelRef and sets `status.endpoint` and `status.phase`. The MaaS API (in the same namespace) will then include this model in GET /v1/models when it lists MaaSModelRef CRs.
188+
4. The controller reconciles the MaaSModelRef and sets `status.endpoint` and `status.phase`. The MaaS API will then include this model in GET /v1/models when it lists MaaSModelRef CRs.
183189

184190
You can use the [maas-system samples](https://github.com/opendatahub-io/models-as-a-service/tree/main/docs/samples/maas-system) as a template; the install script deploys LLMInferenceService + MaaSModelRef + MaaSAuthPolicy + MaaSSubscription together so dependencies resolve correctly.
185191

docs/content/configuration-and-management/model-setup.md

Lines changed: 71 additions & 0 deletions
@@ -122,6 +122,77 @@ spec:
122122

123123
### Complete example
124124

125+
Add the `alpha.maas.opendatahub.io/tiers` annotation to enable automatic RBAC setup for tier-based access:
126+
127+
```yaml
128+
apiVersion: serving.kserve.io/v1alpha1
129+
kind: LLMInferenceService
130+
metadata:
131+
name: my-production-model
132+
namespace: llm
133+
annotations:
134+
alpha.maas.opendatahub.io/tiers: '[]'
135+
spec:
136+
# ... rest of spec ...
137+
```
138+
139+
**Annotation Values:**
140+
141+
- **Empty list `[]`**: Grants access to **all tiers** (recommended for most models)
142+
- **List of tier names**: Grants access to specific tiers only
143+
- Example: `'["premium","enterprise"]'` - only premium and enterprise tiers can access
144+
- **Missing annotation**: **No tiers** have access by default (model won't be accessible via MaaS)
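
The three cases above can be summarized in a small decision function. This is a sketch of the documented semantics only, not the ODH Controller's code: `tier_allowed` is a hypothetical name, and how the controller treats malformed JSON is not covered here (the sketch simply lets `json.loads` raise).

```python
import json
from typing import Optional

TIERS_ANNOTATION = "alpha.maas.opendatahub.io/tiers"


def tier_allowed(annotations: Optional[dict], tier: str) -> bool:
    """Hypothetical helper: may the given tier access the model?"""
    raw = (annotations or {}).get(TIERS_ANNOTATION)
    if raw is None:
        return False  # missing annotation: no tiers have access
    tiers = json.loads(raw)  # the annotation value is a JSON list in a string
    if not tiers:
        return True  # empty list: all tiers have access
    return tier in tiers  # otherwise only the listed tiers
```

The distinction worth noticing is that a missing annotation and an empty list behave in opposite ways: the first denies every tier, the second allows every tier.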
145+
146+
**Examples:**
147+
148+
Allow all tiers:
149+
150+
```yaml
151+
annotations:
152+
alpha.maas.opendatahub.io/tiers: '[]'
153+
```
154+
155+
Allow specific tiers:
156+
157+
```yaml
158+
annotations:
159+
alpha.maas.opendatahub.io/tiers: '["premium","enterprise"]'
160+
```
161+
162+
### Step 3: Add Display Metadata (Optional)
163+
164+
Add standard annotations to your **MaaSModelRef** to provide human-readable names and descriptions in the `GET /v1/models` API response:
165+
166+
```yaml
167+
apiVersion: maas.opendatahub.io/v1alpha1
168+
kind: MaaSModelRef
169+
metadata:
170+
name: my-production-model
171+
namespace: llm
172+
annotations:
173+
openshift.io/display-name: "My Production Model"
174+
openshift.io/description: "A fine-tuned model for production workloads"
175+
opendatahub.io/genai-use-case: "chat"
176+
opendatahub.io/context-window: "8192"
177+
spec:
178+
modelRef:
179+
kind: LLMInferenceService
180+
name: my-production-model
181+
```
182+
183+
These annotations are returned in the `modelDetails` field of the API response. All are optional. See [CRD annotations](crd-annotations.md) for the full list of supported annotations across all MaaS CRDs.
184+
185+
### What the Annotation Does
186+
187+
The `alpha.maas.opendatahub.io/tiers` annotation creates the RBAC resources (Roles and RoleBindings) that allow tier-specific service accounts to POST to your `LLMInferenceService`. The ODH Controller handles this automatically when the annotation is present.
188+
189+
Behind the scenes, it creates:
190+
191+
- **Role**: Grants `POST` permission on the `llminferenceservices` resource
192+
- **RoleBinding**: Binds tier service account groups (e.g., `system:serviceaccounts:maas-default-gateway-tier-premium`) to the role
193+
194+
### Complete Example
195+
125196
Here's a complete example of an LLMInferenceService configured for MaaS:
126197

127198
```yaml

docs/content/install/model-setup.md

Lines changed: 3 additions & 3 deletions
@@ -91,11 +91,11 @@ kubectl get pods -n llm
9191
**Validate MaaSModelRef status** — The MaaS controller populates `status.endpoint` and `status.phase` on each MaaSModelRef from the LLMInferenceService. The MaaSModelRef `status.endpoint` should match the URL exposed by the LLMInferenceService (via the gateway). Verify:
9292

9393
```bash
94-
# Check MaaSModelRef status (use opendatahub for ODH, redhat-ods-applications for RHOAI)
95-
kubectl get maasmodelref -n opendatahub -o wide
94+
# Check MaaSModelRef status (same namespace as the LLMInferenceService, e.g. llm)
95+
kubectl get maasmodelref -n llm -o wide
9696

9797
# Verify status.endpoint is populated and phase is Ready
98-
kubectl get maasmodelref -n opendatahub -o jsonpath='{range .items[*]}{.metadata.name}: phase={.status.phase} endpoint={.status.endpoint}{"\n"}{end}'
98+
kubectl get maasmodelref -n llm -o jsonpath='{range .items[*]}{.metadata.name}: phase={.status.phase} endpoint={.status.endpoint}{"\n"}{end}'
9999

100100
# Compare with LLMInferenceService — status.endpoint should match the URL from LLMIS status.addresses or status.url
101101
kubectl get llminferenceservice -n llm -o yaml | grep "url:"

docs/content/reference/crds/maas-model-ref.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
# MaaSModelRef
22

3-
Identifies an AI/ML model on the cluster. The MaaS API lists models from MaaSModelRef resources (using `status.endpoint` and `status.phase`).
3+
Identifies an AI/ML model on the cluster. Create each MaaSModelRef in the **same namespace** as the backend it references (`LLMInferenceService`, `ExternalModel`, etc.). The MaaS API lists models from MaaSModelRef resources cluster-wide (using `status.endpoint` and `status.phase`).
44

55
## MaaSModelRefSpec
66

docs/samples/maas-system/README.md

Lines changed: 2 additions & 3 deletions
@@ -25,10 +25,9 @@ kustomize build docs/samples/maas-system/ | kubectl apply -f -
2525
# Or deploy a specific sample
2626
kustomize build docs/samples/maas-system/facebook-opt-125m-cpu/ | kubectl apply -f -
2727
kustomize build docs/samples/maas-system/qwen3/ | kubectl apply -f -
28-
```
2928

3029
# Verify
31-
kubectl get maasmodelref -n opendatahub
30+
kubectl get maasmodelref -n llm
3231
kubectl get maasauthpolicy,maassubscription -n models-as-a-service
3332
kubectl get llminferenceservice -n llm
3433
```
@@ -44,7 +43,7 @@ kubectl create namespace llm --dry-run=client -o yaml | kubectl apply -f -
4443
kustomize build docs/samples/maas-system | sed "s/namespace: models-as-a-service/namespace: my-namespace/g" | kubectl apply -f -
4544

4645
# Verify
47-
kubectl get maasmodelref -n opendatahub
46+
kubectl get maasmodelref -n llm
4847
kubectl get maasauthpolicy,maassubscription -n my-namespace
4948
kubectl get llminferenceservice -n llm
5049
```

docs/samples/maas-system/free/maas/maas-auth-policy.yaml

Lines changed: 3 additions & 0 deletions
@@ -3,6 +3,9 @@ kind: MaaSAuthPolicy
33
metadata:
44
name: simulator-access
55
namespace: models-as-a-service
6+
annotations:
7+
openshift.io/display-name: "Simulator Access (Free)"
8+
openshift.io/description: "Grants all authenticated users access to the free-tier simulator model"
69
spec:
710
modelRefs:
811
- name: facebook-opt-125m-simulated

docs/samples/maas-system/free/maas/maas-model.yaml

Lines changed: 3 additions & 0 deletions
@@ -3,6 +3,9 @@ kind: MaaSModelRef
33
metadata:
44
name: facebook-opt-125m-simulated
55
namespace: llm
6+
annotations:
7+
openshift.io/display-name: "Facebook OPT 125M (Simulated)"
8+
openshift.io/description: "A simulated OPT-125M model for free-tier testing"
69
spec:
710
modelRef:
811
kind: LLMInferenceService
