diff --git a/docs/content/configuration-and-management/crd-annotations.md b/docs/content/configuration-and-management/crd-annotations.md
Lines changed: 84 additions & 0 deletions
diff --git a/docs/content/configuration-and-management/maas-controller-overview.md b/docs/content/configuration-and-management/maas-controller-overview.md
Lines changed: 3 additions & 3 deletions
diff --git a/docs/content/configuration-and-management/maas-model-kinds.md b/docs/content/configuration-and-management/maas-model-kinds.md
Lines changed: 4 additions & 4 deletions
diff --git a/docs/content/configuration-and-management/model-listing-flow.md b/docs/content/configuration-and-management/model-listing-flow.md
Lines changed: 14 additions & 8 deletions
diff --git a/docs/content/configuration-and-management/model-setup.md b/docs/content/configuration-and-management/model-setup.md
Lines changed: 71 additions & 0 deletions
diff --git a/docs/content/install/model-setup.md b/docs/content/install/model-setup.md
Lines changed: 3 additions & 3 deletions
diff --git a/docs/content/reference/crds/maas-model-ref.md b/docs/content/reference/crds/maas-model-ref.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/samples/maas-system/README.md b/docs/samples/maas-system/README.md
Lines changed: 2 additions & 3 deletions
diff --git a/docs/samples/maas-system/free/maas/maas-auth-policy.yaml b/docs/samples/maas-system/free/maas/maas-auth-policy.yaml
Lines changed: 3 additions & 0 deletions
diff --git a/docs/samples/maas-system/free/maas/maas-model.yaml b/docs/samples/maas-system/free/maas/maas-model.yaml
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,84 @@
+# CRD Annotations Reference
+
+This page documents the standard annotations supported on MaaS custom resources.
+
+## Common annotations (all CRDs)
+
+These annotations are supported on **MaaSModelRef**, **MaaSAuthPolicy**, and **MaaSSubscription**. They follow OpenShift conventions and are recognized by the OpenShift console, `kubectl`, and other tooling.
+
+| Annotation | Description | Example |
+| ---------- | ----------- | ------- |
+| `openshift.io/display-name` | Human-readable display name | `"Llama 2 7B Chat"` |
+| `openshift.io/description` | Free-text description of the resource | `"A general-purpose LLM for chat"` |
+
+## MaaSModelRef annotations
+
+In addition to the common annotations above, the MaaS API reads these annotations from **MaaSModelRef** and returns them in the `modelDetails` field of the `GET /v1/models` response.
+
+| Annotation | Description | Returned in API | Example |
+| ---------- | ----------- | --------------- | ------- |
+| `openshift.io/display-name` | Human-readable model name | `modelDetails.displayName` | `"Llama 2 7B Chat"` |
+| `openshift.io/description` | Model description | `modelDetails.description` | `"A large language model optimized for chat"` |
+| `opendatahub.io/genai-use-case` | GenAI use case category | `modelDetails.genaiUseCase` | `"chat"` |
+| `opendatahub.io/context-window` | Context window size | `modelDetails.contextWindow` | `"4096"` |
+
+### Example MaaSModelRef with annotations
+
+```yaml
+apiVersion: maas.opendatahub.io/v1alpha1
+kind: MaaSModelRef
+metadata:
+  name: llama-2-7b-chat
+  namespace: opendatahub
+  annotations:
+    openshift.io/display-name: "Llama 2 7B Chat"
+    openshift.io/description: "A large language model optimized for chat use cases"
+    opendatahub.io/genai-use-case: "chat"
+    opendatahub.io/context-window: "4096"
+spec:
+  modelRef:
+    kind: LLMInferenceService
+    name: llama-2-7b-chat
+```
+
+### API response
+
+When annotations are set, the `GET /v1/models` response includes a `modelDetails` object:
+
+```json
+{
+  "id": "llama-2-7b-chat",
+  "object": "model",
+  "created": 1672531200,
+  "owned_by": "opendatahub",
+  "ready": true,
+  "url": "https://...",
+  "modelDetails": {
+    "displayName": "Llama 2 7B Chat",
+    "description": "A large language model optimized for chat use cases",
+    "genaiUseCase": "chat",
+    "contextWindow": "4096"
+  }
+}
+```
+
+When no annotations are set (or all values are empty), `modelDetails` is omitted from the response.
+
+## MaaSAuthPolicy and MaaSSubscription annotations
+
+The common annotations (`openshift.io/display-name`, `openshift.io/description`) can be set on MaaSAuthPolicy and MaaSSubscription resources for use by `kubectl`, the OpenShift console, and other tooling. They are **not** returned in the `GET /v1/models` API response.
+
+### Example
+
+```yaml
+apiVersion: maas.opendatahub.io/v1alpha1
+kind: MaaSAuthPolicy
+metadata:
+  name: premium-access
+  namespace: models-as-a-service
+  annotations:
+    openshift.io/display-name: "Premium Access Policy"
+    openshift.io/description: "Grants premium-users group access to premium models"
+spec:
+  # ...
+```
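
The rule described in crd-annotations.md for when `modelDetails` appears can be sketched in shell. This is a minimal illustration under stated assumptions, not the MaaS API's implementation: `model_details_json` is a hypothetical helper, dropping individual empty fields is an assumption (the page only states that the whole object is omitted when every value is empty), and values are assumed to contain no double quotes or backslashes because no JSON escaping is performed.

```shell
# Hypothetical helper: build the modelDetails object from four annotation values.
# Arguments, in order: display-name, description, genai-use-case, context-window.
# Assumption: values contain no double quotes or backslashes (no JSON escaping).
model_details_json() (
  out=""
  add() {
    # $1 = JSON key, $2 = annotation value; empty values are skipped
    if [ -n "$2" ]; then
      out="${out:+$out,}\"$1\":\"$2\""
    fi
  }
  add displayName "${1-}"
  add description "${2-}"
  add genaiUseCase "${3-}"
  add contextWindow "${4-}"
  # Print nothing when every value is empty, mirroring the "omitted" rule
  if [ -n "$out" ]; then
    printf '{%s}\n' "$out"
  fi
)
```

Calling `model_details_json "Llama 2 7B Chat" "" "chat" "4096"` prints an object with three fields, and calling it with four empty strings prints nothing.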
@@ -57,7 +57,7 @@ flowchart TB
 
 **Summary:** You declare intent with MaaS CRs; the controller turns that into Gateway/Kuadrant resources that attach to the same HTTPRoute and backend (e.g. KServe LLMInferenceService).
 
-The **MaaS API** GET /v1/models endpoint uses MaaSModelRef CRs as its primary source: it lists them in the API namespace, then **validates access** by probing each model’s `/v1/models` endpoint with the client’s **Authorization header** (passed through as-is). Only models that return 2xx or 405 are included. So the catalogue returned to the client is the set of MaaSModelRef objects the controller reconciles, filtered to those the client can actually access. No token exchange is performed; the header is forwarded as-is. (Once minting is in place, this may be revisited.)
+The **MaaS API** GET /v1/models endpoint uses MaaSModelRef CRs as its primary source: it reads them cluster-wide (all namespaces), then **validates access** by probing each model’s `/v1/models` endpoint with the client’s **Authorization header**, forwarded as-is with no token exchange. Only models that return 2xx or 405 are included, so the catalogue returned to the client is the set of MaaSModelRef objects the controller reconciles, filtered to those the client can actually access.
 
 ---
 
@@ -213,7 +213,7 @@ flowchart LR
     Deploy --> Examples
 ```
 
-- **Namespace**: Controller and default MaaS CRs live in **opendatahub** (configurable).
+- **Namespaces**: MaaS API and controller default to **opendatahub** (configurable). MaaSAuthPolicy and MaaSSubscription default to **models-as-a-service** (configurable). MaaSModelRef must live in the **same namespace** as the model it references (e.g. **llm**).
 - **Install**: `./scripts/deploy.sh` installs the full stack including the controller. Optionally run `./scripts/install-examples.sh` for sample MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription.
 
 ---
@@ -233,7 +233,7 @@ The Kuadrant AuthPolicy validates API keys via the MaaS API and validates user t
 | Topic | Summary |
 |-------|---------|
 | **What** | MaaS Controller = control plane that reconciles MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription into Gateway API and Kuadrant resources. |
-| **Where** | Single controller in `maas-controller`; CRs and generated resources can live in opendatahub or other namespaces. |
+| **Where** | Single controller in `opendatahub`; MaaSAuthPolicy / MaaSSubscription default to `models-as-a-service`; MaaSModelRef and generated Kuadrant policies target their model’s namespace. |
 | **How** | Three reconcilers watch MaaS CRs (and related resources); each creates/updates HTTPRoutes, AuthPolicies, or TokenRateLimitPolicies. |
 | **Identity bridge** | AuthPolicy exposes all user groups as a comma-separated `groups_str`; TokenRateLimitPolicy uses `groups_str.split(",").exists(...)` for subscription matching (the “string trick”). |
 | **Deploy** | Run `./scripts/deploy.sh`; optionally install examples. |
 
@@ -31,7 +31,7 @@ apiVersion: maas.opendatahub.io/v1alpha1
 kind: MaaSModelRef
 metadata:
   name: my-model
-  namespace: opendatahub
+  namespace: llm
 spec:
   modelRef:
     kind: LLMInferenceService
@@ -63,7 +63,7 @@ The controller:
 
 ## API Behavior
 
-- The API reads MaaSModelRefs from the informer cache, maps each to an API model (`id`, `url`, `ready`, `kind`, etc.)
+- The API reads MaaSModelRefs cluster-wide, maps each to an API model (`id`, `url`, `ready`, `kind`, etc.)
 - **Access validation**: Probes each model's `/v1/models` endpoint with the request's Authorization header. Only models that return 2xx or 405 are included.
 - **Kind on the wire**: Each model in the GET /v1/models response carries a `kind` field from `spec.modelRef.kind`
 
@@ -84,14 +84,14 @@ To support a new backend type (a new **kind** in `spec.modelRef`):
      - **Option B:** Extend the API's access-validation logic to branch on **kind** and use a kind-specific probe (different URL path or client), while keeping the same contract: include a model only if the probe with the user's token succeeds.
 
 3. **Enrichment (optional)**
-   - Extra metadata (e.g. display name) can be set by the controller in status or annotations and mapped into the model response. For a new kind, add a small branch in the MaaSModelRef → API model conversion if needed.
+   - The API reads standard annotations from MaaSModelRef (`openshift.io/display-name`, `openshift.io/description`, `opendatahub.io/genai-use-case`, `opendatahub.io/context-window`) and returns them in the `modelDetails` field of the GET /v1/models response. See [CRD annotations](crd-annotations.md) for the full list. For a new kind, add a small branch in the MaaSModelRef → API model conversion if needed.
 
 4. **RBAC**
    - If the new kind’s reconciler or the API needs to read another resource, add the corresponding **list/watch** (and optionally **get**) permissions to the maas-api ClusterRole and/or the controller’s RBAC.
 
 ## Summary
 
 - **modelRef** is the backend reference (kind, name, optional namespace), analogous to [Gateway API BackendRef](https://gateway-api.sigs.k8s.io/reference/spec/#backendref).
-- **Listing:** Always from MaaSModelRef cache; no kind-specific listing logic.
+- **Listing:** Always from MaaSModelRef resources cluster-wide; no kind-specific listing logic.
 - **Access validation:** Same probe (GET endpoint with the request's Authorization header as-is) for all kinds unless kind-specific probes are added later.
 - **New kinds:** Implement in the controller (resolve referent, set status.endpoint and status.phase); extend the API only if the new kind cannot use the same probe path or needs different enrichment.
@@ -2,7 +2,7 @@
 
 This document describes how the **GET /v1/models** endpoint discovers and returns the list of available models.
 
-The list is **based on MaaSModelRef** custom resources: the API returns models that are registered as MaaSModelRefs in its configured namespace.
+The list is **based on MaaSModelRef** custom resources: the API considers MaaSModelRef objects cluster-wide (all namespaces), then filters by access.
 
 ## Overview
 
@@ -17,15 +17,17 @@ Each entry includes an `id`, **`url`** (the model’s endpoint), a `ready` flag,
 
 ## MaaSModelRef flow
 
-When the [MaaS controller](https://github.com/opendatahub-io/models-as-a-service/tree/main/maas-controller) is installed and the API is configured with a MaaSModelRef lister and namespace, the flow is:
+When the [MaaS controller](https://github.com/opendatahub-io/models-as-a-service/tree/main/maas-controller) is installed and the API is configured with a MaaSModelRef lister, the flow is:
 
-1. The MaaS API lists all **MaaSModelRef** custom resources in its configured namespace (e.g. `opendatahub`). It reads them from an **in-memory cache** in the maas-api component (maintained by a Kubernetes informer), so it does not call the Kubernetes API on every request.
+1. The MaaS API discovers **MaaSModelRef** custom resources **cluster-wide** (all namespaces) without calling the Kubernetes API on every request.
 
 2. For each MaaSModelRef, it reads **id** (`metadata.name`), **url** (`status.endpoint`), **ready** (`status.phase == "Ready"`), and related metadata. The controller has populated `status.endpoint` and `status.phase` from the underlying LLMInferenceService (for llmisvc) or HTTPRoute/Gateway.
 
 3. **Access validation**: The API probes each model’s `/v1/models` endpoint with the **exact Authorization header** the client sent (passed through as-is). Only models that return **2xx**, **3xx** or **405** are included in the response. This ensures the list only shows models the client is authorized to use.
 
-4. The filtered list is returned to the client.
+4. For each model, the API reads **annotations** from the MaaSModelRef to populate `modelDetails` in the response (display name, description, use case, context window). See [CRD annotations](crd-annotations.md) for the full list.
+
+5. The filtered list is returned to the client.
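
The inclusion rule from step 3 can be sketched as a small shell predicate. This is an illustration only: `probe_allows_listing` is a hypothetical name, and the accepted codes follow this page's wording (2xx, 3xx, or 405). Other pages touched by this change mention only 2xx and 405, so treat the 3xx branch as an assumption.

```shell
# Hypothetical predicate: succeed (exit 0) when a probe's HTTP status code
# means the model should be included in the GET /v1/models response.
probe_allows_listing() {
  case "$1" in
    2[0-9][0-9] | 3[0-9][0-9] | 405) return 0 ;;  # accepted per this page
    *) return 1 ;;                                # e.g. 401, 403, 5xx
  esac
}
```

A status code to pass in could be obtained with `curl -s -o /dev/null -w '%{http_code}' -H "Authorization: $AUTH" "$ENDPOINT/v1/models"`, where `AUTH` and `ENDPOINT` are placeholder variables.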
 
 ```mermaid
 sequenceDiagram
@@ -53,7 +55,7 @@ sequenceDiagram
 
 - **Consistent with gateway**: The same model names and routes are used for inference; the list matches what the gateway will accept for that client.
 
-If the API is not configured with a MaaSModelRef lister and namespace, or if listing fails (e.g. CRD not installed, no RBAC, or server error), the API returns an empty list or an error.
+If the API is not configured with a MaaSModelRef lister, or if listing fails (e.g. CRD not installed, no RBAC, or server error), the API returns an empty list or an error.
 
 ## Subscription Filtering and Aggregation
 
@@ -172,14 +174,18 @@ To have models appear via the **MaaSModelRef** flow:
         kind: MaaSModelRef
         metadata:
           name: my-model-name   # This becomes the model "id" in GET /v1/models
-          namespace: opendatahub
+          namespace: llm          # Same namespace as the LLMInferenceService
+          annotations:
+            openshift.io/display-name: "My Model"                  # optional: human-readable name
+            openshift.io/description: "A general-purpose LLM"      # optional: description
+            opendatahub.io/genai-use-case: "chat"                  # optional: use case
+            opendatahub.io/context-window: "4096"                  # optional: context window
         spec:
           modelRef:
             kind: LLMInferenceService
             name: my-llm-isvc-name
-            namespace: llm
 
-4. The controller reconciles the MaaSModelRef and sets `status.endpoint` and `status.phase`. The MaaS API (in the same namespace) will then include this model in GET /v1/models when it lists MaaSModelRef CRs.
+4. The controller reconciles the MaaSModelRef and sets `status.endpoint` and `status.phase`. The MaaS API will then include this model in GET /v1/models when it lists MaaSModelRef CRs.
 
 You can use the [maas-system samples](https://github.com/opendatahub-io/models-as-a-service/tree/main/docs/samples/maas-system) as a template; the install script deploys LLMInferenceService + MaaSModelRef + MaaSAuthPolicy + MaaSSubscription together so dependencies resolve correctly.
 
 
@@ -122,6 +122,77 @@ spec:
 
 ### Complete example
 
+Add the `alpha.maas.opendatahub.io/tiers` annotation to enable automatic RBAC setup for tier-based access:
+
+```yaml
+apiVersion: serving.kserve.io/v1alpha1
+kind: LLMInferenceService
+metadata:
+  name: my-production-model
+  namespace: llm
+  annotations:
+    alpha.maas.opendatahub.io/tiers: '[]'
+spec:
+  # ... rest of spec ...
+```
+
+**Annotation Values:**
+
+- **Empty list `[]`**: Grants access to **all tiers** (recommended for most models)
+- **List of tier names**: Grants access to specific tiers only
+  - Example: `'["premium","enterprise"]'` grants access to the premium and enterprise tiers only
+- **Missing annotation**: **No tiers** have access by default (model won't be accessible via MaaS)
+
+**Examples:**
+
+Allow all tiers:
+
+```yaml
+annotations:
+  alpha.maas.opendatahub.io/tiers: '[]'
+```
+
+Allow specific tiers:
+
+```yaml
+annotations:
+  alpha.maas.opendatahub.io/tiers: '["premium","enterprise"]'
+```
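
The three annotation cases listed above (empty list, explicit list, missing annotation) can be sketched as a shell check. This is a hedged illustration of the documented semantics, not the ODH Controller's logic: `tier_has_access` is a hypothetical name, and the match is a plain substring test on the quoted tier name, which assumes tier names contain no quotes or glob characters.

```shell
# Hypothetical check: does a tier have access, given the annotation value?
# $1 = tier name
# $2 = value of alpha.maas.opendatahub.io/tiers (omit it when the annotation is missing)
tier_has_access() {
  # Missing annotation: no tiers have access
  if [ "$#" -lt 2 ]; then
    return 1
  fi
  # Strip whitespace so '[ ]' and '["a", "b"]' are handled like their compact forms
  case "$(printf '%s' "$2" | tr -d ' \t\n')" in
    '[]') return 0 ;;         # empty list: all tiers have access
    *"\"$1\""*) return 0 ;;   # tier name appears as a quoted list element
    *) return 1 ;;
  esac
}
```

For example, `tier_has_access premium '["premium","enterprise"]'` succeeds, while `tier_has_access free '["premium","enterprise"]'` fails.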
+
+### Step 3: Add Display Metadata (Optional)
+
+Add standard annotations to your **MaaSModelRef** to provide human-readable names and descriptions in the `GET /v1/models` API response:
+
+```yaml
+apiVersion: maas.opendatahub.io/v1alpha1
+kind: MaaSModelRef
+metadata:
+  name: my-production-model
+  namespace: llm
+  annotations:
+    openshift.io/display-name: "My Production Model"
+    openshift.io/description: "A fine-tuned model for production workloads"
+    opendatahub.io/genai-use-case: "chat"
+    opendatahub.io/context-window: "8192"
+spec:
+  modelRef:
+    kind: LLMInferenceService
+    name: my-production-model
+```
+
+These annotations are returned in the `modelDetails` field of the API response. All are optional. See [CRD annotations](crd-annotations.md) for the full list of supported annotations across all MaaS CRDs.
+
+### What the Annotation Does
+
+When the `alpha.maas.opendatahub.io/tiers` annotation is present, the ODH Controller automatically creates the RBAC resources (Roles and RoleBindings) that allow tier-specific service accounts to POST to your `LLMInferenceService`.
+
+Behind the scenes, it creates:
+
+- **Role**: Grants `POST` permission on `llminferenceservices` resource
+- **RoleBinding**: Binds tier service account groups (e.g., `system:serviceaccounts:maas-default-gateway-tier-premium`) to the role
+
+### Complete Example
+
 Here's a complete example of an LLMInferenceService configured for MaaS:
 
 ```yaml
 
@@ -91,11 +91,11 @@ kubectl get pods -n llm
 **Validate MaaSModelRef status** — The MaaS controller populates `status.endpoint` and `status.phase` on each MaaSModelRef from the LLMInferenceService. The MaaSModelRef `status.endpoint` should match the URL exposed by the LLMInferenceService (via the gateway). Verify:
 
 ```bash
-# Check MaaSModelRef status (use opendatahub for ODH, redhat-ods-applications for RHOAI)
-kubectl get maasmodelref -n opendatahub -o wide
+# Check MaaSModelRef status (same namespace as the LLMInferenceService, e.g. llm)
+kubectl get maasmodelref -n llm -o wide
 
 # Verify status.endpoint is populated and phase is Ready
-kubectl get maasmodelref -n opendatahub -o jsonpath='{range .items[*]}{.metadata.name}: phase={.status.phase} endpoint={.status.endpoint}{"\n"}{end}'
+kubectl get maasmodelref -n llm -o jsonpath='{range .items[*]}{.metadata.name}: phase={.status.phase} endpoint={.status.endpoint}{"\n"}{end}'
 
 # Compare with LLMInferenceService — status.endpoint should match the URL from LLMIS status.addresses or status.url
 kubectl get llminferenceservice -n llm -o yaml | grep "url:"
 
@@ -1,6 +1,6 @@
 # MaaSModelRef
 
-Identifies an AI/ML model on the cluster. The MaaS API lists models from MaaSModelRef resources (using `status.endpoint` and `status.phase`).
+Identifies an AI/ML model on the cluster. Create each MaaSModelRef in the **same namespace** as the backend it references (`LLMInferenceService`, `ExternalModel`, etc.). The MaaS API lists models from MaaSModelRef resources cluster-wide (using `status.endpoint` and `status.phase`).
 
 ## MaaSModelRefSpec
 
 
@@ -25,10 +25,9 @@ kustomize build docs/samples/maas-system/ | kubectl apply -f -
 # Or deploy a specific sample
 kustomize build docs/samples/maas-system/facebook-opt-125m-cpu/ | kubectl apply -f -
 kustomize build docs/samples/maas-system/qwen3/ | kubectl apply -f -
-```
 
 # Verify
-kubectl get maasmodelref -n opendatahub
+kubectl get maasmodelref -n llm
 kubectl get maasauthpolicy,maassubscription -n models-as-a-service
 kubectl get llminferenceservice -n llm
 ```
@@ -44,7 +43,7 @@ kubectl create namespace llm --dry-run=client -o yaml | kubectl apply -f -
 kustomize build docs/samples/maas-system | sed "s/namespace: models-as-a-service/namespace: my-namespace/g" | kubectl apply -f -
 
 # Verify
-kubectl get maasmodelref -n opendatahub
+kubectl get maasmodelref -n llm
 kubectl get maasauthpolicy,maassubscription -n my-namespace
 kubectl get llminferenceservice -n llm
 ```
@@ -3,6 +3,9 @@ kind: MaaSAuthPolicy
 metadata:
   name: simulator-access
   namespace: models-as-a-service
+  annotations:
+    openshift.io/display-name: "Simulator Access (Free)"
+    openshift.io/description: "Grants all authenticated users access to the free-tier simulator model"
 spec:
   modelRefs:
     - name: facebook-opt-125m-simulated
 
@@ -3,6 +3,9 @@ kind: MaaSModelRef
 metadata:
   name: facebook-opt-125m-simulated
   namespace: llm
+  annotations:
+    openshift.io/display-name: "Facebook OPT 125M (Simulated)"
+    openshift.io/description: "A simulated OPT-125M model for free-tier testing"
 spec:
   modelRef:
     kind: LLMInferenceService