Merged
18 commits
65ca551
fix: add explicit command to v0.8.2 simulator models to prevent bash …
jrhyness Apr 17, 2026
c01dc5b
fix: minor updates for external model (#771)
nirrozenbaum Apr 19, 2026
fae753e
chore: add .worktrees/ to .gitignore (#774)
Apr 20, 2026
89fba29
chore: promote main to stable (#770)
github-actions[bot] Apr 20, 2026
dbf6d03
fix: validate token rate limits and skip invalid subs in TRLP aggrega…
liangwen12year Apr 20, 2026
b327b34
feat: add OIDC token support for model discovery via /v1/models (#703)
jrhyness Apr 20, 2026
6bea2fb
chore(deps): update registry.access.redhat.com/ubi9/go-toolset docker…
konflux-internal-p02[bot] Apr 21, 2026
5928f54
chore(deps): update registry.access.redhat.com/ubi9/ubi-minimal docke…
konflux-internal-p02[bot] Apr 21, 2026
147eaa2
fix: per-model(s) top-level values in usage dashboard (#772)
ahadas Apr 21, 2026
b9a8979
chore(deps): update registry.access.redhat.com/ubi9/go-toolset docker…
konflux-internal-p02[bot] Apr 21, 2026
e746008
docs: add/update documentation for Maas Tenant (#773)
ishitasequeira Apr 21, 2026
1b8f212
chore: restrict rbac for db secret (#779)
ishitasequeira Apr 21, 2026
fb2ea25
feat: add tenant CRD to e2e artifact collection and debug report (#787)
chaitanya1731 Apr 22, 2026
fb75981
chore: promote main to stable (#788)
chaitanya1731 Apr 22, 2026
fad486e
chore: promote stable to rhoai (#789)
chaitanya1731 Apr 22, 2026
cbcbd25
Merge remote-tracking branch 'upstream/rhoai'
moulalis Apr 22, 2026
cfc8964
chore(deps): update registry.access.redhat.com/ubi9/ubi-minimal docke…
konflux-internal-p02[bot] Apr 22, 2026
b5ade4b
Merge remote-tracking branch 'downstream/main' into sync-main-to-rhoa…
chaitanya1731 Apr 22, 2026
4 changes: 3 additions & 1 deletion .github/hack/install-odh.sh
@@ -208,7 +208,9 @@ EOF
fi
fi

# 7. Apply DataScienceCluster (modelsAsService Unmanaged - MaaS deployed separately)
# 7. Apply DataScienceCluster (KServe + ModelsAsService Managed)
# The manifest filename retains "unmanaged" for backward compat; contents include
# modelsAsService.managementState: Managed so the operator deploys maas-controller.
echo "7. Applying DataScienceCluster..."
if kubectl get datasciencecluster -A --no-headers 2>/dev/null | grep -q .; then
echo " DataScienceCluster already exists, skipping"
3 changes: 3 additions & 0 deletions .gitignore
@@ -56,6 +56,9 @@ apps/backend/.env
CLAUDE.md
.cursor/

# Git worktrees
.worktrees/

# Docs build and site directories
docs/build/
docs/site/
@@ -82,15 +82,8 @@ spec:
pattern: ^[a-zA-Z0-9]([a-zA-Z0-9\-]*[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9\-]*[a-zA-Z0-9])?)*$
type: string
provider:
description: |-
Provider identifies the API format and auth type for the external model.
The allowed values are: "openai", "anthropic", "azure-openai", "vertex" and "bedrock-openai".
enum:
- openai
- anthropic
- azure-openai
- vertex
- bedrock-openai
description: Provider identifies the API format and auth type for
the external model.
maxLength: 63
minLength: 1
type: string
@@ -83,8 +83,11 @@ spec:
description: TokenRateLimit defines a token rate limit
properties:
limit:
description: Limit is the maximum number of tokens allowed
description: |-
Limit is the maximum number of tokens allowed within the window.
Must be between 1 and 1,000,000,000 (1 billion).
format: int64
maximum: 1000000000
minimum: 1
type: integer
window:
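The schema change above bounds `limit` to the inclusive range 1 to 1,000,000,000. A minimal Python sketch of the same check follows; the function names are hypothetical and not taken from the controller code, and the "skip invalid entries" helper only loosely mirrors the idea named in the (truncated) commit title.

```python
# Bounds copied from the CRD schema in this diff (minimum: 1, maximum: 1000000000).
MIN_TOKEN_LIMIT = 1
MAX_TOKEN_LIMIT = 1_000_000_000


def is_valid_token_limit(limit) -> bool:
    """Return True when `limit` is an integer inside the schema's inclusive bounds."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int):
        return False
    return MIN_TOKEN_LIMIT <= limit <= MAX_TOKEN_LIMIT


def keep_valid_limits(limits):
    """Hypothetical helper: drop out-of-range limits instead of failing the whole batch."""
    return [value for value in limits if is_valid_token_limit(value)]
```

With these bounds, `keep_valid_limits([0, 1, 500, 1_000_000_001])` keeps only `1` and `500`.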
16 changes: 15 additions & 1 deletion deployment/base/maas-controller/rbac/clusterrole.yaml
@@ -21,7 +21,6 @@ rules:
resources:
- endpoints
- pods
- secrets
verbs:
- get
- list
@@ -35,6 +34,21 @@ rules:
- get
- list
- watch
- apiGroups:
- ""
resources:
- secrets
verbs:
- list
- watch
- apiGroups:
- ""
resourceNames:
- maas-db-config
resources:
- secrets
verbs:
- get
- apiGroups:
- ""
resources:
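The two new rules split secret access: `list` and `watch` stay cluster-wide, while `get` is limited to the secret named `maas-db-config`. The sketch below is a simplified model of how such rules are evaluated, not the Kubernetes authorizer itself; the `Rule` type and `allows` function are hypothetical, and API groups and namespaces are ignored for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    resources: List[str]
    verbs: List[str]
    # An empty list means "any resource name", as in RBAC rules without resourceNames.
    resource_names: List[str] = field(default_factory=list)


def allows(rules: List[Rule], verb: str, resource: str, name: str = "") -> bool:
    """Return True if any rule grants `verb` on `resource` (optionally a named object)."""
    for rule in rules:
        if resource not in rule.resources or verb not in rule.verbs:
            continue
        if not rule.resource_names or name in rule.resource_names:
            return True
    return False


# The two secret rules added in this hunk.
RULES = [
    Rule(resources=["secrets"], verbs=["list", "watch"]),
    Rule(resources=["secrets"], verbs=["get"], resource_names=["maas-db-config"]),
]
```

Under this model, `allows(RULES, "get", "secrets", "maas-db-config")` is granted, while a `get` on any other secret name is denied.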
2 changes: 1 addition & 1 deletion deployment/base/payload-processing/rbac/clusterrole.yaml
@@ -13,5 +13,5 @@ rules:
verbs: ["get", "list", "watch"]
# model-provider-resolver plugin: watches ExternalModel CRDs across namespaces
- apiGroups: ["maas.opendatahub.io"]
resources: ["maasmodelrefs", "externalmodels"]
resources: ["externalmodels"]
verbs: ["get", "list", "watch"]
@@ -81,7 +81,29 @@ spec:
datasource:
kind: PrometheusDatasource
name: kuadrant-prometheus-datasource
query: 'count(count by (user) (increase(authorized_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range]) > 0)) or vector(0)'
query: |-
count(
count by (user) (
(
(
sum by (user, subscription, limitador_namespace) (
increase(authorized_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])
)
+
sum by (user, subscription, limitador_namespace) (
increase(limited_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])
)
)
or
sum by (user, subscription, limitador_namespace) (
increase(authorized_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])
)
)
* on(user, subscription, limitador_namespace)
(0 * max by (user, subscription, limitador_namespace) (max_over_time(authorized_hits{model=~"$model"}[$__range])) + 1)
> 0
)
) or vector(0)
seriesNameFormat: Users
successRate:
kind: Panel
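The rewritten query combines two PromQL idioms: `(A + B) or A` falls back to `A` when no matching `limited_calls` series exists, and multiplying by `(0 * max by (...) (...) + 1)` keeps only the label sets that also appear in `authorized_hits` for the selected model, without changing values. The Python sketch below models that behaviour on plain dictionaries keyed by `(user, subscription, limitador_namespace)`; the function names and sample data are illustrative assumptions, and NaN or stale-sample handling is not modelled.

```python
from typing import Dict, Set, Tuple

Labels = Tuple[str, str, str]  # (user, subscription, limitador_namespace)


def with_limited_fallback(authorized: Dict[Labels, float],
                          limited: Dict[Labels, float]) -> Dict[Labels, float]:
    """Model `(A + B) or A`: add limited calls where a matching series exists, else keep A."""
    return {key: value + limited.get(key, 0.0) for key, value in authorized.items()}


def keep_matching(left: Dict[Labels, float], model_keys: Set[Labels]) -> Dict[Labels, float]:
    """Model `left * on(...) (0 * max by (...) (...) + 1)`.

    The right-hand side evaluates to 1 for every label set that has model data,
    so one-to-one matching drops unmatched series and leaves values unchanged.
    """
    return {key: value for key, value in left.items() if key in model_keys}


def active_users(authorized, limited, model_keys) -> int:
    """Model `count(count by (user) (... > 0)) or vector(0)`."""
    totals = keep_matching(with_limited_fallback(authorized, limited), model_keys)
    return len({user for (user, _sub, _ns), value in totals.items() if value > 0})


# Illustrative data only; not taken from any real cluster.
AUTHORIZED = {("alice", "gold", "ns1"): 4.0, ("bob", "free", "ns1"): 0.0, ("carol", "gold", "ns2"): 2.0}
LIMITED = {("bob", "free", "ns1"): 3.0}
MODEL_KEYS = {("alice", "gold", "ns1"), ("bob", "free", "ns1")}
```

In this sample, `bob` counts as active only because limited calls are added in, and `carol` is dropped because she has no `authorized_hits` series for the selected model.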
@@ -105,7 +127,41 @@ spec:
datasource:
kind: PrometheusDatasource
name: kuadrant-prometheus-datasource
query: '((sum(increase(authorized_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range]))) / ((sum(increase(authorized_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])) + (sum(increase(limited_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])) or vector(0))) > 0)) or vector(1)'
query: |-
(
(
sum(
sum by (user, subscription, limitador_namespace) (
increase(authorized_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])
)
* on(user, subscription, limitador_namespace)
(0 * max by (user, subscription, limitador_namespace) (max_over_time(authorized_hits{model=~"$model"}[$__range])) + 1)
)
)
/
(
(
sum(
sum by (user, subscription, limitador_namespace) (
increase(authorized_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])
)
* on(user, subscription, limitador_namespace)
(0 * max by (user, subscription, limitador_namespace) (max_over_time(authorized_hits{model=~"$model"}[$__range])) + 1)
)
+
(
sum(
sum by (user, subscription, limitador_namespace) (
increase(limited_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])
)
* on(user, subscription, limitador_namespace)
(0 * max by (user, subscription, limitador_namespace) (max_over_time(authorized_hits{model=~"$model"}[$__range])) + 1)
)
or vector(0)
)
) > 0
)
) or vector(1)
seriesNameFormat: Success Rate
tokenConsumptionByUser:
kind: Panel
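The success-rate query divides model-filtered authorized calls by authorized plus limited calls, guards the denominator with `> 0`, and falls back to `vector(1)` when the division yields nothing. A small Python sketch of that arithmetic follows; `None` stands in for an empty PromQL result, and the function name is hypothetical.

```python
from typing import Optional


def success_rate(authorized: Optional[float], limited: Optional[float]) -> float:
    """Model `(A / ((A + (L or vector(0))) > 0)) or vector(1)` on already-summed values."""
    if authorized is None:
        # Empty numerator: the division yields nothing, so `or vector(1)` applies.
        return 1.0
    total = authorized + (limited if limited is not None else 0.0)
    if total <= 0:
        # The `> 0` filter removes the denominator, so the fallback applies again.
        return 1.0
    return authorized / total
```

One consequence worth noting: when there is no authorized series at all, the panel reports 1 even if limited calls exist, because the numerator is empty rather than zero.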
@@ -180,7 +236,15 @@ spec:
query: |-
round(
sum by (user, subscription, model) (
sum by (user, subscription, limitador_namespace) (increase(authorized_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range]))
(
(
sum by (user, subscription, limitador_namespace) (increase(authorized_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range]))
+
sum by (user, subscription, limitador_namespace) (increase(limited_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range]))
)
or
sum by (user, subscription, limitador_namespace) (increase(authorized_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range]))
)
* on(user, subscription, limitador_namespace) group_left(model)
(0 * max by (user, subscription, limitador_namespace, model) (max_over_time(authorized_hits{model=~"$model"}[$__range])) + 1)
)
@@ -228,7 +292,15 @@ spec:
datasource:
kind: PrometheusDatasource
name: kuadrant-prometheus-datasource
query: 'sum(increase(limited_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])) or vector(0)'
query: |-
sum(
sum by (user, subscription, limitador_namespace) (
increase(limited_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])
)
* on(user, subscription, limitador_namespace)
(0 * max by (user, subscription, limitador_namespace) (max_over_time(authorized_hits{model=~"$model"}[$__range])) + 1)
)
or vector(0)
seriesNameFormat: Errors
totalRequests:
kind: Panel
@@ -253,7 +325,28 @@ spec:
datasource:
kind: PrometheusDatasource
name: kuadrant-prometheus-datasource
query: '(sum(increase(authorized_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])) or vector(0)) + (sum(increase(limited_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])) or vector(0))'
query: |-
(
sum(
sum by (user, subscription, limitador_namespace) (
increase(authorized_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])
)
* on(user, subscription, limitador_namespace)
(0 * max by (user, subscription, limitador_namespace) (max_over_time(authorized_hits{model=~"$model"}[$__range])) + 1)
)
or vector(0)
)
+
(
sum(
sum by (user, subscription, limitador_namespace) (
increase(limited_calls{user!="", user=~"$user", subscription=~"$subscription"}[$__range])
)
* on(user, subscription, limitador_namespace)
(0 * max by (user, subscription, limitador_namespace) (max_over_time(authorized_hits{model=~"$model"}[$__range])) + 1)
)
or vector(0)
)
seriesNameFormat: Requests
totalTokens:
kind: Panel
19 changes: 10 additions & 9 deletions docs/content/advanced-administration/observability.md
@@ -144,18 +144,19 @@ The observability stack consists of:

There are two ways to enable deployment-based observability:

1. **Operator-managed** (recommended): Enable via ModelsAsService CR
1. **Operator-managed** (recommended): Enable via Tenant CR
2. **Kustomize-based**: Deploy manifests directly

### Option 1: Operator-Managed Telemetry

When using the ODH/RHOAI operator, telemetry can be enabled via the ModelsAsService CR:
When using the ODH/RHOAI operator, telemetry can be enabled via the Tenant CR (self-bootstrapped by `maas-controller` in the `models-as-a-service` namespace):

```yaml
apiVersion: components.platform.opendatahub.io/v1alpha1
kind: ModelsAsService
apiVersion: maas.opendatahub.io/v1alpha1
kind: Tenant
metadata:
name: default-modelsasservice
name: default-tenant
namespace: models-as-a-service
spec:
telemetry:
enabled: true # Enable TelemetryPolicy and Istio Telemetry
@@ -169,25 +170,25 @@ spec:
Or patch an existing CR:

```bash
kubectl patch modelsasservice default-modelsasservice --type=merge \
kubectl patch tenant default-tenant -n models-as-a-service --type=merge \
-p '{"spec":{"telemetry":{"enabled":true}}}'
```

**What the operator creates when `telemetry.enabled: true`:**
**What the Tenant reconciler creates when `telemetry.enabled: true`:**

| Resource | Namespace | Purpose |
|----------|-----------|---------|
| TelemetryPolicy (`maas-telemetry`) | Gateway namespace | Adds `user`, `subscription`, `model` labels to Limitador usage metrics |
| Istio Telemetry (`latency-per-subscription`) | Gateway namespace | Adds `subscription` label to gateway latency metrics |

!!! note "Prerequisites for Operator-Managed Telemetry"
The operator-managed telemetry feature requires:
The Tenant reconciler telemetry feature requires:

- **OpenShift Service Mesh (Istio)** 2.4+ — for Istio Telemetry CRD
- **Kuadrant/RHCL** — for TelemetryPolicy CRD and AuthPolicy header injection
- **Gateway deployed** — Telemetry targets the gateway via selector

The operator checks for CRD availability before creating resources. If a CRD is not present, that resource is silently skipped.
The Tenant reconciler checks for CRD availability before creating resources. If a CRD is not present, that resource is silently skipped.

!!! warning "AuthPolicy Header Dependency"
The Istio Telemetry reads the `subscription` value from the `X-MaaS-Subscription` header, which must be injected by AuthPolicy:
2 changes: 1 addition & 1 deletion docs/content/concepts/architecture.md
@@ -97,7 +97,7 @@ graph TB
6. Only the hash and metadata (username, groups, name, `subscription` — the MaaSSubscription name bound at mint, `expiresAt`) are stored in PostgreSQL.
7. The plaintext key is returned to the user **only in this minting response** (show-once), along with `expiresAt`; it is **not** exposed again on later reads. The diagram below stops at storage and does not show the HTTP response back to the user.

Every key expires. With **operator-managed** MaaS, the cluster operator sets the maximum lifetime on the **`ModelsAsService`** CR: **`spec.apiKeys.maxExpirationDays`** (see [ModelsAsService CR](../install/maas-setup.md#modelsasservice-cr)). **`maas-api`** applies that cap as **`API_KEY_MAX_EXPIRATION_DAYS`** (for example 90 days by default when defaults apply). Omit **`expiresIn`** on create to use that maximum, or set a shorter **`expiresIn`** (e.g., `30d`, `90d`, `1h`) within the configured cap. The response always includes **`expiresAt`** (RFC3339).
Every key expires. With **operator-managed** MaaS, the cluster operator sets the maximum lifetime on the **`Tenant`** CR: **`spec.apiKeys.maxExpirationDays`** (see [Tenant CR](../install/maas-setup.md#tenant-cr)). **`maas-api`** applies that cap as **`API_KEY_MAX_EXPIRATION_DAYS`** (for example 90 days by default when defaults apply). Omit **`expiresIn`** on create to use that maximum, or set a shorter **`expiresIn`** (e.g., `30d`, `90d`, `1h`) within the configured cap. The response always includes **`expiresAt`** (RFC3339).
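The expiry rule in this paragraph can be sketched in Python as follows. This is an illustration under stated assumptions, not the `maas-api` implementation: `resolve_expiry` is a hypothetical name, only the `h` and `d` units shown in the examples are parsed, and rejecting an over-cap request (rather than clamping it) is assumed because the paragraph does not say which happens.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_UNITS = {"h": "hours", "d": "days"}  # only the units that appear in the doc's examples


def resolve_expiry(now: datetime, max_days: int, expires_in: Optional[str] = None) -> datetime:
    """Return the key's expiry time, capped by the configured maximum lifetime in days."""
    cap = timedelta(days=max_days)
    if expires_in is None:
        # Omitting expiresIn uses the configured maximum.
        return now + cap
    match = re.fullmatch(r"(\d+)([hd])", expires_in)
    if match is None:
        raise ValueError(f"unsupported duration: {expires_in!r}")
    requested = timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})
    if requested > cap:
        # Assumption: over-cap requests are rejected; the real service may behave differently.
        raise ValueError("expiresIn exceeds the configured maximum")
    return now + requested


def to_rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 timestamp."""
    return moment.isoformat()
```

For example, with a 90-day cap, `resolve_expiry(now, 90, "30d")` returns a time 30 days after `now`, and omitting `expires_in` returns a time 90 days after `now`.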

```mermaid
graph TB