This document describes the MaaS Controller: what was built, how it fits into the Models-as-a-Service (MaaS) stack, and how the pieces work together. It is intended for presentations, onboarding, and technical deep-dives.
!!! todo "Documentation cleanup" TODO: Clean up this documentation.
The MaaS Controller is a Kubernetes controller that provides a subscription-style control plane for Models-as-a-Service. It lets platform operators define:
- Which models are exposed through MaaS (via MaaSModelRef).
- Who can access those models (via MaaSAuthPolicy).
- Per-user/per-group token rate limits for those models (via MaaSSubscription).
The controller does not run inference. It reconciles your high-level MaaS CRs into the underlying Gateway API and Kuadrant resources (HTTPRoutes, AuthPolicies, TokenRateLimitPolicies) that enforce routing, authentication, and rate limiting at the gateway.
flowchart TB
subgraph Operator["Platform operator"]
MaaSModelRef["MaaSModelRef"]
MaaSAuthPolicy["MaaSAuthPolicy"]
MaaSSubscription["MaaSSubscription"]
end
subgraph Controller["maas-controller"]
ModelReconciler["MaaSModelRef\nReconciler"]
AuthReconciler["MaaSAuthPolicy\nReconciler"]
SubReconciler["MaaSSubscription\nReconciler"]
end
subgraph GatewayStack["Gateway API + Kuadrant"]
HTTPRoute["HTTPRoute"]
AuthPolicy["AuthPolicy\n(Kuadrant)"]
TokenRateLimitPolicy["TokenRateLimitPolicy\n(Kuadrant)"]
end
subgraph Backend["Backend"]
LLMIS["LLMInferenceService\n(KServe)"]
end
MaaSModelRef --> ModelReconciler
MaaSAuthPolicy --> AuthReconciler
MaaSSubscription --> SubReconciler
ModelReconciler --> HTTPRoute
AuthReconciler --> AuthPolicy
SubReconciler --> TokenRateLimitPolicy
HTTPRoute --> AuthPolicy
HTTPRoute --> TokenRateLimitPolicy
HTTPRoute --> LLMIS
Summary: You declare intent with MaaS CRs; the controller turns that into Gateway/Kuadrant resources that attach to the same HTTPRoute and backend (e.g. KServe LLMInferenceService).
The MaaS API GET /v1/models endpoint uses MaaSModelRef CRs as its primary source: it reads them cluster-wide (all namespaces), then validates access by probing each model’s /v1/models endpoint with the client’s Authorization header (passed through as-is). Only models that return 2xx or 405 are included. So the catalogue returned to the client is the set of MaaSModelRef objects the controller reconciles, filtered to those the client can actually access. No token exchange is performed; the header is forwarded as-is.
sequenceDiagram
participant User
participant Gateway
participant AuthPolicy as Kuadrant AuthPolicy
participant TRLP as TokenRateLimitPolicy
participant Backend as LLMInferenceService
User->>Gateway: POST /llm/<model>/v1/chat/completions (Bearer token)
Gateway->>AuthPolicy: Validate token (TokenReview)
AuthPolicy->>AuthPolicy: Check groups/users, build identity
Note over AuthPolicy: Writes identity (userid, groups_str)
AuthPolicy-->>Gateway: Identity attached to request
Gateway->>TRLP: Evaluate rate limit (identity-based)
TRLP->>TRLP: groups_str.split(',').exists(g, g == "group")
TRLP-->>Gateway: Allow / deny by quota
Gateway->>Backend: Forward request
Backend-->>User: Inference response
- AuthPolicy authenticates (e.g. OpenShift token via Kubernetes TokenReview), authorizes (allowed groups/users), and writes identity (e.g.
userid,groups,groups_str). - TokenRateLimitPolicy uses that identity (in particular the comma-separated
groups_str) to decide which subscription and limits apply.
Kuadrant’s TokenRateLimitPolicy CEL predicates do not always support array fields the same way as the AuthPolicy response. To pass user groups from AuthPolicy to TokenRateLimitPolicy in a reliable way, the controller uses a comma-separated string:
-
AuthPolicy (controller-generated)
- In the success response identity, the controller adds a property
groups_strwith a CEL expression that takes all user groups (unfiltered) and joins them with a comma:
auth.identity.user.groups.join(",") - So the identity object has both
groups(array) andgroups_str(string, e.g."system:authenticated,free-user,premium-user"). - Groups are passed unfiltered so that TRLP predicates can match against subscription groups, which may differ from auth policy groups.
- In the success response identity, the controller adds a property
-
TokenRateLimitPolicy (controller-generated)
- For each subscription owner group, the controller generates a CEL predicate that splits
groups_strand checks membership, e.g.
auth.identity.groups_str.split(",").exists(g, g == "free-user").
- For each subscription owner group, the controller generates a CEL predicate that splits
So: AuthPolicy turns the user-groups array into a comma-separated string; TokenRateLimitPolicy turns that string back into a logical list and uses it for rate-limit matching. That’s the “string trick.”
flowchart LR
subgraph MaaS["MaaS CRs (your intent)"]
MM["MaaSModelRef\n(model ref)"]
MAP["MaaSAuthPolicy\n(modelRefs + subjects)"]
MS["MaaSSubscription\n(owner + modelRefs + limits)"]
end
subgraph Generated["Generated (per model / route)"]
HR["HTTPRoute"]
AP["AuthPolicy"]
TRL["TokenRateLimitPolicy"]
end
MM --> HR
MAP --> AP
MS --> TRL
HR --> AP
HR --> TRL
| Your resource | Controller creates / uses |
|---|---|
| MaaSModelRef | HTTPRoute (or validates KServe-created route for LLMInferenceService) |
| MaaSAuthPolicy | One AuthPolicy per referenced model; targets that model’s HTTPRoute |
| MaaSSubscription | One TokenRateLimitPolicy per referenced model; targets that model’s HTTPRoute |
All generated resources are labeled app.kubernetes.io/managed-by: maas-controller.
flowchart TB
subgraph Cluster["Kubernetes cluster"]
subgraph maas_controller["maas-controller (Deployment)"]
Manager["Controller Manager"]
ModelReconciler["MaaSModelRef\nReconciler"]
AuthReconciler["MaaSAuthPolicy\nReconciler"]
SubReconciler["MaaSSubscription\nReconciler"]
end
CRDs["CRDs: MaaSModelRef,\nMaaSAuthPolicy,\nMaaSSubscription"]
RBAC["RBAC: ClusterRole,\nServiceAccount, etc."]
end
Watch["Watch MaaS CRs,\nGateway API, Kuadrant,\nLLMInferenceService"]
Manager --> ModelReconciler
Manager --> AuthReconciler
Manager --> SubReconciler
ModelReconciler --> Watch
AuthReconciler --> Watch
SubReconciler --> Watch
CRDs --> Watch
RBAC --> maas_controller
- Single binary: manager runs three reconcilers.
- Registers Kubernetes core, Gateway API, KServe (v1alpha1), and MaaS (v1alpha1) schemes; uses unstructured for Kuadrant resources.
- Reads/writes MaaS CRs, HTTPRoutes, Gateways, AuthPolicies, TokenRateLimitPolicies, and LLMInferenceServices (read-only for model metadata/routes).
erDiagram
MaaSModelRef ||--o{ HTTPRoute : "creates or validates"
MaaSModelRef }o--|| LLMInferenceService : "references (kind: LLMInferenceService)"
MaaSAuthPolicy ||--o{ AuthPolicy : "one per model"
MaaSAuthPolicy }o--o{ MaaSModelRef : "modelRefs"
MaaSSubscription ||--o{ TokenRateLimitPolicy : "one per model"
MaaSSubscription }o--o{ MaaSModelRef : "modelRefs"
AuthPolicy }o--|| HTTPRoute : "targetRef"
TokenRateLimitPolicy }o--|| HTTPRoute : "targetRef"
HTTPRoute }o--|| Gateway : "parentRef"
- MaaSModelRef:
spec.modelRef.kind= LLMInferenceService or ExternalModel;spec.modelRef.name= name of the referenced model resource. - MaaSAuthPolicy:
spec.modelRefs(list of ModelRef objects with name and namespace),spec.subjects(groups, users). - MaaSSubscription:
spec.owner(groups, users),spec.modelRefs(list of ModelSubscriptionRef objects with name, namespace, and requiredtokenRateLimitsarray to define per-model rate limits).
flowchart LR
subgraph Prereqs["Prerequisites"]
ODH["Open Data Hub\n(opendatahub ns)"]
GW["Gateway API"]
Kuadrant["Kuadrant"]
KServe["KServe (optional)\nfor LLMInferenceService"]
end
subgraph Install["Install steps"]
Deploy["deploy.sh"]
Examples["Optional: install-examples.sh"]
end
Prereqs --> Deploy
Deploy --> Examples
- Namespaces: MaaS API and controller default to opendatahub (configurable). MaaSAuthPolicy and MaaSSubscription default to models-as-a-service (configurable). MaaSModelRef must live in the same namespace as the model it references (e.g. llm).
- Install:
./scripts/deploy.shinstalls the full stack including the controller. Optionally run./scripts/install-examples.shfor sample MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription.
For GET /v1/models, the maas-api forwards the client’s Authorization header as-is to each model endpoint (no token exchange). You can use an OpenShift token or an API key (sk-oai-*). With a user token, you may send X-MaaS-Subscription to filter when you have access to multiple subscriptions.
For model inference (requests to …/llm/<model>/v1/chat/completions and similar), use an API key created via POST /v1/api-keys only. Each key is bound to one MaaSSubscription at mint time.
The Kuadrant AuthPolicy validates API keys via the MaaS API and validates user tokens via Kubernetes TokenReview, deriving user/groups for authorization and for TokenRateLimitPolicy (including groups_str).
For model inference routes (HTTPRoutes targeting model workloads):
The controller-generated AuthPolicies do not inject most identity-related HTTP headers (X-MaaS-Username, X-MaaS-Group, X-MaaS-Key-Id) into requests forwarded to upstream model pods. This is a defense-in-depth security measure to prevent accidental disclosure of user identity, group membership, and key identifiers in:
- Model runtime logs
- Upstream debug dumps
- Misconfigured proxies or sidecars
Exception: X-MaaS-Subscription is injected for Istio Telemetry to enable per-subscription latency tracking. Istio runs in the Envoy gateway and cannot access Authorino's auth.identity context—it can only read request headers. The injected subscription value is server-controlled (resolved by Authorino from validated subscriptions), not client-provided.
All identity information remains available to gateway-level features through Authorino's auth.identity and auth.metadata contexts, which are consumed by:
- TokenRateLimitPolicy (TRLP): Uses
selected_subscription_key,userid,groups, andsubscription_infofromfilters.identity(accesssubscription_info.labelsfor tier-based rate limiting) - Gateway telemetry/metrics: Accesses identity fields with
metrics: trueenabled onfilters.identity - Authorization policies: OPA/Rego rules evaluate
auth.identityandauth.metadatadirectly
For maas-api routes:
The static AuthPolicy for maas-api (deployment/base/maas-api/policies/auth-policy.yaml) still injects X-MaaS-Username and X-MaaS-Group headers, as maas-api's ExtractUserInfo middleware requires them. This is separate from model inference routes and follows a different security model (maas-api is a trusted internal service).
Security motivation:
Model workloads (vLLM, Llama.cpp, etc.) do not require strong identity claims in cleartext headers. By keeping identity at the gateway layer, we reduce the attack surface and limit the blast radius of potential log leaks or upstream vulnerabilities.
| Topic | Summary |
|---|---|
| What | MaaS Controller = control plane that reconciles MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription into Gateway API and Kuadrant resources. |
| Where | Single controller in opendatahub; MaaSAuthPolicy / MaaSSubscription default to models-as-a-service; MaaSModelRef and generated Kuadrant policies target their model’s namespace. |
| How | Three reconcilers watch MaaS CRs (and related resources); each creates/updates HTTPRoutes, AuthPolicies, or TokenRateLimitPolicies. |
| Identity bridge | AuthPolicy exposes all user groups as a comma-separated groups_str; TokenRateLimitPolicy uses groups_str.split(",").exists(...) for subscription matching (the “string trick”). |
| Deploy | Run ./scripts/deploy.sh; optionally install examples. |
This overview should be enough to explain what was created and how it works in talks or written docs.