The MaaS Platform is a Kubernetes-native layer for AI model serving built on Gateway API and policy controllers (Kuadrant, Authorino, Limitador). It provides policy-based authentication and authorization, plus subscription-based rate limiting.
Planned work includes improved request routing and discovery.
The MaaS Platform is a layer for authorization and rate limiting built on Kuadrant. It sits in front of models you deploy on the cluster; the same pattern is expected to extend to models hosted outside the cluster over time.
Our main components include:
- Gateway (`maas-default-gateway`): Entry point for traffic using Gateway API; HTTPRoutes attach here.
- Kuadrant: Policy engine that connects routes and AuthPolicy resources to the Gateway and orchestrates enforcement on the hot path.
- Authorino: Authentication and authorization at the edge.
- Limitador: Rate limiting and usage tracking against subscription limits.
- maas-api: Custom service for API key minting and validation (including the internal endpoint the gateway calls for `sk-oai-*` keys).
Main Flows:
- Key minting (blue): Obtain `sk-oai-*` API keys for programmatic access to models (after authenticating with your cluster identity or configured OIDC). Each mint binds a subscription to the key; that association is stored with the key and used on inference.
- Inference (green): Call deployed models to generate completions using an API key (and subscription) on the inference route.
graph TB
subgraph UserLayer["User Layer"]
User[User]
end
subgraph GatewayPolicyLayer["Gateway & Policy Layer"]
Gateway[Gateway]
AuthPolicy[AuthPolicy]
MaaSAuthPolicy[MaaSAuthPolicy]
MaaSSubscription[MaaSSubscription]
end
subgraph TokenManagementLayer["Token Management Layer"]
MaaSAPI[MaaS API]
end
subgraph ModelServingLayer["Model Serving Layer"]
MaaSModelRef["MaaSModelRef"]
InferenceService[Inference Service]
LLM[LLM]
ExternalModel["ExternalModel /<br/>external API"]
end
User -->|"Request Key"| Gateway
Gateway --> AuthPolicy
AuthPolicy --> MaaSAPI
MaaSAPI -->|"Return API Key"| User
User -->|"Inference"| Gateway
Gateway --> MaaSAuthPolicy
MaaSAuthPolicy -.->|"Validate API Key"| MaaSAPI
MaaSAuthPolicy -->|"Rate Limit"| MaaSSubscription
MaaSSubscription --> MaaSModelRef
MaaSModelRef -->|"On-cluster"| InferenceService
MaaSModelRef -.->|"Tech Preview"| ExternalModel
InferenceService --> LLM
LLM -->|"Return Response"| User
ExternalModel -.->|"Return Response"| User
linkStyle 0,1,2,3 stroke:#1976d2,stroke-width:2px
linkStyle 4,5,6,7,8,9,11,12 stroke:#388e3c,stroke-width:2px
linkStyle 10,13 stroke:#388e3c,stroke-width:2px,stroke-dasharray: 6 4
style MaaSAPI fill:#1976d2,stroke:#333,stroke-width:2px,color:#fff
style Gateway fill:#7b1fa2,stroke:#333,stroke-width:2px,color:#fff
style AuthPolicy fill:#e65100,stroke:#333,stroke-width:2px,color:#fff
style MaaSAuthPolicy fill:#e65100,stroke:#333,stroke-width:2px,color:#fff
style MaaSSubscription fill:#e65100,stroke:#333,stroke-width:2px,color:#fff
style MaaSModelRef fill:#e65100,stroke:#333,stroke-width:2px,color:#fff
style InferenceService fill:#388e3c,stroke:#333,stroke-width:2px,color:#fff
style LLM fill:#388e3c,stroke:#333,stroke-width:2px,color:#fff
style ExternalModel fill:#00695c,stroke:#333,stroke-width:2px,color:#fff
Flow summary:
- User sends `POST /maas-api/v1/api-keys` with `Authorization: Bearer {identity-token}`.
    - The body sets which MaaSSubscription to bind (`subscription`), or omits it so the platform picks an accessible one (for example by priority).
    - That subscription is stored on the key at mint; inference later reads it from the key record, not from per-request headers.
- Validate identity: Authorino (AuthPolicy) checks the token using the configured method:
    - `kubernetesTokenReview`: OpenShift cluster tokens
    - OIDC JWT validation: external IdP (for example Keycloak), Tech Preview
- After authentication, the gateway forwards the request upstream to MaaS API (key minting), with identity context available for minting. Authorino only validates the request; it does not proxy or forward the HTTP call to MaaS API itself.
- MaaS API handles key minting using that authenticated identity and the requested subscription binding.
- The service generates a random `sk-oai-*` key and hashes it with SHA-256.
- Only the hash and metadata are stored in PostgreSQL: username, groups, name, `subscription` (the MaaSSubscription name bound at mint), and `expiresAt`.
- The plaintext key is returned to the user only in this minting response (show-once), along with `expiresAt`; it is not exposed again on later reads.

The diagram below stops at storage and does not show the HTTP response back to the user.
Every key expires. With operator-managed MaaS, the cluster operator sets the maximum lifetime on the Tenant CR: `spec.apiKeys.maxExpirationDays` (see Tenant CR). `maas-api` applies that cap as `API_KEY_MAX_EXPIRATION_DAYS` (for example, 90 days when the defaults apply). Omit `expiresIn` on create to use that maximum, or set a shorter `expiresIn` (e.g., `30d`, `90d`, `1h`) within the configured cap. The response always includes `expiresAt` (RFC3339).
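The expiration rule above can be sketched in a few lines of Python. This is a minimal illustration, not the `maas-api` implementation: the function name `resolve_expires_at` is hypothetical, only the `d` and `h` units that appear in the examples are parsed, and rejecting (rather than clamping) a request above the cap is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Default cap taken from the example in the text; real deployments configure it.
MAX_EXPIRATION_DAYS = 90

# Only the units shown in the text ("30d", "1h") are handled here.
_UNITS = {"d": "days", "h": "hours"}


def resolve_expires_at(expires_in: Optional[str], now: datetime,
                       max_days: int = MAX_EXPIRATION_DAYS) -> str:
    """Return an RFC3339 expiresAt for a requested expiresIn such as '30d'."""
    cap = timedelta(days=max_days)
    if expires_in is None:
        # Omitting expiresIn uses the configured maximum.
        lifetime = cap
    else:
        match = re.fullmatch(r"(\d+)([dh])", expires_in)
        if match is None:
            raise ValueError(f"unsupported expiresIn: {expires_in!r}")
        lifetime = timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})
        if lifetime <= timedelta(0) or lifetime > cap:
            # Assumption: out-of-range values are rejected, not clamped.
            raise ValueError("expiresIn must be positive and within the cap")
    expires_at = (now + lifetime).astimezone(timezone.utc)
    return expires_at.isoformat(timespec="seconds").replace("+00:00", "Z")
```

For example, with `now` set to 2025-01-01T00:00:00Z, omitting `expiresIn` yields `2025-04-01T00:00:00Z` under a 90-day cap, while `"30d"` yields `2025-01-31T00:00:00Z`.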
graph TB
subgraph UserLayer["User"]
U[User]
end
subgraph GatewayLayer["Gateway & Policy"]
G[Gateway]
AP["AuthPolicy<br/><sub>Authorino</sub>"]
end
subgraph KeyMinting["MaaS API"]
API[MaaS API]
Gen[Generate sk-oai-* key]
Hash[Hash with SHA-256]
end
subgraph Storage["Storage"]
DB[(PostgreSQL<br/>hashes + subscription + metadata + TTL)]
end
U -->|"POST /maas-api/v1/api-keys"| G
G -->|"Validate identity"| AP
AP -->|"Request continues upstream"| API
API --> Gen
Gen --> Hash
Hash -->|"Store hash + metadata"| DB
style API fill:#1976d2,stroke:#333,stroke-width:2px,color:#fff
style Gen fill:#1976d2,stroke:#333,stroke-width:2px,color:#fff
style Hash fill:#1976d2,stroke:#333,stroke-width:2px,color:#fff
style G fill:#7b1fa2,stroke:#333,stroke-width:2px,color:#fff
style AP fill:#e65100,stroke:#333,stroke-width:2px,color:#fff
style DB fill:#336791,stroke:#333,stroke-width:2px,color:#fff
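The generate, hash, and store steps of key minting can be sketched as follows. This is an illustration under stated assumptions rather than the service's actual code: the function name `mint_api_key`, the record field `key_hash`, and the length of the random part are hypothetical; the `sk-oai-` prefix, SHA-256 hashing, and the stored metadata fields come from the flow described above.

```python
import hashlib
import secrets

KEY_PREFIX = "sk-oai-"


def mint_api_key(username, groups, name, subscription, expires_at):
    """Return (plaintext, record); only the record would be persisted."""
    # The size of the random part is an assumption made for this sketch.
    plaintext = KEY_PREFIX + secrets.token_urlsafe(32)
    record = {
        # Only the hash is stored; the plaintext is never written anywhere.
        "key_hash": hashlib.sha256(plaintext.encode("utf-8")).hexdigest(),
        "username": username,
        "groups": list(groups),
        "name": name,
        "subscription": subscription,  # MaaSSubscription bound at mint
        "expiresAt": expires_at,
    }
    return plaintext, record
```

Because the record holds only the hash, the plaintext cannot be recovered from storage, which is why it is returned once in the minting response and never again.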
!!! Tip "OIDC Support"
    Tech Preview: OIDC JWT validation on the `maas-api` route is optional alongside OpenShift `kubernetesTokenReview`. Model routes still rely on API-key auth; the typical flow is to authenticate at `maas-api`, mint an `sk-oai-*` key, then use that key for discovery and inference.

!!! note "PostgreSQL"
    A PostgreSQL database is required and is not included with the MaaS deployment. The deploy script provides a basic PostgreSQL deployment for development and testing; it is not intended for production use. For production, provision and configure your own PostgreSQL instance.
Flow summary:
- User sends inference request with an API key.
- Validate identity — request reaches MaaSAuthPolicy (Authorino) via the Gateway.
- MaaSAuthPolicy validates the key via MaaS API; on failure returns 401/403.
- Check limits — MaaSSubscription (Limitador) enforces token rate limits; on exceed returns 429.
- Request reaches Inference Service when within limits.
- Inference Service forwards to the LLM.
- Completion Response is returned to the user.
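The limit check in the steps above can be pictured as a token budget per caller and subscription. The sketch below uses a simple fixed-window counter. That choice is a simplification for illustration and not a description of how Limitador is implemented; the class name, the counter key, and the limit values are hypothetical, and the token count is taken as an input rather than measured.

```python
from dataclasses import dataclass, field


@dataclass
class TokenWindowLimiter:
    limit: int           # tokens allowed per window (hypothetical value)
    window_seconds: int  # window length in seconds (hypothetical value)
    _usage: dict = field(default_factory=dict)

    def check(self, user: str, subscription: str, tokens: int, now: float) -> int:
        """Return 200 if the request fits in the current window, else 429."""
        window = int(now // self.window_seconds)
        key = (user, subscription, window)
        used = self._usage.get(key, 0)
        if used + tokens > self.limit:
            return 429  # over the subscription limit
        # Old windows are never evicted here; a real limiter would expire them.
        self._usage[key] = used + tokens
        return 200
```

With `TokenWindowLimiter(limit=100, window_seconds=60)`, a 60-token request followed by a 50-token request in the same minute returns 200 and then 429, and the budget resets when the next window starts.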
graph TB
subgraph UserLayer["User"]
U[User]
end
subgraph GatewayLayer["Gateway & Policy"]
G[Gateway]
MAP[MaaSAuthPolicy<br/>Authorino]
MS[MaaSSubscription<br/>Limitador]
end
subgraph MaaSLayer["Token Management"]
API[MaaS API]
end
subgraph ModelLayer["Model Serving"]
INV[Inference Service]
LLM[LLM]
end
U -->|"1. Inference + API key"| G
G -->|"2. Validate identity"| MAP
MAP -.->|"3. Validate key"| API
MAP -->|"4. Check limits"| MS
MS -->|"5. Within limits"| INV
INV -->|"6. Forward"| LLM
LLM -->|"7. Completion"| U
MAP -.->|"401/403"| U
MS -.->|"429"| U
linkStyle 7 stroke:#c62828,stroke-width:2px,stroke-dasharray:5,5
linkStyle 8 stroke:#c62828,stroke-width:2px,stroke-dasharray:5,5
style API fill:#1976d2,stroke:#333,stroke-width:2px,color:#fff
style G fill:#7b1fa2,stroke:#333,stroke-width:2px,color:#fff
style MAP fill:#e65100,stroke:#333,stroke-width:2px,color:#fff
style MS fill:#e65100,stroke:#333,stroke-width:2px,color:#fff
style INV fill:#388e3c,stroke:#333,stroke-width:2px,color:#fff
style LLM fill:#388e3c,stroke:#333,stroke-width:2px,color:#fff
For inference with an sk-oai-* API key, the policy layer performs two MaaS API steps in order. First the key is validated against PostgreSQL. Subscription is not read from request headers for API keys—it is stored on the key record when the key was minted and is returned as part of validation. Second, that subscription name, together with the username and groups from the key record, is used to confirm the caller may use that subscription for the target model (for example, that the subscription exists, the user still has access, and the model is part of that subscription).
Flow summary:
- The policy layer sends the API key to the MaaS API validate-key path.
- Validate key: MaaS API parses the key, looks up its SHA-256 hash in PostgreSQL, and rejects unknown, revoked, expired, or malformed keys (and keys with no subscription bound). On success it returns identity (username, groups, key ID) and the subscription name from the key record (mint-time binding).
- Subscription from the key: The next step uses that subscription name as the requested subscription, not a client-supplied `X-MaaS-Subscription` value. For API keys, the subscription sent in the request body to subscription selection is exactly the one returned from validation.
- Confirm subscription access: MaaS API subscription selection checks that the user and groups can use that subscription and that the requested model is allowed; failures surface as access denied (for example 403) to the policy layer.
- On success, identity and subscription context are available for rate limiting and metrics. That context is not forwarded as HTTP headers to upstream model workloads (defense in depth). Results may be cached briefly by the policy layer to avoid repeating work on every request.
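The two steps above, key validation followed by the subscription access check, can be sketched with in-memory dictionaries standing in for PostgreSQL and the subscription resources. All names here (`KeyRecord`, `Subscription`, `authorize`) are hypothetical, and returning 401 for an invalid key is an assumption; the text only states 401/403 for validation failures and 403 as an example for denied access.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class KeyRecord:
    username: str
    groups: list
    subscription: str      # bound at mint time
    expires_at: datetime
    revoked: bool = False


@dataclass
class Subscription:
    users: set    # users allowed to use the subscription
    groups: set   # groups allowed to use the subscription
    models: set   # models covered by the subscription


def authorize(api_key, model, keys, subscriptions, now):
    """Return (status, context) for an inference request."""
    # Step 1: validate the key by looking up its hash.
    if not api_key.startswith("sk-oai-"):
        return 401, None
    record = keys.get(hashlib.sha256(api_key.encode("utf-8")).hexdigest())
    if (record is None or record.revoked
            or record.expires_at <= now or not record.subscription):
        return 401, None
    # Step 2: the subscription comes from the key record, never from headers.
    sub = subscriptions.get(record.subscription)
    if sub is None:
        return 403, None
    allowed = (record.username in sub.users
               or bool(sub.groups & set(record.groups)))
    if not allowed or model not in sub.models:
        return 403, None
    return 200, {"username": record.username, "groups": record.groups,
                 "subscription": record.subscription}
```

The returned context is what rate limiting and metrics would consume; as noted above, it is not forwarded to the model workload as HTTP headers.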
graph TB
subgraph PolicyLayer["Policy layer"]
P[Policy]
end
subgraph MaaSLayer["MaaS API"]
V[Validate API key]
S[Confirm subscription access]
end
subgraph Storage["Storage"]
DB[(PostgreSQL)]
end
P -->|"1. API key"| V
V -->|"2. Lookup key record"| DB
DB -->|"3. Subscription stored on key"| V
V -.->|"Invalid key"| P
P -->|"4. Groups, username, subscription from key"| S
S -.->|"Access denied"| P
S -->|"5. Authorized"| P
linkStyle 3 stroke:#c62828,stroke-width:2px,stroke-dasharray:5,5
linkStyle 5 stroke:#c62828,stroke-width:2px,stroke-dasharray:5,5
style P fill:#e65100,stroke:#333,stroke-width:2px,color:#fff
style V fill:#1976d2,stroke:#333,stroke-width:2px,color:#fff
style S fill:#1976d2,stroke:#333,stroke-width:2px,color:#fff
style DB fill:#336791,stroke:#333,stroke-width:2px,color:#fff
Token usage and rate-limit data flow from Limitador into Prometheus and onward to dashboards.
Flow summary:
- Limitador stores token usage counters (e.g., `authorized_hits`, `authorized_calls`, `limited_calls`) with labels (`user`, `model`).
- A ServiceMonitor (or Kuadrant PodMonitor) configures Prometheus to scrape Limitador's `/metrics` endpoint.
- Prometheus stores the metrics in its time-series database.
- Grafana (or other visualization tools) queries Prometheus to build dashboards for usage, billing, and operational health.
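As an illustration of the last step, a dashboard or script could ask Prometheus for usage per user and model through its HTTP query API (`/api/v1/query`). The sketch below only builds the request URL. The function name and the Prometheus base URL are hypothetical, and wrapping `authorized_hits` in `increase()` assumes it is exposed as a monotonically increasing counter.

```python
from urllib.parse import urlencode


def usage_query_url(prometheus_base: str, window: str = "1h") -> str:
    """Build an instant-query URL for usage per user and model."""
    # Assumption: authorized_hits is a counter carrying user and model labels.
    promql = f"sum by (user, model) (increase(authorized_hits[{window}]))"
    base = prometheus_base.rstrip("/")
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"
```

The same PromQL expression can be pasted into a Grafana panel; swapping `authorized_hits` for `limited_calls` would chart rejected requests instead.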
graph LR
subgraph RateLimiting["Rate Limiting"]
Limitador[Limitador<br/>Token usage counters<br/>authorized_hits, authorized_calls, limited_calls]
end
subgraph Scraping["Metric Scraping"]
SM[ServiceMonitor<br/>or PodMonitor]
end
subgraph Storage["Metrics Storage"]
Prometheus[(Prometheus)]
end
subgraph Visualization["Visualization"]
Grafana[Grafana<br/>Dashboards]
end
Limitador -->|"/metrics"| SM
SM -->|"Scrape"| Prometheus
Prometheus -->|"Query"| Grafana
style Limitador fill:#e65100,stroke:#333,stroke-width:2px,color:#fff
style Prometheus fill:#1976d2,stroke:#333,stroke-width:2px,color:#fff
style Grafana fill:#388e3c,stroke:#333,stroke-width:2px,color:#fff
Users create API keys by authenticating with an accepted identity token (OpenShift today, or OIDC when configured on the maas-api route). The MaaS API generates a key, stores only the hash in PostgreSQL, and returns the plaintext once:
sequenceDiagram
participant User
participant Gateway as Gateway API
participant Authorino
participant MaaS as MaaS API
participant DB as PostgreSQL
User->>Gateway: POST /maas-api/v1/api-keys<br/>Authorization: Bearer {identity-token}
Gateway->>Authorino: Enforce MaaS API AuthPolicy
Authorino->>Authorino: Validate token (TokenReview or OIDC JWT)
Authorino->>Gateway: Authenticated
Gateway->>MaaS: Forward request with user context
Note over MaaS,DB: Create API Key
MaaS->>MaaS: Generate sk-oai-* key, hash with SHA-256
MaaS->>MaaS: Resolve subscription (explicit or highest priority)
MaaS->>DB: Store hash + metadata (user, groups, subscription, name)
DB-->>MaaS: Stored
MaaS-->>User: { "key": "sk-oai-...", "id": "...", ... }<br/>Plaintext shown ONLY ONCE
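From the client side, the minting call in the sequence above might be prepared as in the following sketch. The gateway URL and the function name `build_mint_request` are hypothetical; the path, the bearer header, and the `subscription` and `expiresIn` body fields come from the flow described earlier. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_mint_request(gateway_url, identity_token,
                       subscription=None, expires_in=None):
    """Build (but do not send) a POST /maas-api/v1/api-keys request."""
    body = {}
    if subscription is not None:
        body["subscription"] = subscription  # omit to let the platform pick
    if expires_in is not None:
        body["expiresIn"] = expires_in       # omit to use the maximum lifetime
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/maas-api/v1/api-keys",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {identity_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. Since the plaintext `key` in the response is shown only once, a client should store it securely as soon as the response arrives.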
Inference requests use the API key. Authorino validates it via HTTP callback (with caching); Limitador enforces subscription-based token limits:
sequenceDiagram
participant Client
participant GatewayAPI
participant Authorino
participant MaaS as MaaS API
participant Limitador
participant LLMInferenceService
Client->>GatewayAPI: Inference + API Key
GatewayAPI->>Authorino: Validate credentials
alt API key (sk-oai-*)
Authorino->>MaaS: POST /internal/v1/api-keys/validate
MaaS->>MaaS: Lookup hash in PostgreSQL
MaaS-->>Authorino: { valid, userId, groups, subscription }
end
Authorino->>MaaS: POST /internal/v1/subscriptions/select (subscription check)
MaaS-->>Authorino: Selected subscription
Authorino->>GatewayAPI: Auth success (cached)
GatewayAPI->>Limitador: Check TokenRateLimitPolicy
Limitador-->>GatewayAPI: Within limits
GatewayAPI->>LLMInferenceService: Forward request
LLMInferenceService-->>Client: Response
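The caching mentioned above trades a short window of staleness for fewer callbacks to MaaS API. A minimal time-to-live cache shows the idea; it is a generic sketch and not Authorino's caching mechanism, and the class name, cache key, and TTL value are hypothetical.

```python
import time


class TTLCache:
    """Cache values for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the sketch is testable
        self._entries = {}           # key -> (expiry time, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]            # still fresh: skip the callback
        value = compute()            # e.g. the validation callback
        self._entries[key] = (now + self._ttl, value)
        return value
```

One consequence of any such cache is that a revoked key can keep passing validation until its cached entry expires, which is a reason to keep the TTL short.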