Skip to content

Latest commit

 

History

History
272 lines (199 loc) · 13.1 KB

File metadata and controls

272 lines (199 loc) · 13.1 KB

MaaS Controller Overview

This document describes the MaaS Controller: what was built, how it fits into the Models-as-a-Service (MaaS) stack, and how the pieces work together. It is intended for presentations, onboarding, and technical deep-dives.

!!! todo "Documentation cleanup" TODO: Clean up this documentation.


1. What Is the MaaS Controller?

The MaaS Controller is a Kubernetes controller that provides a subscription-style control plane for Models-as-a-Service. It lets platform operators define:

  • Which models are exposed through MaaS (via MaaSModelRef).
  • Who can access those models (via MaaSAuthPolicy).
  • Per-user/per-group token rate limits for those models (via MaaSSubscription).

The controller does not run inference. It reconciles your high-level MaaS CRs into the underlying Gateway API and Kuadrant resources (HTTPRoutes, AuthPolicies, TokenRateLimitPolicies) that enforce routing, authentication, and rate limiting at the gateway.


2. High-Level Architecture

flowchart TB
    subgraph Operator["Platform operator"]
        MaaSModelRef["MaaSModelRef"]
        MaaSAuthPolicy["MaaSAuthPolicy"]
        MaaSSubscription["MaaSSubscription"]
    end

    subgraph Controller["maas-controller"]
        ModelReconciler["MaaSModelRef\nReconciler"]
        AuthReconciler["MaaSAuthPolicy\nReconciler"]
        SubReconciler["MaaSSubscription\nReconciler"]
    end

    subgraph GatewayStack["Gateway API + Kuadrant"]
        HTTPRoute["HTTPRoute"]
        AuthPolicy["AuthPolicy\n(Kuadrant)"]
        TokenRateLimitPolicy["TokenRateLimitPolicy\n(Kuadrant)"]
    end

    subgraph Backend["Backend"]
        LLMIS["LLMInferenceService\n(KServe)"]
    end

    MaaSModelRef --> ModelReconciler
    MaaSAuthPolicy --> AuthReconciler
    MaaSSubscription --> SubReconciler

    ModelReconciler --> HTTPRoute
    AuthReconciler --> AuthPolicy
    SubReconciler --> TokenRateLimitPolicy

    HTTPRoute --> AuthPolicy
    HTTPRoute --> TokenRateLimitPolicy
    HTTPRoute --> LLMIS
Loading

Summary: You declare intent with MaaS CRs; the controller turns that into Gateway/Kuadrant resources that attach to the same HTTPRoute and backend (e.g. KServe LLMInferenceService).

The MaaS API GET /v1/models endpoint uses MaaSModelRef CRs as its primary source: it reads them cluster-wide (all namespaces), then validates access by probing each model’s /v1/models endpoint with the client’s Authorization header (passed through as-is). Only models that return 2xx or 405 are included. So the catalogue returned to the client is the set of MaaSModelRef objects the controller reconciles, filtered to those the client can actually access. No token exchange is performed; the header is forwarded as-is.


3. Request Flow (End-to-End)

sequenceDiagram
    participant User
    participant Gateway
    participant AuthPolicy as Kuadrant AuthPolicy
    participant TRLP as TokenRateLimitPolicy
    participant Backend as LLMInferenceService

    User->>Gateway: POST /llm/<model>/v1/chat/completions (Bearer token)
    Gateway->>AuthPolicy: Validate token (TokenReview)
    AuthPolicy->>AuthPolicy: Check groups/users, build identity
    Note over AuthPolicy: Writes identity (userid, groups_str)
    AuthPolicy-->>Gateway: Identity attached to request
    Gateway->>TRLP: Evaluate rate limit (identity-based)
    TRLP->>TRLP: groups_str.split(',').exists(g, g == "group")
    TRLP-->>Gateway: Allow / deny by quota
    Gateway->>Backend: Forward request
    Backend-->>User: Inference response
Loading
  • AuthPolicy authenticates (e.g. OpenShift token via Kubernetes TokenReview), authorizes (allowed groups/users), and writes identity (e.g. userid, groups, groups_str).
  • TokenRateLimitPolicy uses that identity (in particular the comma-separated groups_str) to decide which subscription and limits apply.

4. The “String Trick” (AuthPolicy → TokenRateLimitPolicy)

Kuadrant’s TokenRateLimitPolicy CEL predicates do not always support array fields the same way as the AuthPolicy response. To pass user groups from AuthPolicy to TokenRateLimitPolicy in a reliable way, the controller uses a comma-separated string:

  1. AuthPolicy (controller-generated)

    • In the success response identity, the controller adds a property groups_str with a CEL expression that takes all user groups (unfiltered) and joins them with a comma:
      auth.identity.user.groups.join(",")
    • So the identity object has both groups (array) and groups_str (string, e.g. "system:authenticated,free-user,premium-user").
    • Groups are passed unfiltered so that TRLP predicates can match against subscription groups, which may differ from auth policy groups.
  2. TokenRateLimitPolicy (controller-generated)

    • For each subscription owner group, the controller generates a CEL predicate that splits groups_str and checks membership, e.g.
      auth.identity.groups_str.split(",").exists(g, g == "free-user").

So: AuthPolicy turns the user-groups array into a comma-separated string; TokenRateLimitPolicy turns that string back into a logical list and uses it for rate-limit matching. That’s the “string trick.”


5. What the Controller Creates (Runtime View)

flowchart LR
    subgraph MaaS["MaaS CRs (your intent)"]
        MM["MaaSModelRef\n(model ref)"]
        MAP["MaaSAuthPolicy\n(modelRefs + subjects)"]
        MS["MaaSSubscription\n(owner + modelRefs + limits)"]
    end

    subgraph Generated["Generated (per model / route)"]
        HR["HTTPRoute"]
        AP["AuthPolicy"]
        TRL["TokenRateLimitPolicy"]
    end

    MM --> HR
    MAP --> AP
    MS --> TRL
    HR --> AP
    HR --> TRL
Loading
Your resource Controller creates / uses
MaaSModelRef HTTPRoute (or validates KServe-created route for LLMInferenceService)
MaaSAuthPolicy One AuthPolicy per referenced model; targets that model’s HTTPRoute
MaaSSubscription One TokenRateLimitPolicy per referenced model; targets that model’s HTTPRoute

All generated resources are labeled app.kubernetes.io/managed-by: maas-controller.


6. Component Diagram (Controller Internals)

flowchart TB
    subgraph Cluster["Kubernetes cluster"]
        subgraph maas_controller["maas-controller (Deployment)"]
            Manager["Controller Manager"]
            ModelReconciler["MaaSModelRef\nReconciler"]
            AuthReconciler["MaaSAuthPolicy\nReconciler"]
            SubReconciler["MaaSSubscription\nReconciler"]
        end

        CRDs["CRDs: MaaSModelRef,\nMaaSAuthPolicy,\nMaaSSubscription"]
        RBAC["RBAC: ClusterRole,\nServiceAccount, etc."]
    end

    Watch["Watch MaaS CRs,\nGateway API, Kuadrant,\nLLMInferenceService"]
    Manager --> ModelReconciler
    Manager --> AuthReconciler
    Manager --> SubReconciler
    ModelReconciler --> Watch
    AuthReconciler --> Watch
    SubReconciler --> Watch
    CRDs --> Watch
    RBAC --> maas_controller
Loading
  • Single binary: manager runs three reconcilers.
  • Registers Kubernetes core, Gateway API, KServe (v1alpha1), and MaaS (v1alpha1) schemes; uses unstructured for Kuadrant resources.
  • Reads/writes MaaS CRs, HTTPRoutes, Gateways, AuthPolicies, TokenRateLimitPolicies, and LLMInferenceServices (read-only for model metadata/routes).

7. Data Model (Simplified)

erDiagram
    MaaSModelRef ||--o{ HTTPRoute : "creates or validates"
    MaaSModelRef }o--|| LLMInferenceService : "references (kind: LLMInferenceService)"
    MaaSAuthPolicy ||--o{ AuthPolicy : "one per model"
    MaaSAuthPolicy }o--o{ MaaSModelRef : "modelRefs"
    MaaSSubscription ||--o{ TokenRateLimitPolicy : "one per model"
    MaaSSubscription }o--o{ MaaSModelRef : "modelRefs"
    AuthPolicy }o--|| HTTPRoute : "targetRef"
    TokenRateLimitPolicy }o--|| HTTPRoute : "targetRef"
    HTTPRoute }o--|| Gateway : "parentRef"
Loading
  • MaaSModelRef: spec.modelRef.kind = LLMInferenceService or ExternalModel; spec.modelRef.name = name of the referenced model resource.
  • MaaSAuthPolicy: spec.modelRefs (list of ModelRef objects with name and namespace), spec.subjects (groups, users).
  • MaaSSubscription: spec.owner (groups, users), spec.modelRefs (list of ModelSubscriptionRef objects with name, namespace, and required tokenRateLimits array to define per-model rate limits).

8. Deployment and Prerequisites

flowchart LR
    subgraph Prereqs["Prerequisites"]
        ODH["Open Data Hub\n(opendatahub ns)"]
        GW["Gateway API"]
        Kuadrant["Kuadrant"]
        KServe["KServe (optional)\nfor LLMInferenceService"]
    end

    subgraph Install["Install steps"]
        Deploy["deploy.sh"]
        Examples["Optional: install-examples.sh"]
    end

    Prereqs --> Deploy
    Deploy --> Examples
Loading
  • Namespaces: MaaS API and controller default to opendatahub (configurable). MaaSAuthPolicy and MaaSSubscription default to models-as-a-service (configurable). MaaSModelRef must live in the same namespace as the model it references (e.g. llm).
  • Install: ./scripts/deploy.sh installs the full stack including the controller. Optionally run ./scripts/install-examples.sh for sample MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription.

9. Authentication (Current Behavior)

For GET /v1/models, the maas-api forwards the client’s Authorization header as-is to each model endpoint (no token exchange). You can use an OpenShift token or an API key (sk-oai-*). With a user token, you may send X-MaaS-Subscription to filter when you have access to multiple subscriptions.

For model inference (requests to …/llm/<model>/v1/chat/completions and similar), use an API key created via POST /v1/api-keys only. Each key is bound to one MaaSSubscription at mint time.

The Kuadrant AuthPolicy validates API keys via the MaaS API and validates user tokens via Kubernetes TokenReview, deriving user/groups for authorization and for TokenRateLimitPolicy (including groups_str).


9.1. Identity Headers and Defense-in-Depth

For model inference routes (HTTPRoutes targeting model workloads):

The controller-generated AuthPolicies do not inject most identity-related HTTP headers (X-MaaS-Username, X-MaaS-Group, X-MaaS-Key-Id) into requests forwarded to upstream model pods. This is a defense-in-depth security measure to prevent accidental disclosure of user identity, group membership, and key identifiers in:

  • Model runtime logs
  • Upstream debug dumps
  • Misconfigured proxies or sidecars

Exception: X-MaaS-Subscription is injected for Istio Telemetry to enable per-subscription latency tracking. Istio runs in the Envoy gateway and cannot access Authorino's auth.identity context—it can only read request headers. The injected subscription value is server-controlled (resolved by Authorino from validated subscriptions), not client-provided.

All identity information remains available to gateway-level features through Authorino's auth.identity and auth.metadata contexts, which are consumed by:

  • TokenRateLimitPolicy (TRLP): Uses selected_subscription_key, userid, groups, and subscription_info from filters.identity (access subscription_info.labels for tier-based rate limiting)
  • Gateway telemetry/metrics: Accesses identity fields with metrics: true enabled on filters.identity
  • Authorization policies: OPA/Rego rules evaluate auth.identity and auth.metadata directly

For maas-api routes:

The static AuthPolicy for maas-api (deployment/base/maas-api/policies/auth-policy.yaml) still injects X-MaaS-Username and X-MaaS-Group headers, as maas-api's ExtractUserInfo middleware requires them. This is separate from model inference routes and follows a different security model (maas-api is a trusted internal service).

Security motivation:

Model workloads (vLLM, Llama.cpp, etc.) do not require strong identity claims in cleartext headers. By keeping identity at the gateway layer, we reduce the attack surface and limit the blast radius of potential log leaks or upstream vulnerabilities.


10. Summary

Topic Summary
What MaaS Controller = control plane that reconciles MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription into Gateway API and Kuadrant resources.
Where Single controller in opendatahub; MaaSAuthPolicy / MaaSSubscription default to models-as-a-service; MaaSModelRef and generated Kuadrant policies target their model’s namespace.
How Three reconcilers watch MaaS CRs (and related resources); each creates/updates HTTPRoutes, AuthPolicies, or TokenRateLimitPolicies.
Identity bridge AuthPolicy exposes all user groups as a comma-separated groups_str; TokenRateLimitPolicy uses groups_str.split(",").exists(...) for subscription matching (the “string trick”).
Deploy Run ./scripts/deploy.sh; optionally install examples.

This overview should be enough to explain what was created and how it works in talks or written docs.