Skip to content

Latest commit

 

History

History
263 lines (194 loc) · 9.44 KB

File metadata and controls

263 lines (194 loc) · 9.44 KB

Quota and Access Configuration

This guide provides step-by-step instructions for configuring MaaSModelRef, MaaSAuthPolicy, and MaaSSubscription. For conceptual overview, see Access and Quota Overview and MaaS Models.

Prerequisites

  • MaaS platform installed — See Install MaaS Components
  • LLMInferenceService for your model (or external model endpoints)
  • Cluster admin or equivalent permissions to create CRs in the models-as-a-service and model namespaces

!!! note "Deploy a sample model" Command to deploy simulator model as free-model:

```bash
kustomize build 'https://github.com/opendatahub-io/models-as-a-service//docs/samples/maas-system/free/llm?ref=main' | \
  sed 's/facebook-opt-125m-simulated/free-model/g' | kubectl apply -f -
```

Overview: What You'll Accomplish

Before running the configuration steps, here is the flow you will set up:

  1. Register models — Create a MaaSModelRef for each model so MaaS knows about it and can expose it through the API. The controller reconciles each ref and populates the endpoint.

  2. Grant access — Create MaaSAuthPolicy resources that define which groups or users can use which models. A user must match a policy to see and call a model.

  3. Define subscriptions — Create MaaSSubscription resources that define quota (token rate limits) for groups or users. A user must have both access (policy) and quota (subscription) to use a model.

  4. Validate — Confirm the CRs are reconciled, policies are enforced, and you can list models and run inference.

Configuration Steps

Set the namespace and name of your LLMInferenceService (used in the commands below):

MODEL_NS=llm
MODEL_NAME=free-model  # From the Prerequisites deploy; or: kubectl get llminferenceservice -n $MODEL_NS -o jsonpath='{.items[0].metadata.name}'

1. Register Models (MaaSModelRef)

Create a MaaSModelRef for each model you want to expose through MaaS. The MaaSModelRef must be in the same namespace as the LLMInferenceService. The spec.modelRef.name must match the LLMInferenceService name.

kubectl apply -f - <<EOF
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSModelRef
metadata:
  name: ${MODEL_NAME}-ref
  namespace: ${MODEL_NS}
spec:
  modelRef:
    kind: LLMInferenceService
    name: ${MODEL_NAME}
EOF

Verify the controller has reconciled and set the endpoint:

kubectl get maasmodelref -n ${MODEL_NS}
# Check status.endpoint and status.phase

2. Grant Access (MaaSAuthPolicy)

Create an MaaSAuthPolicy to define which groups/users can access which models:

kubectl apply -f - <<EOF
# MaaSAuthPolicy must be deployed to the models-as-a-service namespace
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSAuthPolicy
metadata:
  name: free-access
  namespace: models-as-a-service
spec:
  modelRefs:
    - name: ${MODEL_NAME}-ref
      namespace: ${MODEL_NS}
  subjects:
    groups:
      - name: free-users
    users: []
EOF

Verify the MaaSAuthPolicy is active (AuthPolicy created and enforced):

# List generated AuthPolicy (may take a moment for controller to reconcile)
kubectl get authpolicy -n ${MODEL_NS} -l maas.opendatahub.io/model=${MODEL_NAME}-ref

# Wait for AuthPolicy to be enforced (re-run if the controller has not reconciled yet)
AUTH_POLICY=$(kubectl get authpolicy -n ${MODEL_NS} -l maas.opendatahub.io/model=${MODEL_NAME}-ref -o jsonpath='{.items[0].metadata.name}')
[[ -n "$AUTH_POLICY" ]] && kubectl wait --for=condition=Enforced=true authpolicy/${AUTH_POLICY} -n ${MODEL_NS} --timeout=120s

Multiple policies per model: You can create multiple MaaSAuthPolicies that reference the same model. The controller aggregates them — a user matching any policy gets access.

3. Define Subscriptions (MaaSSubscription)

Create a MaaSSubscription to define per-model token rate limits for owner groups:

kubectl apply -f - <<EOF
# MaaSSubscription must be deployed to the models-as-a-service namespace
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSSubscription
metadata:
  name: free-subscription
  namespace: models-as-a-service
spec:
  owner:
    groups:
      - name: free-users
    users: []
  modelRefs:
    - name: ${MODEL_NAME}-ref
      namespace: ${MODEL_NS}
      tokenRateLimits:
        - limit: 100
          window: 1m
  priority: 10
EOF

Verify the MaaSSubscription is active (TokenRateLimitPolicy created and enforced):

# List generated TokenRateLimitPolicy (may take a moment for controller to reconcile)
kubectl get tokenratelimitpolicy -n ${MODEL_NS} -l maas.opendatahub.io/model=${MODEL_NAME}-ref

# Wait for TokenRateLimitPolicy to be enforced (re-run if the controller has not reconciled yet)
TRLP=$(kubectl get tokenratelimitpolicy -n ${MODEL_NS} -l maas.opendatahub.io/model=${MODEL_NAME}-ref -o jsonpath='{.items[0].metadata.name}')
[[ -n "$TRLP" ]] && kubectl wait --for=condition=Enforced=true tokenratelimitpolicy/${TRLP} -n ${MODEL_NS} --timeout=120s

!!! note "Namespace requirements" Both MaaSAuthPolicy and MaaSSubscription must be installed in the models-as-a-service namespace. Each modelRefs entry must specify the namespace where the MaaSModelRef lives (e.g. llm).

!!! warning "Using the users field" The subjects.users (MaaSAuthPolicy) and owner.users (MaaSSubscription) fields should be used only for Service Accounts and similar programmatic identities, not for many individual human users. Having too many distinct users can cause cardinality issues in rate limiting and policy enforcement. Prefer groups for human users.

Premium example with higher limits:

kubectl apply -f - <<EOF
# MaaSSubscription must be deployed to the models-as-a-service namespace
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSSubscription
metadata:
  name: premium-subscription
  namespace: models-as-a-service
spec:
  owner:
    groups:
      - name: premium-users
    users: []
  modelRefs:
    - name: ${MODEL_NAME}-ref
      namespace: ${MODEL_NS}
      tokenRateLimits:
        - limit: 100000
          window: 24h
  priority: 20
  tokenMetadata:
    organizationId: "premium-org"
    costCenter: "ai-team"
EOF

4. Validate the Configuration

Check CRs and generated policies:

kubectl get maasmodelref -n ${MODEL_NS}
kubectl get maasauthpolicy,maassubscription -n models-as-a-service
kubectl get authpolicy,tokenratelimitpolicy -n ${MODEL_NS}

Create groups and add users (OpenShift example):

# Create groups if they do not exist
oc adm groups new free-users 2>/dev/null || true
oc adm groups new premium-users 2>/dev/null || true

# Add users to each group
oc adm groups add-users free-users alice@example.com
oc adm groups add-users premium-users bob@example.com

Create API keys and verify access:

Create an API key as a user in free-users and another as a user in premium-users. Follow the Validation guide to:

  1. Get the gateway endpoint and create an API key for each user
  2. List models and run inference with each API key
  3. Test with both groups to confirm access and different rate limits — free-users (100 tokens/min) vs premium-users (100,000 tokens/24h)

Adding Groups

To grant a user access to a subscription, add them to the appropriate Kubernetes group:

# Create groups if they do not exist
oc adm groups new free-users 2>/dev/null || true
oc adm groups new premium-users 2>/dev/null || true

# Add users (method depends on your IdP; OpenShift example)
oc adm groups add-users free-users alice@example.com
oc adm groups add-users premium-users bob@example.com

Users will get subscription access on their next request (after group membership propagates).

Multiple Subscriptions per User

When a user belongs to multiple groups that each have a subscription, the access depends on the API key used. A subscription is bound to each API key at minting (explicit or highest priority). See Understanding Token Management.

Troubleshooting

403 Forbidden: "no access to subscription"

Cause: User requested a subscription they do not belong to (group membership).

Fix: Ensure the user is in a group listed in the subscription's spec.owner.groups.

429 Too Many Requests

Cause: User exceeded token rate limit for the model.

Fix: Wait for the rate limit window to reset, or upgrade to a subscription with higher limits.

Model not appearing in GET /v1/models

Cause: MaaSModelRef missing, not reconciled, or access probe failed.

Fix:

  • Verify MaaSModelRef exists in the model namespace (e.g. llm) and has status.phase: Ready
  • Check MaaSAuthPolicy in models-as-a-service includes the user's groups and references the MaaSModelRef with correct name (e.g. ${MODEL_NAME}-ref) and namespace
  • Ensure MaaSSubscription in models-as-a-service exists for the model and user's groups

Policies not enforced

Cause: Kuadrant controller may need to re-sync.

Fix:

kubectl delete pod -l control-plane=controller-manager -n kuadrant-system
kubectl wait --for=condition=Enforced=true tokenratelimitpolicy/<policy-name> -n llm --timeout=2m

Related Documentation