Skip to content
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,8 @@ For detailed instructions, see the [Deployment Guide](docs/content/quickstart.md
|----------|-------------|---------|
| `MAAS_API_IMAGE` | Custom MaaS API container image (works in both operator and kustomize modes) | `quay.io/user/maas-api:pr-123` |
| `MAAS_CONTROLLER_IMAGE` | Custom MaaS controller container image | `quay.io/user/maas-controller:pr-123` |
| `METADATA_CACHE_TTL` | TTL in seconds for Authorino metadata HTTP caching | `60` (default), `300` |
| `AUTHZ_CACHE_TTL` | TTL in seconds for Authorino OPA authorization caching | `60` (default), `30` |
| `OPERATOR_CATALOG` | Custom operator catalog | `quay.io/opendatahub/catalog:pr-456` |
| `OPERATOR_IMAGE` | Custom operator image | `quay.io/opendatahub/operator:pr-456` |
| `OPERATOR_TYPE` | Operator type (rhoai/odh) | `odh` |
Expand Down Expand Up @@ -127,6 +129,7 @@ MAAS_API_IMAGE=quay.io/myuser/maas-api:pr-123 \

- [Deployment Guide](docs/content/quickstart.md) - Complete deployment instructions
- [MaaS API Documentation](maas-api/README.md) - Go API for key management
- [Authorino Caching Configuration](docs/content/configuration-and-management/authorino-caching.md) - Cache tuning for metadata and authorization

Online Documentation: [https://opendatahub-io.github.io/models-as-a-service/](https://opendatahub-io.github.io/models-as-a-service/)

Expand Down
2 changes: 1 addition & 1 deletion deployment/base/maas-api/policies/auth-policy.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,7 @@ spec:
when:
- predicate: request.headers.authorization.startsWith("Bearer sk-oai-")
plain:
expression: '''["'' + auth.metadata.apiKeyValidation.groups.join(''","'') + ''"]'''
selector: auth.metadata.apiKeyValidation.groups.@tostr
priority: 0
# Groups: from OpenShift identity as JSON array (when OC token used)
X-MaaS-Group-OC:
Expand Down
6 changes: 6 additions & 0 deletions deployment/base/maas-controller/manager/manager.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,8 @@ spec:
- --maas-api-namespace=$(MAAS_API_NAMESPACE)
- --maas-subscription-namespace=$(MAAS_SUBSCRIPTION_NAMESPACE)
- --cluster-audience=$(CLUSTER_AUDIENCE)
- --metadata-cache-ttl=$(METADATA_CACHE_TTL)
- --authz-cache-ttl=$(AUTHZ_CACHE_TTL)
env:
- name: GATEWAY_NAME
value: "maas-default-gateway"
Expand All @@ -46,6 +48,10 @@ spec:
value: "models-as-a-service"
- name: CLUSTER_AUDIENCE
value: "https://kubernetes.default.svc"
- name: METADATA_CACHE_TTL
value: "60"
- name: AUTHZ_CACHE_TTL
value: "60"
image: maas-controller
name: manager
imagePullPolicy: Always
Expand Down
26 changes: 26 additions & 0 deletions deployment/overlays/odh/kustomization.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,32 @@ patches:
target:
kind: Deployment
name: maas-api
- patch: |-
apiVersion: apps/v1
kind: Deployment
metadata:
name: maas-controller
spec:
template:
spec:
containers:
- name: manager
env:
- name: METADATA_CACHE_TTL
valueFrom:
configMapKeyRef:
key: metadata-cache-ttl
name: maas-parameters
optional: true
- name: AUTHZ_CACHE_TTL
valueFrom:
configMapKeyRef:
key: authz-cache-ttl
name: maas-parameters
optional: true
target:
kind: Deployment
name: maas-controller

replacements:
# Gateway policies must be in openshift-ingress to target maas-default-gateway
Expand Down
2 changes: 2 additions & 0 deletions deployment/overlays/odh/params.env
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,5 @@ gateway-namespace=openshift-ingress
gateway-name=maas-default-gateway
app-namespace=opendatahub
api-key-max-expiration-days=30
metadata-cache-ttl=60
authz-cache-ttl=60
226 changes: 226 additions & 0 deletions docs/content/configuration-and-management/authorino-caching.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,226 @@
# Authorino Caching Configuration

This document describes the Authorino/Kuadrant caching configuration in MaaS, including how to tune cache TTLs for metadata and authorization evaluators.

---

## Overview

MaaS-generated AuthPolicy resources enable Authorino-style caching on:

- **Metadata evaluators** (HTTP calls to maas-api):
- `apiKeyValidation` - validates API keys and returns user identity + groups
- `subscription-info` - selects the appropriate subscription for the request

- **Authorization evaluators** (OPA policy evaluation):
- `auth-valid` - validates authentication (API key OR K8s token)
- `subscription-valid` - ensures a valid subscription was selected
- `require-group-membership` - checks user/group membership against allowed lists

Caching reduces load on maas-api and CPU spent on Rego re-evaluation by reusing results when the cache key repeats within the TTL window.

---

## Configuration

### Environment Variables

The maas-controller deployment supports the following environment variables to configure cache TTLs:

| Variable | Description | Default | Unit | Constraints |
|----------|-------------|---------|------|-------------|
| `METADATA_CACHE_TTL` | TTL for metadata HTTP caching (apiKeyValidation, subscription-info) | `60` | seconds | Must be ≥ 0 |
| `AUTHZ_CACHE_TTL` | TTL for OPA authorization caching (auth-valid, subscription-valid, require-group-membership) | `60` | seconds | Must be ≥ 0 |

**Note:** The controller will fail to start if either TTL is set to a negative value.

### Deployment Configuration

#### Via params.env (ODH Overlay)

Edit `deployment/overlays/odh/params.env`:

```env
metadata-cache-ttl=300 # 5 minutes
authz-cache-ttl=30 # 30 seconds
```

These values are injected into the maas-controller deployment via ConfigMap.

#### Via manager.yaml (Base Deployment)

Edit `deployment/base/maas-controller/manager/manager.yaml`:

```yaml
env:
- name: METADATA_CACHE_TTL
value: "300" # 5 minutes
- name: AUTHZ_CACHE_TTL
value: "30" # 30 seconds
```

### Important: Authorization Cache TTL Capping

**Authorization caches are automatically capped at the metadata cache TTL** to prevent stale authorization decisions.

Authorization evaluators (auth-valid, subscription-valid, require-group-membership) depend on metadata evaluators (apiKeyValidation, subscription-info). If authorization caches outlive metadata caches, stale metadata can lead to incorrect authorization decisions.

**Example:**
```yaml
METADATA_CACHE_TTL=60 # 1 minute
AUTHZ_CACHE_TTL=300 # 5 minutes (will be capped at 60 seconds)
```

In this scenario:
- Metadata caches use 60-second TTL ✅
- Authorization caches use **60-second TTL** (capped, not 300) ✅
- A warning is logged at startup: "Authorization cache TTL exceeds metadata cache TTL"

**Recommendation:** Set `AUTHZ_CACHE_TTL ≤ METADATA_CACHE_TTL` to avoid confusion.

---

## Cache Key Design

Cache keys are carefully designed to prevent data leakage between principals, subscriptions, and models.

### Collision Resistance

Cache keys use single-character delimiters (`|` and `,`) to separate components:

- **Field delimiter**: `|` separates major components (user ID, groups, subscription, model)
- **Group delimiter**: `,` joins multiple group names

**For API Keys - Collision Resistant:**
Cache keys use database-assigned UUIDs instead of usernames:
- User ID: Database primary key (UUID format in `api_keys.id` column)
- Immutable and unique per API key
- Not user-controllable (assigned by database on creation)
- Example key: `550e8400-e29b-41d4-a716-446655440000|team,admin|sub1|ns/model`
- No collision risk even if groups contain delimiters (UUID prefix ensures uniqueness)

**For Kubernetes Tokens - Already Safe:**
Kubernetes usernames follow validated format enforced by the K8s API:
- Pattern: `system:serviceaccount:namespace:sa-name`
- Kubernetes validates namespace/SA names (DNS-1123: alphanumeric + hyphens only)
- No special characters like `|` or `,` allowed in usernames
- Creating service accounts requires cluster permissions (not user self-service)

**Implementation:**
The `apiKeyValidation` metadata evaluator returns a `userId` field:
- API keys: Set to `api_keys.id` (database UUID)
- Cache keys reference `auth.metadata.apiKeyValidation.userId` in CEL expressions
- This eliminates username-based collision attacks

### Metadata Caches

**apiKeyValidation:**
- **Only runs for API key requests** (Authorization header matches `Bearer sk-oai-*`)
- Key: `<api-key-value>`
- Ensures each unique API key has its own cache entry
- Does not run for Kubernetes token requests (prevents cache pollution)
- Returns `userId` field set to database UUID (`api_keys.id`)

**subscription-info:**
- Key: `<userId>|<groups>|<requested-subscription>|<model-namespace>/<model-name>`
- For API keys: `userId` is database UUID from `apiKeyValidation` response
- For K8s tokens: `userId` is validated K8s username (`system:serviceaccount:...`)
- Groups joined with `,` delimiter
- Ensures cache isolation per user, group membership, requested subscription, and model

### Authorization Caches

**auth-valid:**
- Key: `<auth-type>|<identity>|<model-namespace>/<model-name>`
- For API keys: `api-key|<key-value>|model`
- For K8s tokens: `k8s-token|<username>|model`

**subscription-valid:**
- Key: Same as subscription-info metadata (ensures cache coherence)
- Format: `<userId>|<groups>|<requested-subscription>|<model>`
- For API keys: `userId` is database UUID. For K8s tokens: validated username.

**require-group-membership:**
- Key: `<userId>|<groups>|<model-namespace>/<model-name>`
- For API keys: `userId` is database UUID. For K8s tokens: validated username.
- Groups joined with `,` delimiter
- Ensures cache isolation per user identity and model

---

## Operational Tuning

### When to Increase Metadata Cache TTL

- **High API key validation load**: If maas-api is experiencing high load from repeated `/internal/v1/api-keys/validate` calls
- **Stable API keys**: API key metadata (username, groups) doesn't change frequently
- **Example**: Set `METADATA_CACHE_TTL=300` (5 minutes) to reduce maas-api load by 5x

### When to Decrease Authorization Cache TTL

- **Group membership changes**: If users are frequently added/removed from groups
- **Security compliance**: Shorter TTL ensures access changes propagate faster
- **Example**: Set `AUTHZ_CACHE_TTL=30` (30 seconds) for faster group membership updates

### Monitoring

After changing TTL values, monitor:
- **maas-api load**: Reduced `/internal/v1/api-keys/validate` and `/internal/v1/subscriptions/select` call rates
- **Authorino CPU**: Reduced OPA evaluation CPU usage
- **Request latency**: Cache hits should have lower P99 latency

---

## Security Notes

### Cache Key Correctness

All cache keys include sufficient dimensions to prevent cross-principal or cross-subscription cache sharing:

- **Never share cache entries between different users**
- **Never share cache entries between different API keys**
- **Never share cache entries between different models** (model namespace/name in key)
- **Never share cache entries between different group memberships** (groups in key)

### Cache Key Collision Risk

**API Keys - No Collision Risk:**
Cache keys use database-assigned UUIDs instead of usernames:
- User IDs are unique 128-bit UUIDs (format: `550e8400-e29b-41d4-a716-446655440000`)
- Immutable and assigned by PostgreSQL at API key creation
- Not user-controllable (no self-service user ID selection)
- Even if groups contain delimiters (`,` or `|`), the UUID prefix prevents collision
- Example: Two users with groups `["team,admin"]` and `["team", "admin"]` have different UUIDs, so no collision

**Kubernetes Tokens - No Collision Risk:**
Kubernetes usernames are validated by the K8s API server:
- Format: `system:serviceaccount:namespace:sa-name`
- Kubernetes enforces DNS-1123 naming: `[a-z0-9]([-a-z0-9]*[a-z0-9])?`
- No special characters like `|` or `,` allowed
- Creating service accounts requires cluster RBAC permissions (not user self-service)

**Remaining Edge Case - Group Ordering:**
Group array ordering affects cache keys:
- `["admin", "user"]` produces different key than `["user", "admin"]`
- CEL has no array sort() function
- Impact: Suboptimal cache hit rate if group order varies between OIDC token refreshes
- Mitigation: OIDC providers and K8s TokenReview typically return groups in consistent order

### Stale Data Window

Cache TTL represents the maximum staleness window:

- **Metadata caches**: API key revocation or group membership changes may take up to `METADATA_CACHE_TTL` seconds to propagate
- **Authorization caches**: Authorization policy changes may take up to `AUTHZ_CACHE_TTL` seconds to propagate

For immediate policy enforcement after changes:
1. Delete the affected AuthPolicy to clear Authorino's cache
2. Or wait for the TTL to expire

---

## References

- [Authorino Caching User Guide](https://docs.kuadrant.io/latest/authorino/docs/features/#caching)
- [AuthPolicy Reference](https://docs.kuadrant.io/latest/kuadrant-operator/doc/reference/authpolicy/)
- [MaaS Controller Overview](./maas-controller-overview.md)
10 changes: 9 additions & 1 deletion maas-api/internal/api_keys/service.go
Original file line number Diff line number Diff line change
Expand Up @@ -251,10 +251,18 @@ func (s *Service) ValidateAPIKey(ctx context.Context, key string) (*ValidationRe
}, nil
}

// Validate metadata.ID is a syntactically valid UUID (fail-closed defense-in-depth)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little confused as to why changes are being made in MaaS API at all.

The cacheing should be purely through Kudarant so on the AuthPolicies right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Rabbit was concerned about how the cache key was originally created - using the user name - which could be manipulated to gain unauthorized access. A mitigation was to use the user id , but that was also the user name. Changing it to a uuid let to these maas-api changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai, do you want to elaborate?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrhyness — happy to elaborate for jland-redhat's benefit.

Why maas-api changes are needed even though caching lives in Kuadrant

The Authorino apiKeyValidation metadata evaluator calls the maas-api /api-keys/validate endpoint and stores the JSON response. Fields from that response are then used by downstream Authorino evaluators (auth-valid, subscription-valid, require-group-membership) — both in their CEL expressions and in their cache.key.selector strings.

The chain looks like this:

Bearer sk-oai-… → Authorino apiKeyValidation → HTTP call → maas-api /api-keys/validate
                                                           ↓
                                             ValidationResult JSON (UserID, Username, Groups…)
                                                           ↓
                               Authorino evaluator cache key selectors consume these fields

Original security problem (CWE-706 / cache key collision)

If UserID was set to metadata.Username — a value that is chosen by the user at API-key creation time — an attacker could register a username that collides with another user's identity string. Because Authorino's cache is keyed on that value, a crafted username could cause a cache hit against a different user's cached authorization result, potentially returning a valid authorization decision for a model or subscription the attacker shouldn't have access to.

The fix

Changing UserID to metadata.ID (the database-assigned UUID) breaks that attack:

Property metadata.Username metadata.ID (UUID)
User-controlled ✅ Yes ❌ No
Unique guarantee Weak (app-level) Strong (DB uuid_generate_v4())
Collision-resistant No Yes (2¹²² space)
Safe in cache key selector No Yes

The UUID validation (uuid.Parse) added in the same diff is defense-in-depth (CWE-20): it ensures that even if a database integrity issue somehow produced a malformed ID, a non-UUID string would never silently propagate into Authorino's cache key selectors where it could cause unpredictable key collisions or CEL expression errors.

Summary for jland-redhat

The caching mechanism is purely Kuadrant/Authorino, but the data those cache keys are built from comes from the maas-api validation response. Making the cache keys secure required making the source data (the UserID field) tamper-resistant — which required the small change in maas-api.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting...

Not an edge case I would have thought of

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

// Database should always return valid UUIDs, but verify to prevent malformed IDs
// from being used in cache keys or authorization decisions
if _, err := uuid.Parse(metadata.ID); err != nil {
s.logger.Error("API key has invalid UUID format", "key_id", metadata.ID, "error", err)
return nil, fmt.Errorf("database integrity error: invalid key ID format: %w", err)
}

// Success - return user identity and groups for Authorino
return &ValidationResult{
Valid: true,
UserID: metadata.Username,
UserID: metadata.ID, // Database-assigned UUID (immutable, collision-resistant)
Username: metadata.Username,
KeyID: metadata.ID,
Groups: groups, // Original user groups for subscription-based authorization
Expand Down
Loading
Loading