Merged
Changes from 6 commits
3 changes: 3 additions & 0 deletions README.md
@@ -80,6 +80,8 @@ For detailed instructions, see the [Deployment Guide](docs/content/quickstart.md
|----------|-------------|---------|
| `MAAS_API_IMAGE` | Custom MaaS API container image (works in both operator and kustomize modes) | `quay.io/user/maas-api:pr-123` |
| `MAAS_CONTROLLER_IMAGE` | Custom MaaS controller container image | `quay.io/user/maas-controller:pr-123` |
| `METADATA_CACHE_TTL` | TTL in seconds for Authorino metadata HTTP caching | `60` (default), `300` |
| `AUTHZ_CACHE_TTL` | TTL in seconds for Authorino OPA authorization caching | `60` (default), `30` |
| `OPERATOR_CATALOG` | Custom operator catalog | `quay.io/opendatahub/catalog:pr-456` |
| `OPERATOR_IMAGE` | Custom operator image | `quay.io/opendatahub/operator:pr-456` |
| `OPERATOR_TYPE` | Operator type (rhoai/odh) | `odh` |
@@ -127,6 +129,7 @@ MAAS_API_IMAGE=quay.io/myuser/maas-api:pr-123 \

- [Deployment Guide](docs/content/quickstart.md) - Complete deployment instructions
- [MaaS API Documentation](maas-api/README.md) - Go API for key management
- [Authorino Caching Configuration](docs/content/configuration-and-management/authorino-caching.md) - Cache tuning for metadata and authorization

Online Documentation: [https://opendatahub-io.github.io/models-as-a-service/](https://opendatahub-io.github.io/models-as-a-service/)

6 changes: 6 additions & 0 deletions deployment/base/maas-controller/manager/manager.yaml
@@ -33,6 +33,8 @@ spec:
- --maas-api-namespace=$(MAAS_API_NAMESPACE)
- --maas-subscription-namespace=$(MAAS_SUBSCRIPTION_NAMESPACE)
- --cluster-audience=$(CLUSTER_AUDIENCE)
- --metadata-cache-ttl=$(METADATA_CACHE_TTL)
- --authz-cache-ttl=$(AUTHZ_CACHE_TTL)
env:
- name: GATEWAY_NAME
  value: "maas-default-gateway"
@@ -46,6 +48,10 @@ spec:
  value: "models-as-a-service"
- name: CLUSTER_AUDIENCE
  value: "https://kubernetes.default.svc"
- name: METADATA_CACHE_TTL
  value: "60"
- name: AUTHZ_CACHE_TTL
  value: "60"
image: maas-controller
name: manager
imagePullPolicy: Always
26 changes: 26 additions & 0 deletions deployment/overlays/odh/kustomization.yaml
@@ -51,6 +51,32 @@ patches:
  target:
    kind: Deployment
    name: maas-api
- patch: |-
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: maas-controller
    spec:
      template:
        spec:
          containers:
          - name: manager
            env:
            - name: METADATA_CACHE_TTL
              valueFrom:
                configMapKeyRef:
                  key: metadata-cache-ttl
                  name: maas-parameters
                  optional: true
            - name: AUTHZ_CACHE_TTL
              valueFrom:
                configMapKeyRef:
                  key: authz-cache-ttl
                  name: maas-parameters
                  optional: true
  target:
    kind: Deployment
    name: maas-controller

replacements:
# Gateway policies must be in openshift-ingress to target maas-default-gateway
2 changes: 2 additions & 0 deletions deployment/overlays/odh/params.env
@@ -4,3 +4,5 @@ gateway-namespace=openshift-ingress
gateway-name=maas-default-gateway
app-namespace=opendatahub
api-key-max-expiration-days=30
metadata-cache-ttl=60
authz-cache-ttl=60
226 changes: 226 additions & 0 deletions docs/content/configuration-and-management/authorino-caching.md
@@ -0,0 +1,226 @@
# Authorino Caching Configuration

This document describes the Authorino/Kuadrant caching configuration in MaaS, including how to tune cache TTLs for metadata and authorization evaluators.

---

## Overview

MaaS-generated AuthPolicy resources enable Authorino-style caching on:

- **Metadata evaluators** (HTTP calls to maas-api):
  - `apiKeyValidation` - validates API keys and returns user identity + groups
  - `subscription-info` - selects the appropriate subscription for the request

- **Authorization evaluators** (OPA policy evaluation):
  - `auth-valid` - validates authentication (API key OR K8s token)
  - `subscription-valid` - ensures a valid subscription was selected
  - `require-group-membership` - checks user/group membership against allowed lists

Caching reduces load on maas-api and CPU spent on Rego re-evaluation by reusing results when the cache key repeats within the TTL window.

---

## Configuration

### Environment Variables

The maas-controller deployment supports the following environment variables to configure cache TTLs:

| Variable | Description | Default | Unit | Constraints |
|----------|-------------|---------|------|-------------|
| `METADATA_CACHE_TTL` | TTL for metadata HTTP caching (apiKeyValidation, subscription-info) | `60` | seconds | Must be ≥ 0 |
| `AUTHZ_CACHE_TTL` | TTL for OPA authorization caching (auth-valid, subscription-valid, require-group-membership) | `60` | seconds | Must be ≥ 0 |

**Note:** The controller will fail to start if either TTL is set to a negative value.
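
The startup rule in the note above can be sketched as follows. This is a minimal illustration, not the controller's actual code: the function name `validateCacheTTLs` and the error messages are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// validateCacheTTLs is a hypothetical helper that rejects negative TTLs,
// mirroring the documented constraint that both values must be >= 0.
func validateCacheTTLs(metadataTTL, authzTTL int64) error {
	if metadataTTL < 0 {
		return errors.New("metadata cache TTL must be >= 0")
	}
	if authzTTL < 0 {
		return errors.New("authz cache TTL must be >= 0")
	}
	return nil
}

func main() {
	// Valid configuration: no error is returned.
	fmt.Println(validateCacheTTLs(60, 60) == nil)
	// Negative metadata TTL: an error is returned.
	fmt.Println(validateCacheTTLs(-1, 60) != nil)
}
```

A caller would typically run such a check right after flag parsing and exit with a non-zero status when it returns an error.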

### Deployment Configuration

#### Via params.env (ODH Overlay)

Edit `deployment/overlays/odh/params.env`:

```env
# 5 minutes
metadata-cache-ttl=300
# 30 seconds
authz-cache-ttl=30
```

These values are injected into the maas-controller deployment via ConfigMap.

#### Via manager.yaml (Base Deployment)

Edit `deployment/base/maas-controller/manager/manager.yaml`:

```yaml
env:
  - name: METADATA_CACHE_TTL
    value: "300" # 5 minutes
  - name: AUTHZ_CACHE_TTL
    value: "30" # 30 seconds
```

### Important: Authorization Cache TTL Capping

**Authorization caches are automatically capped at the metadata cache TTL** to prevent stale authorization decisions.

Authorization evaluators (auth-valid, subscription-valid, require-group-membership) depend on metadata evaluators (apiKeyValidation, subscription-info). If authorization caches outlive metadata caches, stale metadata can lead to incorrect authorization decisions.

**Example:**
```env
METADATA_CACHE_TTL=60 # 1 minute
AUTHZ_CACHE_TTL=300 # 5 minutes (will be capped at 60 seconds)
```

In this scenario:
- Metadata caches use 60-second TTL ✅
- Authorization caches use **60-second TTL** (capped, not 300) ✅
- A warning is logged at startup: "Authorization cache TTL exceeds metadata cache TTL"

**Recommendation:** Set `AUTHZ_CACHE_TTL ≤ METADATA_CACHE_TTL` to avoid confusion.
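
The capping rule amounts to taking the smaller of the two TTLs. The sketch below illustrates it under that reading; `effectiveAuthzTTL` is a hypothetical name and not taken from the controller source.

```go
package main

import "fmt"

// effectiveAuthzTTL is a hypothetical helper that caps the authorization
// cache TTL at the metadata cache TTL. The boolean reports whether capping
// took place, so the caller can log a warning at startup.
func effectiveAuthzTTL(metadataTTL, authzTTL int64) (int64, bool) {
	if authzTTL > metadataTTL {
		return metadataTTL, true
	}
	return authzTTL, false
}

func main() {
	// The scenario from the example above: 300 is capped to 60.
	ttl, capped := effectiveAuthzTTL(60, 300)
	fmt.Println(ttl, capped) // 60 true

	// A value at or below the metadata TTL is used unchanged.
	ttl, capped = effectiveAuthzTTL(300, 30)
	fmt.Println(ttl, capped) // 30 false
}
```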

---

## Cache Key Design

Cache keys are carefully designed to prevent data leakage between principals, subscriptions, and models.

### Collision Resistance

Cache keys use single-character delimiters (`|` and `,`) to separate components:

- **Field delimiter**: `|` separates major components (user ID, groups, subscription, model)
- **Group delimiter**: `,` joins multiple group names

**For API Keys - Collision Resistant:**
Cache keys use database-assigned UUIDs instead of usernames:
- User ID: Database primary key (UUID format in `api_keys.id` column)
- Immutable and unique per API key
- Not user-controllable (assigned by database on creation)
- Example key: `550e8400-e29b-41d4-a716-446655440000|team,admin|sub1|ns/model`
- No collision risk even if groups contain delimiters (UUID prefix ensures uniqueness)

**For Kubernetes Tokens - Already Safe:**
Kubernetes service account usernames follow a validated format enforced by the Kubernetes API server:
- Pattern: `system:serviceaccount:namespace:sa-name`
- Kubernetes validates namespace/SA names (DNS-1123: lowercase alphanumerics and hyphens only)
- No special characters like `|` or `,` allowed in usernames
- Creating service accounts requires cluster permissions (not user self-service)

**Implementation:**
The `apiKeyValidation` metadata evaluator returns a `userId` field:
- API keys: Set to `api_keys.id` (database UUID)
- Cache keys reference `auth.metadata.apiKeyValidation.userId` in CEL expressions
- This eliminates username-based collision attacks

### Metadata Caches

**apiKeyValidation:**
- **Only runs for API key requests** (Authorization header matches `Bearer sk-oai-*`)
- Key: `<api-key-value>`
- Ensures each unique API key has its own cache entry
- Does not run for Kubernetes token requests (prevents cache pollution)
- Returns `userId` field set to database UUID (`api_keys.id`)
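
The "API key requests only" condition can be pictured as a simple prefix test on the Authorization header. The real condition lives in the generated AuthPolicy; the Go sketch below only mirrors the documented `Bearer sk-oai-*` pattern, and `isAPIKeyRequest` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// apiKeyPrefix follows the `Bearer sk-oai-*` pattern documented above.
const apiKeyPrefix = "Bearer sk-oai-"

// isAPIKeyRequest is a hypothetical helper that reports whether the
// Authorization header carries a MaaS API key rather than a Kubernetes token.
func isAPIKeyRequest(authorizationHeader string) bool {
	return strings.HasPrefix(authorizationHeader, apiKeyPrefix)
}

func main() {
	fmt.Println(isAPIKeyRequest("Bearer sk-oai-abc123")) // true
	fmt.Println(isAPIKeyRequest("Bearer eyJhbGciOi"))    // false
}
```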

**subscription-info:**
- Key: `<userId>|<groups>|<requested-subscription>|<model-namespace>/<model-name>`
- For API keys: `userId` is database UUID from `apiKeyValidation` response
- For K8s tokens: `userId` is validated K8s username (`system:serviceaccount:...`)
- Groups joined with `,` delimiter
- Ensures cache isolation per user, group membership, requested subscription, and model
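
To make the key layout concrete, the sketch below assembles a `subscription-info` key from its parts. The actual keys are built by CEL expressions in the AuthPolicy; this Go version is only an illustration, and `subscriptionInfoKey` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// subscriptionInfoKey is a hypothetical helper that joins the documented
// components with "|" and the group names with ",".
func subscriptionInfoKey(userID string, groups []string, subscription, modelNamespace, modelName string) string {
	return strings.Join([]string{
		userID,
		strings.Join(groups, ","),
		subscription,
		modelNamespace + "/" + modelName,
	}, "|")
}

func main() {
	key := subscriptionInfoKey(
		"550e8400-e29b-41d4-a716-446655440000",
		[]string{"team", "admin"},
		"sub1", "ns", "model",
	)
	fmt.Println(key)
	// 550e8400-e29b-41d4-a716-446655440000|team,admin|sub1|ns/model
}
```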

### Authorization Caches

**auth-valid:**
- Key: `<auth-type>|<identity>|<model-namespace>/<model-name>`
- For API keys: `api-key|<key-value>|model`
- For K8s tokens: `k8s-token|<username>|model`

**subscription-valid:**
- Key: Same as subscription-info metadata (ensures cache coherence)
- Format: `<userId>|<groups>|<requested-subscription>|<model>`
- For API keys: `userId` is database UUID. For K8s tokens: validated username.

**require-group-membership:**
- Key: `<userId>|<groups>|<model-namespace>/<model-name>`
- For API keys: `userId` is database UUID. For K8s tokens: validated username.
- Groups joined with `,` delimiter
- Ensures cache isolation per user identity and model

---

## Operational Tuning

### When to Increase Metadata Cache TTL

- **High API key validation load**: If maas-api is experiencing high load from repeated `/internal/v1/api-keys/validate` calls
- **Stable API keys**: API key metadata (username, groups) doesn't change frequently
- **Example**: Set `METADATA_CACHE_TTL=300` (5 minutes). Compared with the 60-second default, a continuously active API key then triggers up to 5x fewer validation calls to maas-api
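
The 5x figure follows from a rough estimate: a continuously active cache key misses about once per TTL window. The sketch below works through that arithmetic under this simplifying assumption; `maxUpstreamCallsPerHour` is a hypothetical name, and real traffic with many idle or short-lived keys will see a smaller benefit.

```go
package main

import "fmt"

// maxUpstreamCallsPerHour estimates how many upstream calls one continuously
// active cache key causes per hour, assuming one cache miss per TTL window.
// It assumes ttlSeconds > 0.
func maxUpstreamCallsPerHour(ttlSeconds int64) int64 {
	return 3600 / ttlSeconds
}

func main() {
	before := maxUpstreamCallsPerHour(60) // default TTL
	after := maxUpstreamCallsPerHour(300) // 5-minute TTL
	fmt.Println(before, after, before/after) // 60 12 5
}
```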

### When to Decrease Authorization Cache TTL

- **Group membership changes**: If users are frequently added/removed from groups
- **Security compliance**: Shorter TTL ensures access changes propagate faster
- **Example**: Set `AUTHZ_CACHE_TTL=30` (30 seconds) for faster group membership updates

### Monitoring

After changing TTL values, monitor:
- **maas-api load**: Reduced `/internal/v1/api-keys/validate` and `/internal/v1/subscriptions/select` call rates
- **Authorino CPU**: Reduced OPA evaluation CPU usage
- **Request latency**: Cache hits should have lower P99 latency

---

## Security Notes

### Cache Key Correctness

All cache keys include sufficient dimensions to prevent cross-principal or cross-subscription cache sharing:

- **Never share cache entries between different users**
- **Never share cache entries between different API keys**
- **Never share cache entries between different models** (model namespace/name in key)
- **Never share cache entries between different group memberships** (groups in key)

### Cache Key Collision Risk

**API Keys - No Collision Risk:**
Cache keys use database-assigned UUIDs instead of usernames:
- User IDs are unique 128-bit UUIDs (format: `550e8400-e29b-41d4-a716-446655440000`)
- Immutable and assigned by PostgreSQL at API key creation
- Not user-controllable (no self-service user ID selection)
- Even if groups contain delimiters (`,` or `|`), the UUID prefix prevents collision
- Example: Two users with groups `["team,admin"]` and `["team", "admin"]` have different UUIDs, so no collision
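
The example in the last bullet can be checked with a short sketch. It uses the `require-group-membership` key layout; `groupMembershipKey` is a hypothetical name and both UUIDs are made-up illustrative values.

```go
package main

import (
	"fmt"
	"strings"
)

// groupMembershipKey is a hypothetical helper following the documented
// layout <userId>|<groups>|<model-namespace>/<model-name>.
func groupMembershipKey(userID string, groups []string, model string) string {
	return userID + "|" + strings.Join(groups, ",") + "|" + model
}

func main() {
	oneGroup := []string{"team,admin"}     // a single group containing a comma
	twoGroups := []string{"team", "admin"} // two separate groups

	// The group segments alone are indistinguishable after joining.
	fmt.Println(strings.Join(oneGroup, ",") == strings.Join(twoGroups, ",")) // true

	// The full keys still differ because each API key has its own UUID.
	a := groupMembershipKey("550e8400-e29b-41d4-a716-446655440000", oneGroup, "ns/model")
	b := groupMembershipKey("6ba7b810-9dad-11d1-80b4-00c04fd430c8", twoGroups, "ns/model")
	fmt.Println(a == b) // false
}
```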

**Kubernetes Tokens - No Collision Risk:**
Kubernetes usernames are validated by the K8s API server:
- Format: `system:serviceaccount:namespace:sa-name`
- Kubernetes enforces DNS-1123 naming: `[a-z0-9]([-a-z0-9]*[a-z0-9])?`
- No special characters like `|` or `,` allowed
- Creating service accounts requires cluster RBAC permissions (not user self-service)
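
The DNS-1123 pattern quoted above can be tried directly. The sketch below anchors that expression and tests a few names; it only illustrates why the delimiter characters cannot appear and is not the validation code Kubernetes itself runs.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Label is the pattern quoted above, anchored to match a whole name.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

func main() {
	for _, name := range []string{"my-sa", "team|admin", "a,b"} {
		fmt.Println(name, dns1123Label.MatchString(name))
	}
	// my-sa true
	// team|admin false
	// a,b false
}
```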

**Remaining Edge Case - Group Ordering:**
Group array ordering affects cache keys:
- `["admin", "user"]` produces a different key than `["user", "admin"]`
- Core CEL has no list `sort()` function, so the groups are joined in the order received
- Impact: Suboptimal cache hit rate if group order varies between OIDC token refreshes
- Mitigation: OIDC providers and K8s TokenReview typically return groups in consistent order
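
The ordering effect is easy to reproduce. The sketch below shows that joining the same groups in a different order yields a different key segment, and that sorting a copy first would remove the difference. The sorted variant is purely hypothetical: nothing in this document states that such normalization is implemented anywhere.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// joinGroups joins the groups in the order received, as the cache keys do.
func joinGroups(groups []string) string {
	return strings.Join(groups, ",")
}

// joinGroupsSorted is a hypothetical normalization: it sorts a copy of the
// slice before joining, leaving the caller's slice untouched.
func joinGroupsSorted(groups []string) string {
	sorted := append([]string(nil), groups...)
	sort.Strings(sorted)
	return strings.Join(sorted, ",")
}

func main() {
	a := []string{"admin", "user"}
	b := []string{"user", "admin"}

	fmt.Println(joinGroups(a) == joinGroups(b))             // false
	fmt.Println(joinGroupsSorted(a) == joinGroupsSorted(b)) // true
}
```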

### Stale Data Window

Cache TTL represents the maximum staleness window:

- **Metadata caches**: API key revocation or group membership changes may take up to `METADATA_CACHE_TTL` seconds to propagate
- **Authorization caches**: Authorization policy changes may take up to the effective authorization TTL to propagate, that is, the smaller of `AUTHZ_CACHE_TTL` and `METADATA_CACHE_TTL` seconds

To enforce a change sooner than the TTL allows, delete the affected AuthPolicy to clear Authorino's cache. Otherwise, the change takes effect once the TTL expires.

---

## References

- [Authorino Caching User Guide](https://docs.kuadrant.io/latest/authorino/docs/features/#caching)
- [AuthPolicy Reference](https://docs.kuadrant.io/latest/kuadrant-operator/doc/reference/authpolicy/)
- [MaaS Controller Overview](./maas-controller-overview.md)
2 changes: 1 addition & 1 deletion maas-api/internal/api_keys/service.go
@@ -242,7 +242,7 @@ func (s *Service) ValidateAPIKey(ctx context.Context, key string) (*ValidationRe
// Success - return user identity and groups for Authorino
return &ValidationResult{
Valid: true,
- UserID: metadata.Username,
+ UserID: metadata.ID, // Database-assigned UUID (immutable, collision-resistant)
Username: metadata.Username,
KeyID: metadata.ID,
Groups: groups, // Original user groups for subscription-based authorization
4 changes: 2 additions & 2 deletions maas-api/internal/api_keys/service_test.go
@@ -58,7 +58,7 @@ func TestValidateAPIKey_ValidKey(t *testing.T) {
require.NotNil(t, result)

assert.True(t, result.Valid)
- assert.Equal(t, username, result.UserID)
+ assert.Equal(t, keyID, result.UserID) // UserID is the database-assigned key ID (UUID)
assert.Equal(t, username, result.Username)
assert.Equal(t, keyID, result.KeyID)
assert.Equal(t, groups, result.Groups)
@@ -170,7 +170,7 @@ func TestValidateAPIKey_EmptyGroups(t *testing.T) {
require.NotNil(t, result)

assert.True(t, result.Valid)
- assert.Equal(t, username, result.UserID)
+ assert.Equal(t, keyID, result.UserID) // UserID is the database-assigned key ID (UUID)
assert.NotNil(t, result.Groups, "Groups should be empty array, not nil")
assert.Empty(t, result.Groups, "Groups should be empty")
}
16 changes: 11 additions & 5 deletions maas-controller/cmd/manager/main.go
@@ -131,6 +131,8 @@ func main() {
var maasAPINamespace string
var maasSubscriptionNamespace string
var clusterAudience string
var metadataCacheTTL int64
var authzCacheTTL int64

flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metrics endpoint binds to.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
@@ -141,6 +143,8 @@ func main() {
flag.StringVar(&maasAPINamespace, "maas-api-namespace", "opendatahub", "The namespace where maas-api service is deployed.")
flag.StringVar(&maasSubscriptionNamespace, "maas-subscription-namespace", "models-as-a-service", "The namespace to watch for MaaS CRs.")
flag.StringVar(&clusterAudience, "cluster-audience", "https://kubernetes.default.svc", "The OIDC audience of the cluster for TokenReview. HyperShift/ROSA clusters use a custom OIDC provider URL.")
flag.Int64Var(&metadataCacheTTL, "metadata-cache-ttl", 60, "TTL in seconds for Authorino metadata HTTP caching (apiKeyValidation, subscription-info).")
flag.Int64Var(&authzCacheTTL, "authz-cache-ttl", 60, "TTL in seconds for Authorino OPA authorization caching (auth-valid, subscription-valid, require-group-membership).")

opts := zap.Options{Development: false}
opts.BindFlags(flag.CommandLine)
@@ -196,11 +200,13 @@ func main() {
os.Exit(1)
}
if err := (&maas.MaaSAuthPolicyReconciler{
- Client: mgr.GetClient(),
- Scheme: mgr.GetScheme(),
- MaaSAPINamespace: maasAPINamespace,
- GatewayName: gatewayName,
- ClusterAudience: clusterAudience,
+ Client: mgr.GetClient(),
+ Scheme: mgr.GetScheme(),
+ MaaSAPINamespace: maasAPINamespace,
+ GatewayName: gatewayName,
+ ClusterAudience: clusterAudience,
+ MetadataCacheTTL: metadataCacheTTL,
+ AuthzCacheTTL: authzCacheTTL,
}).SetupWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "MaaSAuthPolicy")
os.Exit(1)