# Authorino Caching Configuration

This document describes the Authorino/Kuadrant caching configuration in MaaS, including how to tune cache TTLs for metadata and authorization evaluators.

---

## Overview

MaaS-generated AuthPolicy resources enable Authorino caching on the following evaluators:

- **Metadata evaluators** (HTTP calls to maas-api):
  - `apiKeyValidation` - validates API keys and returns the user identity and groups
  - `subscription-info` - selects the appropriate subscription for the request

- **Authorization evaluators** (OPA policy evaluation):
  - `auth-valid` - validates authentication (API key OR K8s token)
  - `subscription-valid` - ensures a valid subscription was selected
  - `require-group-membership` - checks user/group membership against allowed lists

Caching reduces load on maas-api and CPU spent on Rego re-evaluation by reusing results when the cache key repeats within the TTL window.

---

## Configuration

### Environment Variables

The maas-controller deployment supports the following environment variables to configure cache TTLs:

| Variable | Description | Default | Unit | Constraints |
|----------|-------------|---------|------|-------------|
| `METADATA_CACHE_TTL` | TTL for metadata HTTP caching (apiKeyValidation, subscription-info) | `60` | seconds | Must be ≥ 0 |
| `AUTHZ_CACHE_TTL` | TTL for OPA authorization caching (auth-valid, subscription-valid, require-group-membership) | `60` | seconds | Must be ≥ 0 |

**Note:** The controller will fail to start if either TTL is set to a negative value.
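
A minimal Python sketch of the rules in the table (60-second default, non-negative integers only), for illustration only: the controller's real code may differ, the function names `load_cache_ttls` and `_parse_ttl` are hypothetical, and treating non-numeric values as fatal is an assumption.

```python
from collections.abc import Mapping

DEFAULT_TTL_SECONDS = 60  # documented default for both variables


def _parse_ttl(env: Mapping[str, str], name: str) -> int:
    """Read one TTL variable (seconds), using the default when it is unset."""
    raw = env.get(name)
    if raw is None:
        return DEFAULT_TTL_SECONDS
    try:
        value = int(raw)  # int() tolerates surrounding whitespace
    except ValueError:
        # Assumption: non-numeric values are treated as fatal, like negatives.
        raise ValueError(f"{name} must be an integer number of seconds, got {raw!r}") from None
    if value < 0:
        # Documented rule: a negative TTL prevents the controller from starting.
        raise ValueError(f"{name} must be >= 0, got {value}")
    return value


def load_cache_ttls(env: Mapping[str, str]) -> tuple[int, int]:
    """Return (metadata_ttl, authz_ttl); raise ValueError on invalid settings."""
    return (
        _parse_ttl(env, "METADATA_CACHE_TTL"),
        _parse_ttl(env, "AUTHZ_CACHE_TTL"),
    )
```

At startup, a process following these rules would call `load_cache_ttls(os.environ)` and exit if it raises `ValueError`.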

### Deployment Configuration

#### Via params.env (ODH Overlay)

Edit `deployment/overlays/odh/params.env`:

```env
# 5 minutes
metadata-cache-ttl=300
# 30 seconds
authz-cache-ttl=30
```

Put comments on their own lines: many env-file parsers (Kustomize's `configMapGenerator`, for example) keep an inline `# ...` as part of the value, which would turn the TTL into a non-numeric string.

These values are injected into the maas-controller deployment via a ConfigMap.

#### Via manager.yaml (Base Deployment)

Edit `deployment/base/maas-controller/manager/manager.yaml`:

```yaml
env:
  - name: METADATA_CACHE_TTL
    value: "300"  # 5 minutes
  - name: AUTHZ_CACHE_TTL
    value: "30"   # 30 seconds
```

### Important: Authorization Cache TTL Capping

**Authorization caches are automatically capped at the metadata cache TTL** to prevent stale authorization decisions.

Authorization evaluators (auth-valid, subscription-valid, require-group-membership) depend on metadata evaluators (apiKeyValidation, subscription-info). If authorization caches outlive metadata caches, stale metadata can lead to incorrect authorization decisions.

**Example:**
```shell
METADATA_CACHE_TTL=60   # 1 minute
AUTHZ_CACHE_TTL=300     # 5 minutes (will be capped at 60 seconds)
```

In this scenario:
- Metadata caches use a 60-second TTL ✅
- Authorization caches use a **60-second TTL** (capped, not 300) ✅
- A warning is logged at startup: "Authorization cache TTL exceeds metadata cache TTL"

**Recommendation:** Set `AUTHZ_CACHE_TTL ≤ METADATA_CACHE_TTL` to avoid confusion.
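
The capping rule amounts to taking the smaller of the two values. A Python sketch, where the function name `effective_authz_ttl` is hypothetical and the extra numbers in the log line are illustrative additions to the documented warning text:

```python
import logging

logger = logging.getLogger(__name__)


def effective_authz_ttl(metadata_ttl: int, authz_ttl: int) -> int:
    """Return the authorization cache TTL actually applied, in seconds."""
    if authz_ttl > metadata_ttl:
        # Documented behaviour: warn at startup, then cap at the metadata TTL.
        logger.warning(
            "Authorization cache TTL exceeds metadata cache TTL (%ds > %ds); using %ds",
            authz_ttl, metadata_ttl, metadata_ttl,
        )
        return metadata_ttl
    return authz_ttl


print(effective_authz_ttl(60, 300))  # 60, plus a warning
print(effective_authz_ttl(300, 30))  # 30
```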

---

## Cache Key Design

Cache keys are carefully designed to prevent data leakage between principals, subscriptions, and models.

### Collision Resistance

Cache keys use single-character delimiters (`|` and `,`) to separate components:

- **Field delimiter**: `|` separates major components (user ID, groups, subscription, model)
- **Group delimiter**: `,` joins multiple group names

**For API Keys - Collision Resistant:**
Cache keys use database-assigned UUIDs instead of usernames:
- User ID: Database primary key (UUID format in `api_keys.id` column)
- Immutable and unique per API key
- Not user-controllable (assigned by database on creation)
- Example key: `550e8400-e29b-41d4-a716-446655440000|team,admin|sub1|ns/model`
- Delimiters inside group names cannot make two principals' keys collide: each key starts with a different fixed-format UUID, which itself contains no `|` or `,`

**For Kubernetes Tokens - Already Safe:**
Service account usernames follow a format validated by the Kubernetes API server:
- Pattern: `system:serviceaccount:namespace:sa-name`
- Namespace names must be DNS-1123 labels (lowercase alphanumerics and `-`); ServiceAccount names must be DNS-1123 subdomains (dot-separated labels)
- No special characters like `|` or `,` are therefore possible in these usernames
- Creating service accounts requires RBAC permission to create ServiceAccounts (not user self-service)

**Implementation:**
The `apiKeyValidation` metadata evaluator returns a `userId` field:
- API keys: Set to `api_keys.id` (database UUID)
- Cache keys reference `auth.metadata.apiKeyValidation.userId` in CEL expressions
- This eliminates username-based collision attacks
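
To see why the first field matters, the Python sketch below joins key components the way the delimiters above describe. The helper `cache_key` is hypothetical; the real keys are CEL expressions in the generated AuthPolicy. With free-form usernames, a `|` in one name can shift the remaining fields so that two principals produce the same key; with a UUID first field, the prefixes always differ.

```python
import uuid


def cache_key(user_id: str, groups: list[str], subscription: str, model: str) -> str:
    """Join fields with '|' and groups with ',' (illustrative key layout)."""
    return "|".join([user_id, ",".join(groups), subscription, model])


# Username-based keys (the scheme the UUID design avoids): delimiters inside
# free-form names let two different principals produce the same key.
forged = cache_key("a|b", ["c"], "sub1", "ns/model")
victim = cache_key("a", ["b|c"], "sub1", "ns/model")
assert forged == victim  # both are "a|b|c|sub1|ns/model"

# UUID-based keys: the first field is a fixed-format UUID with no '|' or ',',
# so keys for different principals differ even when the rest is identical.
id1, id2 = str(uuid.uuid4()), str(uuid.uuid4())
k1 = cache_key(id1, ["team,admin"], "sub1", "ns/model")
k2 = cache_key(id2, ["team", "admin"], "sub1", "ns/model")
assert k1 != k2
assert k1.split("|", 1)[0] == id1
```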

### Metadata Caches

**apiKeyValidation:**
- **Only runs for API key requests** (Authorization header matches `Bearer sk-oai-*`)
- Key: `<api-key-value>`
- Ensures each unique API key has its own cache entry
- Does not run for Kubernetes token requests (prevents cache pollution)
- Returns `userId` field set to database UUID (`api_keys.id`)
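
A Python sketch of the gating condition and the cache key. It assumes a literal, case-sensitive prefix match on the header and that `<api-key-value>` is the header value without the `Bearer ` scheme; both function names are hypothetical, and the real condition lives in the generated AuthPolicy.

```python
from typing import Optional

# Literal prefix from the `Bearer sk-oai-*` condition described above.
API_KEY_HEADER_PREFIX = "Bearer sk-oai-"


def is_api_key_request(authorization: Optional[str]) -> bool:
    """True for MaaS API key requests; False for K8s tokens or missing headers."""
    return authorization is not None and authorization.startswith(API_KEY_HEADER_PREFIX)


def api_key_validation_cache_key(authorization: str) -> str:
    """Cache key for apiKeyValidation: the API key value itself.

    Assumption: the key value is the header with the "Bearer " scheme removed.
    """
    if not is_api_key_request(authorization):
        raise ValueError("apiKeyValidation only runs for API key requests")
    return authorization[len("Bearer "):]
```

Because Kubernetes tokens never pass `is_api_key_request`, they never create entries in this cache.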

**subscription-info:**
- Key: `<userId>|<groups>|<requested-subscription>|<model-namespace>/<model-name>`
- For API keys: `userId` is database UUID from `apiKeyValidation` response
- For K8s tokens: `userId` is validated K8s username (`system:serviceaccount:...`)
- Groups joined with `,` delimiter
- Ensures cache isolation per user, group membership, requested subscription, and model
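
The subscription-info key format can be mirrored in Python for reasoning about cache hits. This is a sketch only: `subscription_info_key` is a hypothetical name, and the actual key is a CEL expression in the generated AuthPolicy.

```python
from collections.abc import Sequence


def subscription_info_key(
    user_id: str,
    groups: Sequence[str],
    requested_subscription: str,
    model_namespace: str,
    model_name: str,
) -> str:
    """Build <userId>|<groups>|<requested-subscription>|<ns>/<name>."""
    return "|".join(
        [
            user_id,                           # DB UUID (API key) or SA username (K8s token)
            ",".join(groups),                  # groups in the order received
            requested_subscription,
            f"{model_namespace}/{model_name}",
        ]
    )


print(subscription_info_key(
    "550e8400-e29b-41d4-a716-446655440000", ["team", "admin"], "sub1", "ns", "model"
))
# 550e8400-e29b-41d4-a716-446655440000|team,admin|sub1|ns/model
```

Changing any one of the four fields (user, group list, requested subscription, or model) yields a different key and therefore a separate cache entry.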

### Authorization Caches

**auth-valid:**
- Key: `<auth-type>|<identity>|<model-namespace>/<model-name>`
- For API keys: `api-key|<key-value>|<model-namespace>/<model-name>`
- For K8s tokens: `k8s-token|<username>|<model-namespace>/<model-name>`

**subscription-valid:**
- Key: Same as subscription-info metadata (ensures cache coherence)
- Format: `<userId>|<groups>|<requested-subscription>|<model-namespace>/<model-name>`
- For API keys: `userId` is database UUID. For K8s tokens: validated username.

**require-group-membership:**
- Key: `<userId>|<groups>|<model-namespace>/<model-name>`
- For API keys: `userId` is database UUID. For K8s tokens: validated username.
- Groups joined with `,` delimiter
- Ensures cache isolation per user identity and model
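
The auth-valid and require-group-membership key formats, sketched in Python (function names hypothetical; subscription-valid reuses the subscription-info format and is omitted). Note that the auth-valid key depends only on the credential and the model, not on any metadata returned for it.

```python
from collections.abc import Sequence


def model_ref(namespace: str, name: str) -> str:
    """The <model-namespace>/<model-name> component shared by all keys."""
    return f"{namespace}/{name}"


def auth_valid_key(is_api_key: bool, identity: str, namespace: str, name: str) -> str:
    """<auth-type>|<identity>|<model-namespace>/<model-name>"""
    auth_type = "api-key" if is_api_key else "k8s-token"
    return "|".join([auth_type, identity, model_ref(namespace, name)])


def require_group_membership_key(
    user_id: str, groups: Sequence[str], namespace: str, name: str
) -> str:
    """<userId>|<groups>|<model-namespace>/<model-name>"""
    return "|".join([user_id, ",".join(groups), model_ref(namespace, name)])


print(auth_valid_key(True, "sk-oai-abc", "ns", "model"))        # api-key|sk-oai-abc|ns/model
print(require_group_membership_key("u1", ["team", "admin"], "ns", "model"))  # u1|team,admin|ns/model
```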

---

## Operational Tuning

### When to Increase Metadata Cache TTL

- **High API key validation load**: If maas-api is experiencing high load from repeated `/internal/v1/api-keys/validate` calls
- **Stable API keys**: API key metadata (username, groups) doesn't change frequently
- **Example**: Set `METADATA_CACHE_TTL=300` (5 minutes) to cut validation calls for continuously used API keys by up to 5x compared with the 60-second default
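
The "up to 5x" figure assumes each key is requested at least once per TTL window. A small Python simulation of a fixed-TTL cache for a single key (the function `count_cache_misses` is hypothetical and models only TTL expiry, not eviction or restarts) shows both the best case and a case where a longer TTL does not help:

```python
from collections.abc import Iterable


def count_cache_misses(request_times: Iterable[float], ttl: float) -> int:
    """Count upstream calls for one cache key under a fixed-TTL cache.

    An entry created at time t serves hits until t + ttl; the first request
    at or after expiry is a miss and refreshes the entry.
    """
    misses = 0
    expires_at = float("-inf")
    for t in sorted(request_times):
        if t >= expires_at:
            misses += 1
            expires_at = t + ttl
    return misses


# One request per second for 10 minutes against the same key.
steady = range(600)
print(count_cache_misses(steady, 60))   # 10
print(count_cache_misses(steady, 300))  # 2

# A key used once every 10 minutes misses every time at either TTL.
sparse = [0, 600, 1200]
print(count_cache_misses(sparse, 60), count_cache_misses(sparse, 300))  # 3 3
```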

### When to Decrease Authorization Cache TTL

- **Group membership changes**: If users are frequently added to or removed from groups
- **Security compliance**: Shorter TTL ensures access changes propagate faster
- **Example**: Set `AUTHZ_CACHE_TTL=30` (30 seconds) for faster group membership updates

### Monitoring

After changing TTL values, monitor:
- **maas-api load**: Reduced `/internal/v1/api-keys/validate` and `/internal/v1/subscriptions/select` call rates
- **Authorino CPU**: Reduced OPA evaluation CPU usage
- **Request latency**: A higher cache hit rate should lower P99 latency

---

## Security Notes

### Cache Key Correctness

All cache keys include sufficient dimensions to prevent cross-principal or cross-subscription cache sharing:

- **Never share cache entries between different users**
- **Never share cache entries between different API keys**
- **Never share cache entries between different models** (model namespace/name in key)
- **Never share cache entries between different group memberships** (groups in key)

### Cache Key Collision Risk

**API Keys - No Collision Risk:**
Cache keys use database-assigned UUIDs instead of usernames:
- User IDs are unique 128-bit UUIDs (format: `550e8400-e29b-41d4-a716-446655440000`)
- Immutable and assigned by PostgreSQL at API key creation
- Not user-controllable (no self-service user ID selection)
- Even if group names contain delimiters (`,` or `|`), keys for different principals cannot collide because each starts with a distinct UUID
- Example: Two users with groups `["team,admin"]` and `["team", "admin"]` have different UUIDs, so no collision

**Kubernetes Tokens - No Collision Risk:**
Service account usernames are validated by the Kubernetes API server:
- Format: `system:serviceaccount:namespace:sa-name`
- Namespace names must match the DNS-1123 label pattern `[a-z0-9]([-a-z0-9]*[a-z0-9])?`; ServiceAccount names must be DNS-1123 subdomains (dot-separated labels of the same form)
- No special characters like `|` or `,` allowed
- Creating service accounts requires RBAC permission to create ServiceAccounts (not user self-service)
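
The claim that these usernames cannot contain `|` or `,` follows from the naming rules. A Python check that applies them (the function name is hypothetical; the patterns and the 63/253-character limits are the standard DNS-1123 label and subdomain rules used by Kubernetes for namespaces and ServiceAccounts respectively):

```python
import re

# DNS-1123 label: used for namespace names (max 63 characters).
DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
# DNS-1123 subdomain: dot-separated labels, used for ServiceAccount names (max 253).
DNS1123_SUBDOMAIN = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*")

SA_PREFIX = "system:serviceaccount:"


def is_service_account_username(username: str) -> bool:
    """Check system:serviceaccount:<namespace>:<name> against DNS-1123 rules."""
    if not username.startswith(SA_PREFIX):
        return False
    parts = username[len(SA_PREFIX):].split(":")
    if len(parts) != 2:
        return False
    namespace, name = parts
    return (
        len(namespace) <= 63
        and DNS1123_LABEL.fullmatch(namespace) is not None
        and len(name) <= 253
        and DNS1123_SUBDOMAIN.fullmatch(name) is not None
    )
```

Neither pattern admits `|` or `,`, so any username that passes this check is safe to embed in a `|`/`,`-delimited key.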

**Remaining Edge Case - Group Ordering:**
Group array ordering affects cache keys:
- `["admin", "user"]` produces different key than `["user", "admin"]`
- CEL has no array sort() function
- Impact: Suboptimal cache hit rate if group order varies between OIDC token refreshes
- Mitigation: OIDC providers and K8s TokenReview typically return groups in consistent order
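
The ordering effect in Python. The helper `groups_segment` is hypothetical, and the sorted variant is only an illustration of what normalized input would look like; it is not something the CEL key expression does.

```python
def groups_segment(groups: list[str]) -> str:
    """Groups part of a cache key, joined with ',' in the order received."""
    return ",".join(groups)


first = groups_segment(["admin", "user"])   # "admin,user"
second = groups_segment(["user", "admin"])  # "user,admin"
print(first == second)  # False: same membership, two separate cache entries

# Hypothetical: if group lists arrived already sorted (outside CEL),
# both orderings would map to one entry.
print(groups_segment(sorted(["admin", "user"])) == groups_segment(sorted(["user", "admin"])))  # True
```

The cost is only a lower hit rate: each ordering gets its own correct entry, so no decision is shared across different memberships.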

### Stale Data Window

Cache TTL represents the maximum staleness window:

- **Metadata caches**: API key revocation or group membership changes may take up to `METADATA_CACHE_TTL` seconds to propagate
- **Authorization caches**: Authorization policy changes may take up to `AUTHZ_CACHE_TTL` seconds to propagate
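
The two bounds above are per cache. Assuming each cache behaves as an independent fixed-TTL cache, they can stack for a change that leaves an authorization cache key unchanged, such as revoking an API key (the `auth-valid` key contains only the key value and model): a metadata entry cached just before the change can serve stale data for up to the metadata TTL, and an authorization result computed from it just before it expires can then be served for the (capped) authorization TTL. A Python sketch of that arithmetic, with a hypothetical function name:

```python
def worst_case_revocation_delay(metadata_ttl: int, authz_ttl: int) -> int:
    """Upper bound in seconds before a change that does not alter the
    authorization cache key is reflected in authorization decisions.

    Assumes independent fixed-TTL caches: stale metadata lives up to
    metadata_ttl, then a decision derived from it lives up to the capped
    authorization TTL.
    """
    effective_authz = min(authz_ttl, metadata_ttl)  # the documented cap
    return metadata_ttl + effective_authz


print(worst_case_revocation_delay(60, 300))  # 120
print(worst_case_revocation_delay(300, 30))  # 330
```

Under these assumptions, the cap keeps the combined window at no more than twice `METADATA_CACHE_TTL`.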

For immediate policy enforcement after a change, either delete the affected AuthPolicy to clear Authorino's cache or wait for the TTL to expire.

---

## References

- [Authorino Caching User Guide](https://docs.kuadrant.io/latest/authorino/docs/features/#caching)
- [AuthPolicy Reference](https://docs.kuadrant.io/latest/kuadrant-operator/doc/reference/authpolicy/)
- [MaaS Controller Overview](./maas-controller-overview.md)