Skip to content

Commit e5a738b

Browse files
Merge branch 'opendatahub-io:main' into somya-fix-overlays
Integrated latest changes from main including: - Authorino caching configuration (metadata and authz cache TTL) - Enhanced API key service with cache support - Controller improvements for AuthPolicy caching Resolved conflict in deployment/overlays/odh/kustomization.yaml by: - Keeping component-based approach (shared-patches component) - Adding ODH-specific cache TTL patches for maas-controller - Preserving existing ODH-specific replacements for gateway policies The component architecture remains intact while incorporating the new caching functionality from main. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 parents 20a8850 + 4f06bcf commit e5a738b

File tree

17 files changed

+1365
-111
lines changed

17 files changed

+1365
-111
lines changed

.tekton/odh-maas-api-pull-request.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ metadata:
99
pipelinesascode.tekton.dev/cancel-in-progress: "false"
1010
pipelinesascode.tekton.dev/max-keep-runs: "3"
1111
pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request" && target_branch
12-
== "main"
12+
== "main" && !files.all.all(x, x.matches('^docs/') || x.matches('\\.md$'))
1313
creationTimestamp: null
1414
labels:
1515
appstudio.openshift.io/application: opendatahub-builds

.tekton/odh-maas-controller-pull-request.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ metadata:
99
pipelinesascode.tekton.dev/cancel-in-progress: "false"
1010
pipelinesascode.tekton.dev/max-keep-runs: "3"
1111
pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request" && target_branch
12-
== "main"
12+
== "main" && !files.all.all(x, x.matches('^docs/') || x.matches('\\.md$'))
1313
creationTimestamp: null
1414
labels:
1515
appstudio.openshift.io/application: opendatahub-builds

CONTRIBUTING.md

Lines changed: 28 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -61,10 +61,37 @@ This project follows a **Stream-Lake-Ocean** release model. Code flows from acti
6161
| `maas-controller/` | Kubernetes controller for MaaS CRDs; see [maas-controller/README.md](maas-controller/README.md) |
6262
| `docs/` | User and admin documentation (MkDocs); [online docs](https://opendatahub-io.github.io/models-as-a-service/) |
6363
| `test/` | E2E and billing/smoke tests |
64-
| `.github/workflows/` | CI (build, PR title validation, MaaS API lint/build) |
64+
| `.github/workflows/` | GitHub Actions CI (lint, build, PR title validation, docs) |
65+
| `.tekton/` | Konflux/Tekton pipeline definitions for container image builds |
6566

6667
## CI and checks
6768

69+
This project uses two CI systems: **Konflux** (Tekton-based) for container image builds and integration testing, and **GitHub Actions** for linting, unit tests, and documentation.
70+
71+
### Konflux / Tekton pipelines
72+
73+
Konflux builds multi-arch container images (x86_64, arm64, ppc64le, s390x) for both `maas-api` and `maas-controller` on every PR and push to `main`. Pipeline definitions live in `.tekton/` and reference a shared pipeline from [odh-konflux-central](https://github.com/opendatahub-io/odh-konflux-central) (`pipeline/multi-arch-container-build.yaml`).
74+
75+
| Pipeline | Trigger | Output image |
76+
|----------|---------|--------------|
77+
| `odh-maas-api-on-pull-request` | PR to `main` | `quay.io/opendatahub/maas-api:odh-pr` |
78+
| `odh-maas-api-on-push` | Push to `main` | `quay.io/opendatahub/maas-api:odh-stable` |
79+
| `odh-maas-controller-on-pull-request` | PR to `main` | `quay.io/opendatahub/maas-controller:odh-pr` |
80+
| `odh-maas-controller-on-push` | Push to `main` | `quay.io/opendatahub/maas-controller:odh-stable` |
81+
82+
**Integration tests (e2e):** When a PR build completes, Konflux runs an integration test that provisions an ephemeral OpenShift cluster (HyperShift on AWS), deploys the ODH stack with the newly built images, and runs `test/e2e/scripts/prow_run_smoke_test.sh`. This is defined in `odh-konflux-central` under `integration-tests/models-as-a-service/`.
83+
84+
**Docs-only skip:** PRs that only touch documentation files (`docs/**` or `**/*.md`) skip the Konflux build pipelines and integration tests entirely. This is controlled via a CEL expression in the `.tekton/` pipeline definitions.
85+
86+
### GitHub Actions
87+
88+
| Workflow | Trigger | Path filter | What it checks |
89+
|----------|---------|-------------|----------------|
90+
| PR Title Validation | Every PR | None | Semantic PR title format (`type: subject`) |
91+
| MaaS API | PR + push to `main` | `maas-api/**` (PR only) | golangci-lint, unit tests, image build |
92+
| Build | PR + push to `main` | `maas-controller/api/**`, `deployment/**`, etc. (PR only) | Kustomize manifest validation, CRD codegen verification |
93+
| Docs | PR + push to `main` | `docs/**`, `**/*.md` | Link validation, mkdocs build, GitHub Pages deploy |
94+
6895
- **PR title:** Must follow semantic format (`type: subject`, subject not starting with a capital). Use `draft`/`wip` label to bypass.
6996
- **Kustomize:** Manifests under `deployment/` are validated with `scripts/ci/validate-manifests.sh` (kustomize build).
7097
- **MaaS Controller codegen:** CI verifies that generated deepcopy code (`maas-controller/api/maas/v1alpha1/zz_generated.deepcopy.go`) and CRD manifests (`deployment/base/maas-controller/crd/bases/`) are in sync with the API types. If you change any file under `maas-controller/api/`, run `make -C maas-controller generate manifests` and commit the results before pushing. The check fails when uncommitted generated changes are detected.

README.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,6 +80,8 @@ For detailed instructions, see the [Deployment Guide](docs/content/quickstart.md
8080
|----------|-------------|---------|
8181
| `MAAS_API_IMAGE` | Custom MaaS API container image (works in both operator and kustomize modes) | `quay.io/user/maas-api:pr-123` |
8282
| `MAAS_CONTROLLER_IMAGE` | Custom MaaS controller container image | `quay.io/user/maas-controller:pr-123` |
83+
| `METADATA_CACHE_TTL` | TTL in seconds for Authorino metadata HTTP caching | `60` (default), `300` |
84+
| `AUTHZ_CACHE_TTL` | TTL in seconds for Authorino OPA authorization caching | `60` (default), `30` |
8385
| `OPERATOR_CATALOG` | Custom operator catalog | `quay.io/opendatahub/catalog:pr-456` |
8486
| `OPERATOR_IMAGE` | Custom operator image | `quay.io/opendatahub/operator:pr-456` |
8587
| `OPERATOR_TYPE` | Operator type (rhoai/odh) | `odh` |
@@ -127,6 +129,7 @@ MAAS_API_IMAGE=quay.io/myuser/maas-api:pr-123 \
127129

128130
- [Deployment Guide](docs/content/quickstart.md) - Complete deployment instructions
129131
- [MaaS API Documentation](maas-api/README.md) - Go API for key management
132+
- [Authorino Caching Configuration](docs/content/configuration-and-management/authorino-caching.md) - Cache tuning for metadata and authorization
130133

131134
Online Documentation: [https://opendatahub-io.github.io/models-as-a-service/](https://opendatahub-io.github.io/models-as-a-service/)
132135

deployment/base/maas-api/policies/auth-policy.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -71,7 +71,7 @@ spec:
7171
when:
7272
- predicate: request.headers.authorization.startsWith("Bearer sk-oai-")
7373
plain:
74-
expression: '''["'' + auth.metadata.apiKeyValidation.groups.join(''","'') + ''"]'''
74+
selector: auth.metadata.apiKeyValidation.groups.@tostr
7575
priority: 0
7676
# Groups: from OpenShift identity as JSON array (when OC token used)
7777
X-MaaS-Group-OC:

deployment/base/maas-controller/manager/manager.yaml

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,8 @@ spec:
3333
- --maas-api-namespace=$(MAAS_API_NAMESPACE)
3434
- --maas-subscription-namespace=$(MAAS_SUBSCRIPTION_NAMESPACE)
3535
- --cluster-audience=$(CLUSTER_AUDIENCE)
36+
- --metadata-cache-ttl=$(METADATA_CACHE_TTL)
37+
- --authz-cache-ttl=$(AUTHZ_CACHE_TTL)
3638
env:
3739
- name: GATEWAY_NAME
3840
value: "maas-default-gateway"
@@ -46,6 +48,10 @@ spec:
4648
value: "models-as-a-service"
4749
- name: CLUSTER_AUDIENCE
4850
value: "https://kubernetes.default.svc"
51+
- name: METADATA_CACHE_TTL
52+
value: "60"
53+
- name: AUTHZ_CACHE_TTL
54+
value: "60"
4955
image: maas-controller
5056
name: manager
5157
imagePullPolicy: Always

deployment/overlays/common/params.env

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,3 +6,5 @@ gateway-name=maas-default-gateway
66
# Used for AuthPolicy URL (maas-api.<app-namespace>.svc) and DestinationRule host
77
app-namespace=opendatahub
88
api-key-max-expiration-days=30
9+
metadata-cache-ttl=60
10+
authz-cache-ttl=60

deployment/overlays/odh/kustomization.yaml

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,36 @@ resources:
4040
components:
4141
- ../../components/shared-patches
4242

43+
# ODH-SPECIFIC PATCHES
44+
# Additional cache TTL configuration for maas-controller
45+
patches:
46+
- patch: |-
47+
apiVersion: apps/v1
48+
kind: Deployment
49+
metadata:
50+
name: maas-controller
51+
spec:
52+
template:
53+
spec:
54+
containers:
55+
- name: manager
56+
env:
57+
- name: METADATA_CACHE_TTL
58+
valueFrom:
59+
configMapKeyRef:
60+
key: metadata-cache-ttl
61+
name: maas-parameters
62+
optional: true
63+
- name: AUTHZ_CACHE_TTL
64+
valueFrom:
65+
configMapKeyRef:
66+
key: authz-cache-ttl
67+
name: maas-parameters
68+
optional: true
69+
target:
70+
kind: Deployment
71+
name: maas-controller
72+
4373
# ODH-SPECIFIC REPLACEMENTS
4474
# These are in addition to shared-patches and handle ODH-specific resources
4575
# (gateway policies and DestinationRule)
Lines changed: 226 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,226 @@
1+
# Authorino Caching Configuration
2+
3+
This document describes the Authorino/Kuadrant caching configuration in MaaS, including how to tune cache TTLs for metadata and authorization evaluators.
4+
5+
---
6+
7+
## Overview
8+
9+
MaaS-generated AuthPolicy resources enable Authorino-style caching on:
10+
11+
- **Metadata evaluators** (HTTP calls to maas-api):
12+
- `apiKeyValidation` - validates API keys and returns user identity + groups
13+
- `subscription-info` - selects the appropriate subscription for the request
14+
15+
- **Authorization evaluators** (OPA policy evaluation):
16+
- `auth-valid` - validates authentication (API key OR K8s token)
17+
- `subscription-valid` - ensures a valid subscription was selected
18+
- `require-group-membership` - checks user/group membership against allowed lists
19+
20+
Caching reduces load on maas-api and CPU spent on Rego re-evaluation by reusing results when the cache key repeats within the TTL window.
21+
22+
---
23+
24+
## Configuration
25+
26+
### Environment Variables
27+
28+
The maas-controller deployment supports the following environment variables to configure cache TTLs:
29+
30+
| Variable | Description | Default | Unit | Constraints |
31+
|----------|-------------|---------|------|-------------|
32+
| `METADATA_CACHE_TTL` | TTL for metadata HTTP caching (apiKeyValidation, subscription-info) | `60` | seconds | Must be ≥ 0 |
33+
| `AUTHZ_CACHE_TTL` | TTL for OPA authorization caching (auth-valid, subscription-valid, require-group-membership) | `60` | seconds | Must be ≥ 0 |
34+
35+
**Note:** The controller will fail to start if either TTL is set to a negative value.
36+
37+
### Deployment Configuration
38+
39+
#### Via params.env (ODH Overlay)
40+
41+
Edit `deployment/overlays/odh/params.env`:
42+
43+
```env
44+
metadata-cache-ttl=300 # 5 minutes
45+
authz-cache-ttl=30 # 30 seconds
46+
```
47+
48+
These values are injected into the maas-controller deployment via ConfigMap.
49+
50+
#### Via manager.yaml (Base Deployment)
51+
52+
Edit `deployment/base/maas-controller/manager/manager.yaml`:
53+
54+
```yaml
55+
env:
56+
- name: METADATA_CACHE_TTL
57+
value: "300" # 5 minutes
58+
- name: AUTHZ_CACHE_TTL
59+
value: "30" # 30 seconds
60+
```
61+
62+
### Important: Authorization Cache TTL Capping
63+
64+
**Authorization caches are automatically capped at the metadata cache TTL** to prevent stale authorization decisions.
65+
66+
Authorization evaluators (auth-valid, subscription-valid, require-group-membership) depend on metadata evaluators (apiKeyValidation, subscription-info). If authorization caches outlive metadata caches, stale metadata can lead to incorrect authorization decisions.
67+
68+
**Example:**
69+
```yaml
70+
METADATA_CACHE_TTL=60 # 1 minute
71+
AUTHZ_CACHE_TTL=300 # 5 minutes (will be capped at 60 seconds)
72+
```
73+
74+
In this scenario:
75+
- Metadata caches use 60-second TTL ✅
76+
- Authorization caches use **60-second TTL** (capped, not 300) ✅
77+
- A warning is logged at startup: "Authorization cache TTL exceeds metadata cache TTL"
78+
79+
**Recommendation:** Set `AUTHZ_CACHE_TTL ≤ METADATA_CACHE_TTL` to avoid confusion.
80+
81+
---
82+
83+
## Cache Key Design
84+
85+
Cache keys are carefully designed to prevent data leakage between principals, subscriptions, and models.
86+
87+
### Collision Resistance
88+
89+
Cache keys use single-character delimiters (`|` and `,`) to separate components:
90+
91+
- **Field delimiter**: `|` separates major components (user ID, groups, subscription, model)
92+
- **Group delimiter**: `,` joins multiple group names
93+
94+
**For API Keys - Collision Resistant:**
95+
Cache keys use database-assigned UUIDs instead of usernames:
96+
- User ID: Database primary key (UUID format in `api_keys.id` column)
97+
- Immutable and unique per API key
98+
- Not user-controllable (assigned by database on creation)
99+
- Example key: `550e8400-e29b-41d4-a716-446655440000|team,admin|sub1|ns/model`
100+
- No collision risk even if groups contain delimiters (UUID prefix ensures uniqueness)
101+
102+
**For Kubernetes Tokens - Already Safe:**
103+
Kubernetes usernames follow validated format enforced by the K8s API:
104+
- Pattern: `system:serviceaccount:namespace:sa-name`
105+
- Kubernetes validates namespace/SA names (DNS-1123: alphanumeric + hyphens only)
106+
- No special characters like `|` or `,` allowed in usernames
107+
- Creating service accounts requires cluster permissions (not user self-service)
108+
109+
**Implementation:**
110+
The `apiKeyValidation` metadata evaluator returns a `userId` field:
111+
- API keys: Set to `api_keys.id` (database UUID)
112+
- Cache keys reference `auth.metadata.apiKeyValidation.userId` in CEL expressions
113+
- This eliminates username-based collision attacks
114+
115+
### Metadata Caches
116+
117+
**apiKeyValidation:**
118+
- **Only runs for API key requests** (Authorization header matches `Bearer sk-oai-*`)
119+
- Key: `<api-key-value>`
120+
- Ensures each unique API key has its own cache entry
121+
- Does not run for Kubernetes token requests (prevents cache pollution)
122+
- Returns `userId` field set to database UUID (`api_keys.id`)
123+
124+
**subscription-info:**
125+
- Key: `<userId>|<groups>|<requested-subscription>|<model-namespace>/<model-name>`
126+
- For API keys: `userId` is database UUID from `apiKeyValidation` response
127+
- For K8s tokens: `userId` is validated K8s username (`system:serviceaccount:...`)
128+
- Groups joined with `,` delimiter
129+
- Ensures cache isolation per user, group membership, requested subscription, and model
130+
131+
### Authorization Caches
132+
133+
**auth-valid:**
134+
- Key: `<auth-type>|<identity>|<model-namespace>/<model-name>`
135+
- For API keys: `api-key|<key-value>|model`
136+
- For K8s tokens: `k8s-token|<username>|model`
137+
138+
**subscription-valid:**
139+
- Key: Same as subscription-info metadata (ensures cache coherence)
140+
- Format: `<userId>|<groups>|<requested-subscription>|<model>`
141+
- For API keys: `userId` is database UUID. For K8s tokens: validated username.
142+
143+
**require-group-membership:**
144+
- Key: `<userId>|<groups>|<model-namespace>/<model-name>`
145+
- For API keys: `userId` is database UUID. For K8s tokens: validated username.
146+
- Groups joined with `,` delimiter
147+
- Ensures cache isolation per user identity and model
148+
149+
---
150+
151+
## Operational Tuning
152+
153+
### When to Increase Metadata Cache TTL
154+
155+
- **High API key validation load**: If maas-api is experiencing high load from repeated `/internal/v1/api-keys/validate` calls
156+
- **Stable API keys**: API key metadata (username, groups) doesn't change frequently
157+
- **Example**: Set `METADATA_CACHE_TTL=300` (5 minutes) to reduce maas-api load by 5x
158+
159+
### When to Decrease Authorization Cache TTL
160+
161+
- **Group membership changes**: If users are frequently added/removed from groups
162+
- **Security compliance**: Shorter TTL ensures access changes propagate faster
163+
- **Example**: Set `AUTHZ_CACHE_TTL=30` (30 seconds) for faster group membership updates
164+
165+
### Monitoring
166+
167+
After changing TTL values, monitor:
168+
- **maas-api load**: Reduced `/internal/v1/api-keys/validate` and `/internal/v1/subscriptions/select` call rates
169+
- **Authorino CPU**: Reduced OPA evaluation CPU usage
170+
- **Request latency**: Cache hits should have lower P99 latency
171+
172+
---
173+
174+
## Security Notes
175+
176+
### Cache Key Correctness
177+
178+
All cache keys include sufficient dimensions to prevent cross-principal or cross-subscription cache sharing:
179+
180+
- **Never share cache entries between different users**
181+
- **Never share cache entries between different API keys**
182+
- **Never share cache entries between different models** (model namespace/name in key)
183+
- **Never share cache entries between different group memberships** (groups in key)
184+
185+
### Cache Key Collision Risk
186+
187+
**API Keys - No Collision Risk:**
188+
Cache keys use database-assigned UUIDs instead of usernames:
189+
- User IDs are unique 128-bit UUIDs (format: `550e8400-e29b-41d4-a716-446655440000`)
190+
- Immutable and assigned by PostgreSQL at API key creation
191+
- Not user-controllable (no self-service user ID selection)
192+
- Even if groups contain delimiters (`,` or `|`), the UUID prefix prevents collision
193+
- Example: Two users with groups `["team,admin"]` and `["team", "admin"]` have different UUIDs, so no collision
194+
195+
**Kubernetes Tokens - No Collision Risk:**
196+
Kubernetes usernames are validated by the K8s API server:
197+
- Format: `system:serviceaccount:namespace:sa-name`
198+
- Kubernetes enforces DNS-1123 naming: `[a-z0-9]([-a-z0-9]*[a-z0-9])?`
199+
- No special characters like `|` or `,` allowed
200+
- Creating service accounts requires cluster RBAC permissions (not user self-service)
201+
202+
**Remaining Edge Case - Group Ordering:**
203+
Group array ordering affects cache keys:
204+
- `["admin", "user"]` produces different key than `["user", "admin"]`
205+
- CEL has no array sort() function
206+
- Impact: Suboptimal cache hit rate if group order varies between OIDC token refreshes
207+
- Mitigation: OIDC providers and K8s TokenReview typically return groups in consistent order
208+
209+
### Stale Data Window
210+
211+
Cache TTL represents the maximum staleness window:
212+
213+
- **Metadata caches**: API key revocation or group membership changes may take up to `METADATA_CACHE_TTL` seconds to propagate
214+
- **Authorization caches**: Authorization policy changes may take up to `AUTHZ_CACHE_TTL` seconds to propagate
215+
216+
For immediate policy enforcement after changes:
217+
1. Delete the affected AuthPolicy to clear Authorino's cache
218+
2. Or wait for the TTL to expire
219+
220+
---
221+
222+
## References
223+
224+
- [Authorino Caching User Guide](https://docs.kuadrant.io/latest/authorino/docs/features/#caching)
225+
- [AuthPolicy Reference](https://docs.kuadrant.io/latest/kuadrant-operator/doc/reference/authpolicy/)
226+
- [MaaS Controller Overview](./maas-controller-overview.md)

0 commit comments

Comments
 (0)