opendatahub-io
diff --git a/.tekton/odh-maas-api-pull-request.yaml b/.tekton/odh-maas-api-pull-request.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/.tekton/odh-maas-controller-pull-request.yaml b/.tekton/odh-maas-controller-pull-request.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 28 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 3 additions & 0 deletions
diff --git a/deployment/base/maas-api/policies/auth-policy.yaml b/deployment/base/maas-api/policies/auth-policy.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/deployment/base/maas-controller/manager/manager.yaml b/deployment/base/maas-controller/manager/manager.yaml
Lines changed: 6 additions & 0 deletions
diff --git a/deployment/overlays/common/params.env b/deployment/overlays/common/params.env
Lines changed: 2 additions & 0 deletions
diff --git a/deployment/overlays/odh/kustomization.yaml b/deployment/overlays/odh/kustomization.yaml
Lines changed: 30 additions & 0 deletions
diff --git a/docs/content/configuration-and-management/authorino-caching.md b/docs/content/configuration-and-management/authorino-caching.md
Lines changed: 226 additions & 0 deletions
@@ -9,7 +9,7 @@ metadata:
     pipelinesascode.tekton.dev/cancel-in-progress: "false"
     pipelinesascode.tekton.dev/max-keep-runs: "3"
     pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request" && target_branch
-      == "main"
+      == "main" && !files.all.all(x, x.matches('^docs/') || x.matches('\\.md$'))
   creationTimestamp: null
   labels:
     appstudio.openshift.io/application: opendatahub-builds
 
@@ -9,7 +9,7 @@ metadata:
     pipelinesascode.tekton.dev/cancel-in-progress: "false"
     pipelinesascode.tekton.dev/max-keep-runs: "3"
     pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request" && target_branch
-      == "main"
+      == "main" && !files.all.all(x, x.matches('^docs/') || x.matches('\\.md$'))
   creationTimestamp: null
   labels:
     appstudio.openshift.io/application: opendatahub-builds
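
Reviewer note: the new trigger expression in both `.tekton/` files skips the pipeline when every changed file is under `docs/` or ends in `.md`. The Go sketch below mirrors that predicate so the intended behaviour can be checked against sample file lists. It is only an illustration: the function names `docsOnly` and `shouldRun` are hypothetical, and it assumes that CEL `matches` performs an unanchored RE2 search, which is also how Go's `regexp.MatchString` behaves.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	docsDir  = regexp.MustCompile(`^docs/`) // path starts with docs/
	markdown = regexp.MustCompile(`\.md$`)  // path ends in .md
)

// docsOnly reports whether every changed path is under docs/ or ends in .md,
// mirroring files.all.all(x, x.matches('^docs/') || x.matches('\\.md$')).
func docsOnly(files []string) bool {
	for _, f := range files {
		if !docsDir.MatchString(f) && !markdown.MatchString(f) {
			return false
		}
	}
	return true
}

// shouldRun mirrors the full trigger expression (hypothetical helper).
func shouldRun(event, targetBranch string, files []string) bool {
	return event == "pull_request" && targetBranch == "main" && !docsOnly(files)
}

func main() {
	// Docs-only change: pipeline is skipped.
	fmt.Println(shouldRun("pull_request", "main", []string{"docs/index.md", "README.md"}))
	// Mixed change: one non-doc file is enough to run the pipeline.
	fmt.Println(shouldRun("pull_request", "main", []string{"README.md", "maas-api/main.go"}))
}
```

A single non-documentation file in the change set is enough to trigger the build, which matches the "docs-only skip" wording in `CONTRIBUTING.md`.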
 
@@ -61,10 +61,37 @@ This project follows a **Stream-Lake-Ocean** release model. Code flows from acti
 | `maas-controller/` | Kubernetes controller for MaaS CRDs; see [maas-controller/README.md](maas-controller/README.md) |
 | `docs/` | User and admin documentation (MkDocs); [online docs](https://opendatahub-io.github.io/models-as-a-service/) |
 | `test/` | E2E and billing/smoke tests |
-| `.github/workflows/` | CI (build, PR title validation, MaaS API lint/build) |
+| `.github/workflows/` | GitHub Actions CI (lint, build, PR title validation, docs) |
+| `.tekton/` | Konflux/Tekton pipeline definitions for container image builds |
 
 ## CI and checks
 
+This project uses two CI systems: **Konflux** (Tekton-based) for container image builds and integration testing, and **GitHub Actions** for linting, unit tests, and documentation.
+
+### Konflux / Tekton pipelines
+
+Konflux builds multi-arch container images (x86_64, arm64, ppc64le, s390x) for both `maas-api` and `maas-controller` on every PR and push to `main`. Pipeline definitions live in `.tekton/` and reference a shared pipeline from [odh-konflux-central](https://github.com/opendatahub-io/odh-konflux-central) (`pipeline/multi-arch-container-build.yaml`).
+
+| Pipeline | Trigger | Output image |
+|----------|---------|--------------|
+| `odh-maas-api-on-pull-request` | PR to `main` | `quay.io/opendatahub/maas-api:odh-pr` |
+| `odh-maas-api-on-push` | Push to `main` | `quay.io/opendatahub/maas-api:odh-stable` |
+| `odh-maas-controller-on-pull-request` | PR to `main` | `quay.io/opendatahub/maas-controller:odh-pr` |
+| `odh-maas-controller-on-push` | Push to `main` | `quay.io/opendatahub/maas-controller:odh-stable` |
+
+**Integration tests (e2e):** When a PR build completes, Konflux runs an integration test that provisions an ephemeral OpenShift cluster (HyperShift on AWS), deploys the ODH stack with the newly built images, and runs `test/e2e/scripts/prow_run_smoke_test.sh`. This is defined in `odh-konflux-central` under `integration-tests/models-as-a-service/`.
+
+**Docs-only skip:** PRs that only touch documentation files (`docs/**` or `**/*.md`) skip the Konflux build pipelines and integration tests entirely. This is controlled via a CEL expression in the `.tekton/` pipeline definitions.
+
+### GitHub Actions
+
+| Workflow | Trigger | Path filter | What it checks |
+|----------|---------|-------------|----------------|
+| PR Title Validation | Every PR | None | Semantic PR title format (`type: subject`) |
+| MaaS API | PR + push to `main` | `maas-api/**` (PR only) | golangci-lint, unit tests, image build |
+| Build | PR + push to `main` | `maas-controller/api/**`, `deployment/**`, etc. (PR only) | Kustomize manifest validation, CRD codegen verification |
+| Docs | PR + push to `main` | `docs/**`, `**/*.md` | Link validation, mkdocs build, GitHub Pages deploy |
+
 - **PR title:** Must follow semantic format (`type: subject`, subject not starting with a capital). Use `draft`/`wip` label to bypass.
 - **Kustomize:** Manifests under `deployment/` are validated with `scripts/ci/validate-manifests.sh` (kustomize build).
 - **MaaS Controller codegen:** CI verifies that generated deepcopy code (`maas-controller/api/maas/v1alpha1/zz_generated.deepcopy.go`) and CRD manifests (`deployment/base/maas-controller/crd/bases/`) are in sync with the API types. If you change any file under `maas-controller/api/`, run `make -C maas-controller generate manifests` and commit the results before pushing. The check fails when uncommitted generated changes are detected.
 
@@ -80,6 +80,8 @@ For detailed instructions, see the [Deployment Guide](docs/content/quickstart.md
 |----------|-------------|---------|
 | `MAAS_API_IMAGE` | Custom MaaS API container image (works in both operator and kustomize modes) | `quay.io/user/maas-api:pr-123` |
 | `MAAS_CONTROLLER_IMAGE` | Custom MaaS controller container image | `quay.io/user/maas-controller:pr-123` |
+| `METADATA_CACHE_TTL` | TTL in seconds for Authorino metadata HTTP caching | `60` (default), `300` |
+| `AUTHZ_CACHE_TTL` | TTL in seconds for Authorino OPA authorization caching | `60` (default), `30` |
 | `OPERATOR_CATALOG` | Custom operator catalog | `quay.io/opendatahub/catalog:pr-456` |
 | `OPERATOR_IMAGE` | Custom operator image | `quay.io/opendatahub/operator:pr-456` |
 | `OPERATOR_TYPE` | Operator type (rhoai/odh) | `odh` |
@@ -127,6 +129,7 @@ MAAS_API_IMAGE=quay.io/myuser/maas-api:pr-123 \
 
 - [Deployment Guide](docs/content/quickstart.md) - Complete deployment instructions
 - [MaaS API Documentation](maas-api/README.md) - Go API for key management
+- [Authorino Caching Configuration](docs/content/configuration-and-management/authorino-caching.md) - Cache tuning for metadata and authorization
 
 Online Documentation: [https://opendatahub-io.github.io/models-as-a-service/](https://opendatahub-io.github.io/models-as-a-service/)
 
 
@@ -71,7 +71,7 @@ spec:
             when:
               - predicate: request.headers.authorization.startsWith("Bearer sk-oai-")
             plain:
-              expression: '''["'' + auth.metadata.apiKeyValidation.groups.join(''","'') + ''"]'''
+              selector: auth.metadata.apiKeyValidation.groups.@tostr
             priority: 0
           # Groups: from OpenShift identity as JSON array (when OC token used)
           X-MaaS-Group-OC:
 
@@ -33,6 +33,8 @@ spec:
           - --maas-api-namespace=$(MAAS_API_NAMESPACE)
           - --maas-subscription-namespace=$(MAAS_SUBSCRIPTION_NAMESPACE)
           - --cluster-audience=$(CLUSTER_AUDIENCE)
+          - --metadata-cache-ttl=$(METADATA_CACHE_TTL)
+          - --authz-cache-ttl=$(AUTHZ_CACHE_TTL)
         env:
           - name: GATEWAY_NAME
             value: "maas-default-gateway"
@@ -46,6 +48,10 @@ spec:
             value: "models-as-a-service"
           - name: CLUSTER_AUDIENCE
             value: "https://kubernetes.default.svc"
+          - name: METADATA_CACHE_TTL
+            value: "60"
+          - name: AUTHZ_CACHE_TTL
+            value: "60"
         image: maas-controller
         name: manager
         imagePullPolicy: Always
 
@@ -6,3 +6,5 @@ gateway-name=maas-default-gateway
 # Used for AuthPolicy URL (maas-api.<app-namespace>.svc) and DestinationRule host
 app-namespace=opendatahub
 api-key-max-expiration-days=30
+metadata-cache-ttl=60
+authz-cache-ttl=60
@@ -40,6 +40,36 @@ resources:
 components:
   - ../../components/shared-patches
 
+# ODH-SPECIFIC PATCHES
+# Additional cache TTL configuration for maas-controller
+patches:
+- patch: |-
+    apiVersion: apps/v1
+    kind: Deployment
+    metadata:
+      name: maas-controller
+    spec:
+      template:
+        spec:
+          containers:
+          - name: manager
+            env:
+            - name: METADATA_CACHE_TTL
+              valueFrom:
+                configMapKeyRef:
+                  key: metadata-cache-ttl
+                  name: maas-parameters
+                  optional: true
+            - name: AUTHZ_CACHE_TTL
+              valueFrom:
+                configMapKeyRef:
+                  key: authz-cache-ttl
+                  name: maas-parameters
+                  optional: true
+  target:
+    kind: Deployment
+    name: maas-controller
+
 # ODH-SPECIFIC REPLACEMENTS
 # These are in addition to shared-patches and handle ODH-specific resources
 # (gateway policies and DestinationRule)
 
@@ -0,0 +1,226 @@
+# Authorino Caching Configuration
+
+This document describes the Authorino/Kuadrant caching configuration in MaaS, including how to tune cache TTLs for metadata and authorization evaluators.
+
+---
+
+## Overview
+
+MaaS-generated AuthPolicy resources enable Authorino-style caching on:
+
+- **Metadata evaluators** (HTTP calls to maas-api):
+  - `apiKeyValidation` - validates API keys and returns user identity + groups
+  - `subscription-info` - selects the appropriate subscription for the request
+
+- **Authorization evaluators** (OPA policy evaluation):
+  - `auth-valid` - validates authentication (API key OR K8s token)
+  - `subscription-valid` - ensures a valid subscription was selected
+  - `require-group-membership` - checks user/group membership against allowed lists
+
+Caching reduces load on maas-api and CPU spent on Rego re-evaluation by reusing results when the cache key repeats within the TTL window.
+
+---
+
+## Configuration
+
+### Environment Variables
+
+The maas-controller deployment supports the following environment variables to configure cache TTLs:
+
+| Variable | Description | Default | Unit | Constraints |
+|----------|-------------|---------|------|-------------|
+| `METADATA_CACHE_TTL` | TTL for metadata HTTP caching (apiKeyValidation, subscription-info) | `60` | seconds | Must be ≥ 0 |
+| `AUTHZ_CACHE_TTL` | TTL for OPA authorization caching (auth-valid, subscription-valid, require-group-membership) | `60` | seconds | Must be ≥ 0 |
+
+**Note:** The controller will fail to start if either TTL is set to a negative value.
+
+### Deployment Configuration
+
+#### Via params.env (ODH Overlay)
+
+Edit `deployment/overlays/odh/params.env`:
+
+```env
+# 5 minutes
+metadata-cache-ttl=300
+# 30 seconds
+authz-cache-ttl=30
+```
+
+These values are injected into the maas-controller deployment via ConfigMap.
+
+#### Via manager.yaml (Base Deployment)
+
+Edit `deployment/base/maas-controller/manager/manager.yaml`:
+
+```yaml
+env:
+  - name: METADATA_CACHE_TTL
+    value: "300"  # 5 minutes
+  - name: AUTHZ_CACHE_TTL
+    value: "30"   # 30 seconds
+```
+
+### Important: Authorization Cache TTL Capping
+
+**Authorization caches are automatically capped at the metadata cache TTL** to prevent stale authorization decisions.
+
+Authorization evaluators (auth-valid, subscription-valid, require-group-membership) depend on metadata evaluators (apiKeyValidation, subscription-info). If authorization caches outlive metadata caches, stale metadata can lead to incorrect authorization decisions.
+
+**Example:**
+```bash
+METADATA_CACHE_TTL=60   # 1 minute
+AUTHZ_CACHE_TTL=300     # 5 minutes (will be capped at 60 seconds)
+```
+
+In this scenario:
+- Metadata caches use 60-second TTL ✅
+- Authorization caches use **60-second TTL** (capped, not 300) ✅
+- A warning is logged at startup: "Authorization cache TTL exceeds metadata cache TTL"
+
+**Recommendation:** Set `AUTHZ_CACHE_TTL ≤ METADATA_CACHE_TTL` to avoid confusion.
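
Reviewer note: the validation and capping rules described above can be sketched as follows. This is not the controller's actual code; `effectiveTTLs` is a hypothetical helper that only restates the documented behaviour (negative values rejected, authorization TTL capped at the metadata TTL).

```go
package main

import (
	"errors"
	"fmt"
)

// effectiveTTLs validates both TTLs (in seconds) and caps the authorization
// TTL at the metadata TTL. The bool result reports whether capping occurred,
// which is the case where the documented startup warning would be logged.
func effectiveTTLs(metadataTTL, authzTTL int) (int, int, bool, error) {
	if metadataTTL < 0 || authzTTL < 0 {
		return 0, 0, false, errors.New("cache TTLs must be >= 0")
	}
	capped := authzTTL > metadataTTL
	if capped {
		authzTTL = metadataTTL
	}
	return metadataTTL, authzTTL, capped, nil
}

func main() {
	// METADATA_CACHE_TTL=60, AUTHZ_CACHE_TTL=300 from the example above.
	md, az, capped, err := effectiveTTLs(60, 300)
	fmt.Println(md, az, capped, err)
}
```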
+
+---
+
+## Cache Key Design
+
+Cache keys are carefully designed to prevent data leakage between principals, subscriptions, and models.
+
+### Collision Resistance
+
+Cache keys use single-character delimiters (`|` and `,`) to separate components:
+
+- **Field delimiter**: `|` separates major components (user ID, groups, subscription, model)
+- **Group delimiter**: `,` joins multiple group names
+
+**For API Keys - Collision Resistant:**
+Cache keys use database-assigned UUIDs instead of usernames:
+- User ID: Database primary key (UUID format in `api_keys.id` column)
+- Immutable and unique per API key
+- Not user-controllable (assigned by database on creation)
+- Example key: `550e8400-e29b-41d4-a716-446655440000|team,admin|sub1|ns/model`
+- No collision risk even if groups contain delimiters (UUID prefix ensures uniqueness)
+
+**For Kubernetes Tokens - Already Safe:**
+Kubernetes usernames follow a validated format enforced by the K8s API:
+- Pattern: `system:serviceaccount:namespace:sa-name`
+- Kubernetes validates namespace and service account names against DNS-1123 rules (lowercase alphanumerics and hyphens, plus dots in service account names)
+- No special characters like `|` or `,` allowed in usernames
+- Creating service accounts requires cluster permissions (not user self-service)
+
+**Implementation:**
+The `apiKeyValidation` metadata evaluator returns a `userId` field:
+- API keys: Set to `api_keys.id` (database UUID)
+- Cache keys reference `auth.metadata.apiKeyValidation.userId` in CEL expressions
+- This eliminates username-based collision attacks
+
+### Metadata Caches
+
+**apiKeyValidation:**
+- **Only runs for API key requests** (Authorization header matches `Bearer sk-oai-*`)
+- Key: `<api-key-value>`
+- Ensures each unique API key has its own cache entry
+- Does not run for Kubernetes token requests (prevents cache pollution)
+- Returns `userId` field set to database UUID (`api_keys.id`)
+
+**subscription-info:**
+- Key: `<userId>|<groups>|<requested-subscription>|<model-namespace>/<model-name>`
+- For API keys: `userId` is database UUID from `apiKeyValidation` response
+- For K8s tokens: `userId` is validated K8s username (`system:serviceaccount:...`)
+- Groups joined with `,` delimiter
+- Ensures cache isolation per user, group membership, requested subscription, and model
+
+### Authorization Caches
+
+**auth-valid:**
+- Key: `<auth-type>|<identity>|<model-namespace>/<model-name>`
+- For API keys: `api-key|<key-value>|<model-namespace>/<model-name>`
+- For K8s tokens: `k8s-token|<username>|<model-namespace>/<model-name>`
+
+**subscription-valid:**
+- Key: Same as subscription-info metadata (ensures cache coherence)
+- Format: `<userId>|<groups>|<requested-subscription>|<model-namespace>/<model-name>`
+- For API keys: `userId` is database UUID. For K8s tokens: validated username.
+
+**require-group-membership:**
+- Key: `<userId>|<groups>|<model-namespace>/<model-name>`
+- For API keys: `userId` is database UUID. For K8s tokens: validated username.
+- Groups joined with `,` delimiter
+- Ensures cache isolation per user identity and model
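
Reviewer note: the key layouts listed above are assembled in CEL inside the generated AuthPolicy. The Go sketch below only illustrates the `subscription-info` layout with the documented delimiters; `subscriptionCacheKey` is a hypothetical helper, not part of the controller.

```go
package main

import (
	"fmt"
	"strings"
)

// subscriptionCacheKey assembles
// <userId>|<groups>|<requested-subscription>|<model-namespace>/<model-name>
// with "," joining groups and "|" separating the major fields.
func subscriptionCacheKey(userID string, groups []string, subscription, modelNS, modelName string) string {
	return strings.Join([]string{
		userID,
		strings.Join(groups, ","),
		subscription,
		modelNS + "/" + modelName,
	}, "|")
}

func main() {
	id := "550e8400-e29b-41d4-a716-446655440000"
	fmt.Println(subscriptionCacheKey(id, []string{"team", "admin"}, "sub1", "ns", "model"))
	// Group order is part of the key, so a reordered list is a separate entry.
	fmt.Println(subscriptionCacheKey(id, []string{"admin", "team"}, "sub1", "ns", "model"))
}
```

The first call reproduces the example key from the Collision Resistance section. The second shows that the same groups in a different order produce a different key.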
+
+---
+
+## Operational Tuning
+
+### When to Increase Metadata Cache TTL
+
+- **High API key validation load**: If maas-api is experiencing high load from repeated `/internal/v1/api-keys/validate` calls
+- **Stable API keys**: API key metadata (username, groups) doesn't change frequently
+- **Example**: Set `METADATA_CACHE_TTL=300` (5 minutes) to cut repeated maas-api calls for the same cache key by up to 5x compared with the 60-second default
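
Reviewer note: the load reduction in the example above follows from simple arithmetic. For a cache key that is requested continuously, at most one upstream call is needed per TTL window. The sketch below computes that upper bound. It assumes a single cache instance and ignores eviction, and `maxCallsPerHour` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// maxCallsPerHour returns the upper bound on upstream calls per hour for one
// continuously requested cache key: one call per TTL window.
// A TTL of 0 is not modelled here, so it is rejected along with negatives.
func maxCallsPerHour(ttlSeconds int) (int, error) {
	if ttlSeconds <= 0 {
		return 0, errors.New("ttl must be > 0 for this estimate")
	}
	return 3600 / ttlSeconds, nil
}

func main() {
	before, _ := maxCallsPerHour(60) // default TTL
	after, _ := maxCallsPerHour(300) // tuned TTL
	fmt.Println(before, after, before/after)
}
```

Keys that are requested less often than once per TTL window see little or no benefit, so the real reduction depends on the traffic pattern.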
+
+### When to Decrease Authorization Cache TTL
+
+- **Group membership changes**: If users are frequently added/removed from groups
+- **Security compliance**: Shorter TTL ensures access changes propagate faster
+- **Example**: Set `AUTHZ_CACHE_TTL=30` (30 seconds) for faster group membership updates
+
+### Monitoring
+
+After changing TTL values, monitor:
+- **maas-api load**: Reduced `/internal/v1/api-keys/validate` and `/internal/v1/subscriptions/select` call rates
+- **Authorino CPU**: Reduced OPA evaluation CPU usage
+- **Request latency**: Cache hits should have lower P99 latency
+
+---
+
+## Security Notes
+
+### Cache Key Correctness
+
+All cache keys include sufficient dimensions to prevent cross-principal or cross-subscription cache sharing:
+
+- **Never share cache entries between different users**
+- **Never share cache entries between different API keys**
+- **Never share cache entries between different models** (model namespace/name in key)
+- **Never share cache entries between different group memberships** (groups in key)
+
+### Cache Key Collision Risk
+
+**API Keys - No Collision Risk:**
+Cache keys use database-assigned UUIDs instead of usernames:
+- User IDs are unique 128-bit UUIDs (format: `550e8400-e29b-41d4-a716-446655440000`)
+- Immutable and assigned by PostgreSQL at API key creation
+- Not user-controllable (no self-service user ID selection)
+- Even if groups contain delimiters (`,` or `|`), the UUID prefix prevents collision
+- Example: Two users with groups `["team,admin"]` and `["team", "admin"]` have different UUIDs, so no collision
+
+**Kubernetes Tokens - No Collision Risk:**
+Kubernetes usernames are validated by the K8s API server:
+- Format: `system:serviceaccount:namespace:sa-name`
+- Kubernetes enforces DNS-1123 naming (label form: `[a-z0-9]([-a-z0-9]*[a-z0-9])?`; service account names may also contain dots)
+- No special characters like `|` or `,` allowed
+- Creating service accounts requires cluster RBAC permissions (not user self-service)
+
+**Remaining Edge Case - Group Ordering:**
+Group array ordering affects cache keys:
+- `["admin", "user"]` produces a different key than `["user", "admin"]`
+- The standard CEL library has no list `sort()` function
+- Impact: Suboptimal cache hit rate if group order varies between OIDC token refreshes
+- Mitigation: OIDC providers and K8s TokenReview typically return groups in consistent order
+
+### Stale Data Window
+
+Cache TTL represents the maximum staleness window:
+
+- **Metadata caches**: API key revocation or group membership changes may take up to `METADATA_CACHE_TTL` seconds to propagate
+- **Authorization caches**: Authorization policy changes may take up to `AUTHZ_CACHE_TTL` seconds to propagate
+
+To enforce a change before the TTL expires, delete the affected AuthPolicy to clear Authorino's cache. Otherwise, the change takes effect once the TTL expires.
+
+---
+
+## References
+
+- [Authorino Caching User Guide](https://docs.kuadrant.io/latest/authorino/docs/features/#caching)
+- [AuthPolicy Reference](https://docs.kuadrant.io/latest/kuadrant-operator/doc/reference/authpolicy/)
+- [MaaS Controller Overview](./maas-controller-overview.md)