
Commit 0b67ec8

feat: add E2E tests and documentation for ephemeral API key cleanup (#639)

## Description

Add TestEphemeralKeyCleanup E2E test class covering:

- CronJob existence and configuration validation (schedule, security)
- NetworkPolicy existence and restriction validation
- Ephemeral key creation and search visibility (includeEphemeral filter)
- Cleanup trigger via oc exec preserves active ephemeral keys

Update documentation:

- token-management.md: ephemeral keys section, cleanup mechanics, grace period, security model, troubleshooting commands
- maas-api-overview.md: internal endpoints table listing cleanup, validate, and subscription select routes

## How Has This Been Tested?

## Merge criteria:

- [ ] The commits are squashed in a cohesive manner and have meaningful messages.
- [ ] Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
- [ ] The developer has manually tested the changes and verified that the changes work

## Summary by CodeRabbit

* **Documentation**
  * Added comprehensive documentation for ephemeral API keys with built-in automatic cleanup capabilities
  * Documented new subscription endpoints and internal cluster-only API endpoints with security and operational details
  * Included configuration guidance, operational behavior documentation, and troubleshooting instructions for key management
* **Tests**
  * Added end-to-end tests validating ephemeral key creation, search, filtering, and automatic cleanup behavior

Signed-off-by: Wen Liang <liangwen12year@gmail.com>
1 parent babc125 commit 0b67ec8

3 files changed: +327 -0 lines changed


docs/content/configuration-and-management/token-management.md

Lines changed: 60 additions & 0 deletions
@@ -206,6 +206,66 @@ Revocation updates the key status to `revoked` in the database. The next validat
!!! warning "Important"
    **For Platform Administrators**: Admins can revoke any user's keys via `DELETE /v1/api-keys/:id` (if they own or have admin access) or `POST /v1/api-keys/bulk-revoke` with the target username. This is an effective way to immediately cut off access for a specific user in response to a security event.

### Ephemeral Keys

Ephemeral keys are short-lived credentials designed for temporary programmatic access (e.g., playground sessions). They differ from regular keys:

| Property | Regular Key | Ephemeral Key |
|----------|-------------|---------------|
| Default expiration | Configured maximum (e.g., 90 days) | 1 hour |
| Maximum expiration | Configured maximum | 1 hour |
| Name required | Yes | No (auto-generated if omitted) |
| Visible in default search | Yes | No (`includeEphemeral: true` required) |

Create an ephemeral key:

```bash
curl -sSk -X POST "${MAAS_API_URL}/maas-api/v1/api-keys" \
  -H "Authorization: Bearer $(oc whoami -t)" \
  -H "Content-Type: application/json" \
  -d '{"ephemeral": true, "expiresIn": "30m"}'
```
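
For callers scripting this from Python, the request body can be built and sanity-checked before it is sent. This is only a sketch under stated assumptions: the helper names `parse_duration` and `ephemeral_key_payload` are hypothetical, the `"30m"`/`"1h"` duration syntax is inferred from the examples in this commit, and the one-hour cap mirrors the table above rather than any server-side code.

```python
import re
from datetime import timedelta
from typing import Optional

# Assumption: durations look like "<number><unit>" ("30m", "1h"), as in this commit's examples.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}
MAX_EPHEMERAL_TTL = timedelta(hours=1)  # maximum expiration from the table above


def parse_duration(text: str) -> timedelta:
    """Parse a simple duration string; other formats are rejected."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def ephemeral_key_payload(expires_in: str = "1h", name: Optional[str] = None) -> dict:
    """Build the JSON body for creating an ephemeral key (hypothetical helper)."""
    if parse_duration(expires_in) > MAX_EPHEMERAL_TTL:
        raise ValueError("ephemeral keys cannot outlive one hour")
    payload = {"ephemeral": True, "expiresIn": expires_in}
    if name is not None:  # the name is optional for ephemeral keys
        payload["name"] = name
    return payload
```

Checking the cap client-side only saves a round trip; the server remains the authority on what it accepts.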

### Ephemeral Key Cleanup

Expired ephemeral keys are automatically deleted from the database by a **CronJob** (`maas-api-key-cleanup`) that runs every 15 minutes. This prevents unbounded accumulation of expired short-lived credentials.

**How it works:**

1. The CronJob sends `POST /internal/v1/api-keys/cleanup` to the maas-api Service
2. The endpoint deletes ephemeral keys that expired **more than 30 minutes ago** (grace period)
3. Regular (non-ephemeral) keys are **never** deleted by cleanup; they remain until manually revoked

**Grace period:** A 30-minute grace period after expiration ensures that recently expired keys are not deleted while in-flight requests may still reference them. Only keys expired for longer than 30 minutes are removed.
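
The deletion rule can be stated as a small predicate. The sketch below illustrates the rule as documented here and is not the maas-api implementation; the function name `eligible_for_cleanup` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(minutes=30)  # grace period stated above


def eligible_for_cleanup(ephemeral: bool, expires_at: datetime, now: datetime) -> bool:
    """Return True only for ephemeral keys expired for longer than the grace period."""
    if not ephemeral:
        return False  # regular keys are never removed by cleanup
    return now - expires_at > GRACE_PERIOD


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(eligible_for_cleanup(True, now - timedelta(minutes=45), now))   # True: past the grace period
print(eligible_for_cleanup(True, now - timedelta(minutes=10), now))   # False: still inside it
print(eligible_for_cleanup(False, now - timedelta(days=1), now))      # False: regular key
```

With a 15-minute schedule, a key that just missed one run is picked up by the next, so under these assumptions an expired ephemeral key would linger for roughly 30 to 45 minutes.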

**Security:** The cleanup endpoint is cluster-internal only:

- It is registered under `/internal/v1/` and is **not exposed** on the external Service or Route
- A `NetworkPolicy` (`maas-api-cleanup-restrict`) restricts cleanup pods to communicate only with `maas-api:8080` and DNS
- No authentication is required on the endpoint itself; access control is enforced at the network layer

!!! tip "Troubleshooting cleanup"
    **Check CronJob status:**
    ```bash
    oc get cronjob maas-api-key-cleanup -n <namespace>
    oc get jobs -n <namespace> -l app=maas-api-cleanup --sort-by=.metadata.creationTimestamp
    ```

    **View cleanup logs:**
    ```bash
    # Latest CronJob run
    oc logs job/$(oc get jobs -n <namespace> -l app=maas-api-cleanup \
      --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1].metadata.name}') \
      -n <namespace>
    ```

    **Manually trigger cleanup** (from an allowed pod or via oc exec):
    ```bash
    oc exec deploy/maas-api -n <namespace> -- \
      curl -sf -X POST http://localhost:8080/internal/v1/api-keys/cleanup
    ```
    Response: `{"deletedCount": N, "message": "Successfully deleted N expired ephemeral key(s)"}`
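
When a manual trigger is scripted, the response body can be checked before it is trusted. A minimal sketch, assuming only the response shape shown above; `parse_cleanup_response` is a hypothetical helper, not part of maas-api.

```python
import json


def parse_cleanup_response(body: str) -> int:
    """Return deletedCount from a cleanup response, or raise ValueError if malformed."""
    data = json.loads(body)  # invalid JSON raises JSONDecodeError, a ValueError subclass
    count = data.get("deletedCount") if isinstance(data, dict) else None
    # bool is a subclass of int in Python, so it is rejected explicitly
    if not isinstance(count, int) or isinstance(count, bool) or count < 0:
        raise ValueError(f"unexpected cleanup response: {body!r}")
    return count


sample = '{"deletedCount": 3, "message": "Successfully deleted 3 expired ephemeral key(s)"}'
print(parse_cleanup_response(sample))  # 3
```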

---

## Frequently Asked Questions (FAQ)

docs/content/reference/maas-api-overview.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,23 @@ All endpoints except `/health` require authentication via the `Authorization: Be
| DELETE | `/v1/api-keys/{id}` | Revoke a specific API key. |
| POST | `/v1/api-keys/bulk-revoke` | Revoke all active API keys for a user. Admins can revoke any user's keys. |

### Subscriptions

| Method | Path | Description |
|--------|------|-------------|
| GET | `/v1/subscriptions` | List subscriptions accessible to the authenticated user. |
| GET | `/v1/model/{model-id}/subscriptions` | List subscriptions that provide access to a specific model. |

### Internal Endpoints (Cluster-Only)

These endpoints are registered under `/internal/v1/` and are **not exposed** on the external Service or Route. They are called by internal components (Authorino, CronJob) and protected by NetworkPolicy.

| Method | Path | Called By | Description |
|--------|------|-----------|-------------|
| POST | `/internal/v1/api-keys/validate` | Authorino | Validate an API key (hash lookup, status/expiry check). Returns user identity and subscription for the gateway. |
| POST | `/internal/v1/api-keys/cleanup` | CronJob `maas-api-key-cleanup` | Delete expired ephemeral keys (30-minute grace period). Returns `{"deletedCount": N, "message": "..."}`. |
| POST | `/internal/v1/subscriptions/select` | Authorino | Select the appropriate subscription for a request based on user groups and optional explicit selection. |

---

## Base URL

test/e2e/tests/test_api_keys.py

Lines changed: 250 additions & 0 deletions
@@ -619,3 +619,253 @@ def test_api_key_chat_completions(
            print(f"[inference] Chat completions returned {r.status_code}: {r.text[:200]}")
            # Don't fail - chat may not be supported
            pytest.skip(f"Chat completions returned {r.status_code}")


class TestEphemeralKeyCleanup:
    """Tests for ephemeral API key cleanup (CronJob + internal endpoint).

    Validates that:
    - Ephemeral keys can be created with short expiration
    - The cleanup CronJob exists and is correctly configured
    - Triggering cleanup does not delete active (non-expired) ephemeral keys
    - Cleanup returns a well-formed response with deletedCount

    The cleanup endpoint (POST /internal/v1/api-keys/cleanup) is cluster-internal
    and not exposed on the public Route. These tests trigger it via oc exec into
    the maas-api pod, which sends the same request as the CronJob. Creating a Job
    from the CronJob (kubectl create job --from=cronjob/maas-api-key-cleanup) is
    an alternative trigger that these tests do not use.

    Environment Variables:
    - DEPLOYMENT_NAMESPACE: Namespace where maas-api is deployed (default: opendatahub)
    """

    @pytest.fixture
    def deployment_namespace(self) -> str:
        return os.environ.get("DEPLOYMENT_NAMESPACE", "opendatahub")

    def test_cronjob_exists_and_configured(self, deployment_namespace: str):
        """Verify the maas-api-key-cleanup CronJob exists with expected configuration."""
        import subprocess as sp

        result = sp.run(
            ["oc", "get", "cronjob", "maas-api-key-cleanup",
             "-n", deployment_namespace, "-o", "json"],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            pytest.skip(
                f"CronJob maas-api-key-cleanup not found in {deployment_namespace}: "
                f"{result.stderr.strip()}"
            )

        import json as _json
        cj = _json.loads(result.stdout)
        spec = cj["spec"]

        # Verify schedule (every 15 minutes)
        assert spec["schedule"] == "*/15 * * * *", \
            f"Expected schedule '*/15 * * * *', got '{spec['schedule']}'"

        # Verify concurrency policy
        assert spec["concurrencyPolicy"] == "Forbid", \
            "CronJob should use Forbid concurrency policy"

        # Verify the curl command targets the internal cleanup endpoint
        containers = spec["jobTemplate"]["spec"]["template"]["spec"]["containers"]
        assert len(containers) >= 1
        container_spec = containers[0]
        # Command is in the 'command' field (shell script via /bin/sh -c)
        cmd_parts = container_spec.get("command", [])
        cmd_str = " ".join(cmd_parts)
        assert "/internal/v1/api-keys/cleanup" in cmd_str, \
            f"CronJob command should target cleanup endpoint, got: {cmd_str}"

        # Verify security context (non-root, read-only fs)
        sec_ctx = container_spec.get("securityContext", {})
        assert sec_ctx.get("runAsNonRoot", False) is True, \
            "Cleanup container should run as non-root"
        assert sec_ctx.get("readOnlyRootFilesystem", False) is True, \
            "Cleanup container should have read-only root filesystem"

        print(f"[cleanup] CronJob validated: schedule={spec['schedule']}, "
              f"concurrency={spec['concurrencyPolicy']}")

    def test_cleanup_networkpolicy_exists(self, deployment_namespace: str):
        """Verify the cleanup NetworkPolicy exists and restricts cleanup pod access."""
        import subprocess as sp

        result = sp.run(
            ["oc", "get", "networkpolicy", "maas-api-cleanup-restrict",
             "-n", deployment_namespace, "-o", "json"],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            pytest.skip(
                f"NetworkPolicy maas-api-cleanup-restrict not found in "
                f"{deployment_namespace}: {result.stderr.strip()}"
            )

        import json as _json
        np = _json.loads(result.stdout)
        spec = np["spec"]

        # Verify it targets cleanup pods
        selector = spec.get("podSelector", {}).get("matchLabels", {})
        assert selector.get("app") == "maas-api-cleanup", \
            f"NetworkPolicy should target app=maas-api-cleanup, got: {selector}"

        # Verify policy types include both Egress and Ingress
        policy_types = spec.get("policyTypes", [])
        assert "Egress" in policy_types, "NetworkPolicy should control egress"
        assert "Ingress" in policy_types, "NetworkPolicy should control ingress"

        # Verify ingress is blocked (empty list)
        assert spec.get("ingress") == [] or spec.get("ingress") is None, \
            "Cleanup pods should have no inbound traffic"

        print("[cleanup] NetworkPolicy validated: cleanup pods restricted to maas-api egress only")

    def test_create_ephemeral_key(self, api_keys_base_url: str, headers: dict):
        """Create an ephemeral key and verify it appears in search with includeEphemeral."""
        # Create ephemeral key with short expiration (30 minutes)
        r = requests.post(
            api_keys_base_url,
            headers=headers,
            json={
                "name": "e2e-ephemeral-cleanup-test",
                "ephemeral": True,
                "expiresIn": "30m",
            },
            timeout=30,
            verify=TLS_VERIFY,
        )
        assert r.status_code in (200, 201), \
            f"Expected 200/201 creating ephemeral key, got {r.status_code}: {r.text}"
        data = r.json()
        assert data.get("ephemeral") is True, "Key should be marked as ephemeral"
        key_id = data["id"]
        print(f"[cleanup] Created ephemeral key: id={key_id}, expiresAt={data.get('expiresAt')}")

        # Verify ephemeral key appears in search with includeEphemeral filter
        r_search = requests.post(
            f"{api_keys_base_url}/search",
            headers=headers,
            json={
                "filters": {"status": ["active"], "includeEphemeral": True},
                "pagination": {"limit": 50, "offset": 0},
            },
            timeout=30,
            verify=TLS_VERIFY,
        )
        assert r_search.status_code == 200
        items = r_search.json().get("items") or r_search.json().get("data") or []
        found_ids = [item["id"] for item in items]
        assert key_id in found_ids, \
            f"Ephemeral key {key_id} should appear in search with includeEphemeral=true"

        # Verify ephemeral key is excluded from default search (without includeEphemeral)
        r_default = requests.post(
            f"{api_keys_base_url}/search",
            headers=headers,
            json={
                "filters": {"status": ["active"]},
                "pagination": {"limit": 50, "offset": 0},
            },
            timeout=30,
            verify=TLS_VERIFY,
        )
        assert r_default.status_code == 200
        default_items = r_default.json().get("items") or r_default.json().get("data") or []
        default_ids = [item["id"] for item in default_items]
        assert key_id not in default_ids, \
            "Ephemeral key should be excluded from default search (includeEphemeral defaults to false)"

        print("[cleanup] Ephemeral key visibility verified: visible with filter, hidden by default")

    def test_trigger_cleanup_preserves_active_keys(
        self, api_keys_base_url: str, headers: dict, deployment_namespace: str,
    ):
        """Trigger cleanup and verify active ephemeral keys are NOT deleted.

        Creates an ephemeral key, triggers cleanup via oc exec into maas-api pod,
        and asserts the active key survives cleanup (only expired keys beyond the
        30-minute grace period are deleted).
        """
        import subprocess as sp

        # Create an ephemeral key with 1 hour expiration (won't expire during test)
        r = requests.post(
            api_keys_base_url,
            headers=headers,
            json={
                "name": "e2e-cleanup-survival-test",
                "ephemeral": True,
                "expiresIn": "1h",
            },
            timeout=30,
            verify=TLS_VERIFY,
        )
        assert r.status_code in (200, 201), \
            f"Expected 200/201, got {r.status_code}: {r.text}"
        key_id = r.json()["id"]
        print(f"[cleanup] Created ephemeral key for survival test: id={key_id}")

        # Trigger cleanup via oc exec into maas-api pod
        # This calls the internal endpoint directly, same as the CronJob does
        get_pod = sp.run(
            ["oc", "get", "pods", "-n", deployment_namespace,
             "-l", "app.kubernetes.io/name=maas-api",
             "-o", "jsonpath={.items[0].metadata.name}"],
            capture_output=True, text=True,
        )
        if get_pod.returncode != 0 or not get_pod.stdout.strip():
            pytest.skip(
                f"Cannot find maas-api pod in {deployment_namespace}: "
                f"{get_pod.stderr.strip()}"
            )

        pod_name = get_pod.stdout.strip()
        print(f"[cleanup] Triggering cleanup via oc exec into {pod_name}")

        cleanup_result = sp.run(
            ["oc", "exec", pod_name, "-n", deployment_namespace, "--",
             "curl", "-sf", "-X", "POST",
             "http://localhost:8080/internal/v1/api-keys/cleanup"],
            capture_output=True, text=True, timeout=30,
        )

        if cleanup_result.returncode != 0:
            # curl may not be available in the maas-api container; try wget
            cleanup_result = sp.run(
                ["oc", "exec", pod_name, "-n", deployment_namespace, "--",
                 "wget", "-q", "-O-", "--post-data=",
                 "http://localhost:8080/internal/v1/api-keys/cleanup"],
                capture_output=True, text=True, timeout=30,
            )

        if cleanup_result.returncode != 0:
            pytest.skip(
                f"Cannot exec into maas-api pod to trigger cleanup "
                f"(neither curl nor wget available): {cleanup_result.stderr.strip()}"
            )

        import json as _json
        cleanup_resp = _json.loads(cleanup_result.stdout)
        deleted_count = cleanup_resp.get("deletedCount", -1)
        assert deleted_count >= 0, \
            f"Cleanup response should have non-negative deletedCount, got: {cleanup_resp}"
        print(f"[cleanup] Cleanup completed: deletedCount={deleted_count}, "
              f"message={cleanup_resp.get('message')}")

        # Verify our active ephemeral key survived cleanup
        r_get = requests.get(
            f"{api_keys_base_url}/{key_id}",
            headers=headers,
            timeout=30,
            verify=TLS_VERIFY,
        )
        assert r_get.status_code == 200, \
            f"Active ephemeral key {key_id} should survive cleanup, got {r_get.status_code}"
        assert r_get.json().get("status") == "active", \
            f"Key should still be active after cleanup, got: {r_get.json().get('status')}"
        print(f"[cleanup] Active ephemeral key {key_id} survived cleanup (correct behavior)")
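
The CronJob assertions in `test_cronjob_exists_and_configured` need a live cluster. For quick iteration, the same checks could be exercised against a plain dictionary. The sketch below is one way to do that and is not part of this commit: `cronjob_problems` is a hypothetical helper, and the sample manifest fragment is made up for illustration.

```python
def cronjob_problems(cronjob: dict) -> list:
    """Return human-readable problems; an empty list means every check passed."""
    spec = cronjob.get("spec", {})
    problems = []
    if spec.get("schedule") != "*/15 * * * *":
        problems.append(f"unexpected schedule: {spec.get('schedule')!r}")
    if spec.get("concurrencyPolicy") != "Forbid":
        problems.append("concurrencyPolicy should be Forbid")
    pod_spec = spec.get("jobTemplate", {}).get("spec", {}).get("template", {}).get("spec", {})
    containers = pod_spec.get("containers", [])
    if not containers:
        return problems + ["no containers defined"]
    container = containers[0]
    # Like the E2E test, only the 'command' field is inspected.
    if "/internal/v1/api-keys/cleanup" not in " ".join(container.get("command", [])):
        problems.append("command does not target the cleanup endpoint")
    sec_ctx = container.get("securityContext", {})
    for flag in ("runAsNonRoot", "readOnlyRootFilesystem"):
        if sec_ctx.get(flag) is not True:
            problems.append(f"securityContext.{flag} should be true")
    return problems


# Hypothetical manifest fragment, made up for illustration only.
sample = {
    "spec": {
        "schedule": "*/15 * * * *",
        "concurrencyPolicy": "Forbid",
        "jobTemplate": {"spec": {"template": {"spec": {"containers": [{
            "command": ["/bin/sh", "-c",
                        "curl -sf -X POST http://maas-api:8080/internal/v1/api-keys/cleanup"],
            "securityContext": {"runAsNonRoot": True, "readOnlyRootFilesystem": True},
        }]}}}},
    }
}
print(cronjob_problems(sample))  # []
```

Returning a list of problems rather than asserting on the first mismatch reports every misconfiguration in one pass, which tends to be more useful when reviewing a manifest by hand.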
