Problem
The clean-install-smoke gate (artifact-keeper-test#49) and the deploy job both create namespaces (smoke-*, test-*) per workflow run. The script's EXIT trap removes them on success/failure, but:
- SIGKILL (workflow timeout) does not fire the trap
- helm uninstall failures with stuck finalizers can leave PVCs orphaned
- kubectl delete namespace --wait=false returns before the namespace actually terminates
Over time these accumulate on Rocky and consume PVC storage / occupy ResourceQuota.
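For reference, a minimal sketch of the first failure mode, in Python rather than the actual workflow script. It assumes a POSIX host. The child process stands in for the smoke script, and its `finally` block stands in for the EXIT trap. The names `CHILD` and `cleanup_ran` are made up for this illustration.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the smoke script: SIGTERM is turned into a normal
# exit so the `finally` cleanup runs, much like a shell EXIT trap would.
CHILD = r"""
import signal, sys, time
marker = sys.argv[1]
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(1))
try:
    print("ready", flush=True)   # tells the parent the handler is installed
    time.sleep(60)
finally:
    open(marker, "w").close()    # stands in for the namespace cleanup
"""


def cleanup_ran(sig):
    """Start the child, send it `sig`, and report whether its cleanup ran."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "cleaned")
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD, marker],
            stdout=subprocess.PIPE,
            text=True,
        )
        proc.stdout.readline()   # block until the child prints "ready"
        proc.send_signal(sig)
        proc.wait(timeout=30)
        proc.stdout.close()
        return os.path.exists(marker)


if __name__ == "__main__":
    print("SIGTERM ran cleanup:", cleanup_ran(signal.SIGTERM))
    print("SIGKILL ran cleanup:", cleanup_ran(signal.SIGKILL))
```

SIGKILL cannot be caught or handled by the process, so no in-process cleanup can cover it. That is why the cleanup has to come from something outside the workflow run.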
Acceptance criteria
- CronJob in arc-runners namespace that runs hourly
- Lists namespaces matching smoke-* or test-* with metadata.creationTimestamp older than 1 hour
- Deletes them with proper finalizer handling (force-clear if stuck)
- Reaps any orphaned PVCs with names matching the same pattern
- Emits Prometheus metrics ak_runner_janitor_namespaces_reaped_total and ak_runner_janitor_pvcs_reaped_total so we can tell if it's actually running
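A rough sketch of the selection and metric-reporting parts of the criteria above, in Python. It is not a proposed implementation. It assumes the input is the `items` array from `kubectl get namespaces -o json` and that timestamps use the usual `YYYY-MM-DDTHH:MM:SSZ` form. The function names are hypothetical. The sketch does not cover deletion, finalizer clearing, or how the metrics reach Prometheus.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase

PATTERNS = ("smoke-*", "test-*")   # name patterns from the criteria above
MAX_AGE = timedelta(hours=1)       # "older than 1 hour"


def is_stale(name, creation_timestamp, now):
    """True if the name matches a pattern and the object is older than MAX_AGE."""
    created = datetime.strptime(
        creation_timestamp, "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    matches = any(fnmatchcase(name, pattern) for pattern in PATTERNS)
    return matches and (now - created) > MAX_AGE


def stale_names(items, now):
    """Return names of stale objects from a parsed `kubectl get ... -o json` item list."""
    return [
        item["metadata"]["name"]
        for item in items
        if is_stale(
            item["metadata"]["name"],
            item["metadata"]["creationTimestamp"],
            now,
        )
    ]


def render_metrics(namespaces_reaped, pvcs_reaped):
    """Render both counters in the Prometheus text exposition format."""
    lines = []
    for metric, value in (
        ("ak_runner_janitor_namespaces_reaped_total", namespaces_reaped),
        ("ak_runner_janitor_pvcs_reaped_total", pvcs_reaped),
    ):
        lines.append(f"# TYPE {metric} counter")
        lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
    sample = [
        {"metadata": {"name": "smoke-123", "creationTimestamp": "2024-05-01T10:00:00Z"}},
        {"metadata": {"name": "test-abc", "creationTimestamp": "2024-05-01T11:30:00Z"}},
        {"metadata": {"name": "arc-runners", "creationTimestamp": "2024-01-01T00:00:00Z"}},
    ]
    reaped = stale_names(sample, now)
    print(reaped)
    print(render_metrics(len(reaped), 0), end="")
```

The same `stale_names` filter would work on the output of `kubectl get pvc -A -o json` for the orphaned-PVC criterion, since PVC objects carry the same `metadata.name` and `metadata.creationTimestamp` fields.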
Tracking
Phase 2 of Hardening Core. Follows from senior review on artifact-keeper-test#49.