ARC: periodic janitor for stale smoke-* / test-* namespaces and orphaned PVCs #81

@brandonrc

Problem

The clean-install-smoke gate (artifact-keeper-test#49) and the deploy job both create namespaces (smoke-*, test-*) per workflow run. The script's EXIT trap removes them on both success and failure, but three cases escape it:

  • SIGKILL (workflow timeout) cannot be trapped, so the EXIT trap never runs
  • helm uninstall failures with stuck finalizers can leave PVCs orphaned
  • kubectl delete namespace --wait=false returns before the namespace actually terminates

Over time these leftovers accumulate on Rocky, consuming PVC storage and occupying ResourceQuota.
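
To illustrate the third failure mode: because a non-waiting delete returns immediately, a caller that needs to know the namespace is really gone has to poll for it. The following is a minimal sketch of such a poll loop, not the script's actual code. `wait_until_gone` and its parameters are hypothetical names, and `exists` is an injected callable so the sketch stays independent of any particular Kubernetes client.

```python
import time
from typing import Callable


def wait_until_gone(
    exists: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll exists() until it returns False.

    Returns True if the object disappeared before the deadline,
    False if it was still present when the deadline passed.
    """
    deadline = clock() + timeout_s
    while True:
        if not exists():
            return True
        if clock() >= deadline:
            # Still present: the caller can escalate (e.g. inspect finalizers).
            return False
        sleep(interval_s)
```

In practice `exists` might wrap something like `kubectl get namespace NAME` and check its exit status; that wiring is an assumption and is left out here. A `False` result is the signal that the namespace is probably stuck in termination and needs the janitor's attention.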

Acceptance criteria

  • CronJob in the arc-runners namespace that runs hourly
  • Lists namespaces matching smoke-* or test-* with metadata.creationTimestamp older than 1 hour
  • Deletes them with proper finalizer handling (force-clear if stuck)
  • Reaps any orphaned PVCs with names matching the same pattern
  • Emits Prometheus metrics ak_runner_janitor_namespaces_reaped_total and ak_runner_janitor_pvcs_reaped_total so we can tell if it's actually running
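
The selection and metrics criteria above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the function names are hypothetical, the input is assumed to be the `items` array from `kubectl get namespaces -o json` (or the PVC equivalent), and how the metrics reach Prometheus is not specified in this issue. A CronJob pod is short-lived, so something like a Pushgateway would likely be needed. Deletion and finalizer handling are deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase

PATTERNS = ("smoke-*", "test-*")  # name patterns from the acceptance criteria
MAX_AGE = timedelta(hours=1)      # age threshold from the acceptance criteria


def parse_k8s_time(ts: str) -> datetime:
    """Parse a creationTimestamp such as 2024-05-01T12:00:00Z (RFC 3339, UTC)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stale_names(items, now, patterns=PATTERNS, max_age=MAX_AGE):
    """Return names that match a pattern and were created more than max_age before now."""
    stale = []
    for item in items:
        meta = item["metadata"]
        name = meta["name"]
        if not any(fnmatchcase(name, p) for p in patterns):
            continue  # not a smoke-* / test-* object, never touch it
        if now - parse_k8s_time(meta["creationTimestamp"]) > max_age:
            stale.append(name)
    return stale


def render_metrics(namespaces_reaped: int, pvcs_reaped: int) -> str:
    """Render both counters in the Prometheus text exposition format."""
    lines = [
        "# TYPE ak_runner_janitor_namespaces_reaped_total counter",
        f"ak_runner_janitor_namespaces_reaped_total {namespaces_reaped}",
        "# TYPE ak_runner_janitor_pvcs_reaped_total counter",
        f"ak_runner_janitor_pvcs_reaped_total {pvcs_reaped}",
    ]
    return "\n".join(lines) + "\n"
```

Keeping the filter as a pure function of the listed objects and the current time makes the one-hour rule easy to unit test without a cluster. Exact matching against `smoke-*` and `test-*` also means the janitor cannot select `arc-runners` or other long-lived namespaces by accident.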

Tracking

Phase 2 of Hardening Core. Follows from senior review on artifact-keeper-test#49.

Metadata

Assignees: No one assigned
Labels: hardening (Hardening Core: stability and process work), v1.2.0 (Targeted for v1.2.0 release)
Status: Todo