[V2] Add TaskAction garbage collector for terminal CRDs #6995

@pingsutw

Description

Problem

Terminal TaskAction CRDs (Succeeded/Failed) remain in the cluster indefinitely after completion. This wastes etcd storage and slows down list operations, especially as the number of completed tasks grows over time.

Proposed Solution

Add a label-based TTL garbage collector for terminal TaskActions, modeled after the propeller FlyteWorkflow GC (flytepropeller/pkg/controller/garbage_collector.go).

Design

  1. Terminal labeling: When a TaskAction reaches a terminal state (Succeeded/Failed), stamp it with the labels flyte.org/termination-status=terminated and flyte.org/completed-time=<UTC hour>.
  2. Background GC loop: A manager.Runnable periodically lists terminated TaskActions, compares each completed-time label against the expiry cutoff (the hour format sorts lexicographically in chronological order), and deletes the expired ones.
  3. Configuration: GCConfig with Interval (how often GC runs) and MaxTTL (time-to-live for terminal TaskActions)

Key Design Decisions

  • List + filter + delete (not DeleteAllOf): K8s label selectors don't support "less than" on string values, so we list all terminated TaskActions and filter client-side by hour label
  • Separate metadata update: Labels live in object metadata, so they must be written with r.Update(), not r.Status().Update(). This costs one extra API call per terminal transition, but only once per TaskAction lifetime
  • Upgrade path: Pre-existing terminal TaskActions get labels on next reconcile via the terminal short-circuit path

Implementation

PR: #6994

Labels

added (merged changes that add new functionality), flyte2
