Retention Policy Agent: what is the vision for log storage (File/S3/GCS) cleanup? #1231

Description

@SergK

/kind feature

Context

The retention policy agent was introduced in PR #784 explicitly scoped to DB cleanup.
PR #1182 and issue #1181 later clarified in docs that the TTL applies only to DB records.

Meanwhile, issue #459 had a brief exchange noting the retention feature is "more targeted
in deleting the data from the database and/or S3", but this was never followed up on.

Today, when the retention agent expires a Result:

  • The results and records rows are deleted from Postgres
  • Log data stored on persistent volumes (File backend), S3 buckets, or GCS buckets
    is left behind and becomes orphaned

Over time this causes unbounded storage growth that operators have no built-in way to
address.

Questions

  1. Is there an ongoing effort or design discussion about cleaning up log storage
    during retention? I didn't find any issue or TEP tracking this specifically.

  2. What is the intended architecture? There seem to be a few directions:

    • Should the retention agent call the existing DeleteLog gRPC RPC (per-log, through
      the API server)?
    • Should a new batch-oriented API be added to the Logs service (e.g. DeleteLogs or
      prefix-based deletion)?
    • Should the retention agent bypass the API and talk to storage backends directly
      (it already has the server config)?
    • Or is log storage cleanup considered out of scope for the retention agent, with the
      expectation that operators handle it externally (e.g. S3 lifecycle policies, cron
      scripts)?

  3. Is the DeleteResult cascade intended to cover this? Currently DeleteResult
    on the API server only cascade-deletes DB records (via foreign keys). It does not
    touch the underlying log storage. Is there a plan to change this behavior?

  4. Backend-specific considerations: The three backends have very different batch
    capabilities (filesystem RemoveAll on directories, S3 DeleteObjects for up to
    1000 keys, GCS prefix listing). Has there been any thought on how to unify this
    behind the Stream interface or whether each backend needs its own bulk-delete path?
