Retention Policy Agent: what is the vision for log storage (File/S3/GCS) cleanup? #1231

/kind feature

## Context

The retention policy agent was introduced in PR #784 explicitly scoped to DB cleanup.
PR #1182 and issue #1181 later clarified in docs that the TTL applies only to DB records.

Meanwhile, issue #459 had a brief exchange noting the retention feature is "more targeted
in deleting the data from the database **and/or S3**", but this was never followed up on.

Today, when the retention agent expires a Result:
- The `results` and `records` rows are deleted from Postgres
- Log **data** stored on persistent volumes (File backend), S3 buckets, or GCS buckets
  is left behind and becomes orphaned

Over time this causes unbounded storage growth that operators have no built-in way to
address.

## Questions

1. **Is there an ongoing effort or design discussion** about cleaning up log storage
   during retention? I didn't find any issue or TEP tracking this specifically.

2. **What is the intended architecture?** There seem to be a few directions:
   - Should the retention agent call the existing `DeleteLog` gRPC method (one call per
     log, through the API server)?
   - Should a new batch-oriented API be added to the Logs service (e.g. `DeleteLogs` or
     prefix-based deletion)?
   - Should the retention agent bypass the API and talk to storage backends directly
     (it already has the server config)?
   - Or is log storage cleanup considered out of scope for the retention agent, with the
     expectation that operators handle it externally (e.g. S3 lifecycle policies, cron
     scripts)?

3. **Is the `DeleteResult` cascade intended to cover this?** Currently `DeleteResult`
   on the API server only cascade-deletes DB records (via foreign keys). It does not
   touch the underlying log storage. Is there a plan to change this behavior?

4. **Backend-specific considerations:** The three backends have very different batch
   capabilities (filesystem `RemoveAll` on directories, S3 `DeleteObjects` for up to
   1000 keys, GCS prefix listing). Has there been any thought on how to unify this
   behind the `Stream` interface or whether each backend needs its own bulk-delete path?
