/kind feature
Context
The retention policy agent was introduced in PR #784 explicitly scoped to DB cleanup.
PR #1182 and issue #1181 later clarified in docs that the TTL applies only to DB records.
Meanwhile, issue #459 had a brief exchange noting the retention feature is "more targeted
in deleting the data from the database and/or S3", but this was never followed up on.
Today, when the retention agent expires a Result:
- The `results` and `records` rows are deleted from Postgres
- Log data stored on persistent volumes (File backend), S3 buckets, or GCS buckets
is left behind and becomes orphaned
Over time this causes unbounded storage growth that operators have no built-in way to
address.
Questions
- Is there an ongoing effort or design discussion about cleaning up log storage
  during retention? I didn't find any issue or TEP tracking this specifically.
- What is the intended architecture? There seem to be a few directions:
  - Should the retention agent call the existing `DeleteLog` gRPC RPC (per-log, through
    the API server)?
  - Should a new batch-oriented API be added to the Logs service (e.g. `DeleteLogs` or
    prefix-based deletion)?
  - Should the retention agent bypass the API and talk to storage backends directly
    (it already has the server config)?
  - Or is log storage cleanup considered out of scope for the retention agent, with the
    expectation that operators handle it externally (e.g. S3 lifecycle policies, cron
    scripts)?
- Is the `DeleteResult` cascade intended to cover this? Currently `DeleteResult`
  on the API server only cascade-deletes DB records (via foreign keys). It does not
  touch the underlying log storage. Is there a plan to change this behavior?
- Backend-specific considerations: The three backends have very different batch
  capabilities (filesystem `RemoveAll` on directories, S3 `DeleteObjects` for up to
  1000 keys, GCS prefix listing). Has there been any thought on how to unify this
  behind the `Stream` interface or whether each backend needs its own bulk-delete path?