feat: nightly purge of obsolete workflow_runs #678
Nice that you are dealing with the huge amount of test data we have in our database. Since this is an important change that deletes lots of data in the DB, can we write some unit and/or integration tests?
Also, it would be better to first come up with a dry-run mode that only logs what is going to be deleted instead of really deleting it. We can first observe the expected behaviour, then create another PR which really deletes the entries.
LGTM! Thanks for adding the dry-run option to this one, since this is an important PR that might delete unwanted entries. Let's observe the logs for the following days. 🙏
Motivation
We planned to "delete old test rows", but deleting the parent `workflow_run` instead is simpler: a DELETE on `workflow_run` removes the entire suite → case hierarchy and the matching rows in `helios_deployment`.
I also added support for different retention rules per processing result (PROCESSED, FAILED, NULL).
Description
Config
`cleanup.workflow-run`: a cron expression plus an ordered list of policies. Each run is evaluated by the first policy whose status matches.
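As a rough sketch, the policy list under this key might look like the following. The exact property names below (`cron`, `policies`, `status`, `keep`, `age-days`) are assumptions based on the PR text, not the actual schema:

```yaml
# Hypothetical shape of the cleanup config; key names are illustrative.
cleanup:
  workflow-run:
    cron: "0 0 3 * * *"   # nightly run
    policies:
      - status: PROCESSED
        keep: 2           # keep the latest two per (repository, workflow, branch)
        age-days: 0       # 0 = no age filter, delete older runs right away
      - status: FAILED
        keep: 2
        age-days: 5       # 5-day grace period before deletion
      - status: "NULL"
        keep: 2
        age-days: 5
```

Because the list is ordered, a FAILED run here is matched by the second policy and never reaches the third.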
Code
`WorkflowRunCleanupProps` binds the YAML; a missing age-days defaults to 0 (= "no age filter"). `WorkflowRunCleanupTask` runs on a configurable cron, loops over the policies, and calls `WorkflowRunRepository.purgeObsoleteRuns()`.
`purgeObsoleteRuns()` is plain SQL: within each (repository + workflow + branch) bucket, the query sorts the rows by `created_at` (newest first) and numbers them 1, 2, 3 … with `row_number()`. Every row ranked ≤ keep is untouchable for that policy; everything ranked > keep becomes a candidate for deletion, and a candidate is actually deleted once its `created_at` falls outside the policy's age window (e.g. older than 5 days).

A Flyway migration recreates the foreign key between `helios_deployment` and `workflow_run` with `ON DELETE CASCADE`.

Behaviour
With this setup, the cleanup keeps the latest two PROCESSED runs for each (repository, workflow, branch) and deletes any older ones right away. For FAILED and NULL runs, we again keep the latest two, but give older ones a 5-day grace period before deleting them.
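The keep-N-per-bucket ranking and the cascading delete can be sketched end to end with SQLite. The table and column names mirror the PR's description, but the schema here is a simplified assumption, not the real DDL:

```python
import sqlite3

# Simplified, illustrative schema; requires SQLite >= 3.25 for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce ON DELETE CASCADE in SQLite
conn.executescript("""
CREATE TABLE workflow_run (
    id INTEGER PRIMARY KEY,
    repository TEXT, workflow TEXT, branch TEXT,
    status TEXT, created_at TEXT
);
CREATE TABLE helios_deployment (
    id INTEGER PRIMARY KEY,
    workflow_run_id INTEGER REFERENCES workflow_run(id) ON DELETE CASCADE
);
""")

# Five PROCESSED runs in one (repository, workflow, branch) bucket.
for i in range(1, 6):
    conn.execute(
        "INSERT INTO workflow_run VALUES (?, 'org/repo', 'ci', 'main', 'PROCESSED', ?)",
        (i, f"2024-01-0{i}"),
    )
conn.execute("INSERT INTO helios_deployment VALUES (1, 1)")  # tied to the oldest run

KEEP = 2  # policy: keep the two newest runs per bucket
conn.execute("""
DELETE FROM workflow_run WHERE id IN (
    SELECT id FROM (
        SELECT id, row_number() OVER (
            PARTITION BY repository, workflow, branch
            ORDER BY created_at DESC
        ) AS rn
        FROM workflow_run WHERE status = 'PROCESSED'
    ) WHERE rn > ?
)
""", (KEEP,))

survivors = [r[0] for r in conn.execute("SELECT id FROM workflow_run ORDER BY id")]
print(survivors)  # only the two newest runs remain: [4, 5]
# The deployment row pointed at run 1, so the cascade removed it as well.
print(conn.execute("SELECT count(*) FROM helios_deployment").fetchone()[0])  # 0
```

Rows ranked ≤ `KEEP` survive; everything ranked higher is deleted, and the `ON DELETE CASCADE` foreign key removes the matching `helios_deployment` rows without any extra DELETE statement.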
Why it's safe to remove old helios_deployment rows
Deployments are tied to `workflow_run` rows in the first place, and now that we keep the latest two NULL-status (test_process) workflow runs, we are guaranteed to retain current/in-progress deployment records. These are the only ones needed for UI pages like the deployment history page, which queries the deployment table whenever the process is completed. Since we are not deleting everything and we keep the latest N records, it is safe to let the database remove older `helios_deployment` rows when their parent workflow run is deleted.