feat: nightly purge of obsolete workflow_runs #678

egekocabas · 2025-04-22T23:52:11Z

Motivation

We planned to “delete old test rows”, but deleting the parent workflow_run instead is simpler:

- One DELETE on workflow_run removes the entire suite → case hierarchy and the matching rows in helios_deployment.
- Much less SQL, and no danger of leaving half-orphaned rows.

I also added support for different retention rules per processing result (PROCESSED, FAILED, NULL).

Description

Config

New YAML section cleanup.workflow-run: cron + ordered list of policies.

cleanup:
  workflow-run:
    cron: "0 0 1 * * *"         # default: every night 01:00 server time
    policies:
      # 1️⃣  keep the latest 2 runs, delete all others immediately
      - test-processing-status: PROCESSED
        keep: 2

      # 2️⃣  keep the latest 2 runs, delete older ones after 5 days
      - test-processing-status: FAILED
        keep: 2
        age-days: 5

      # 3️⃣  keep the latest 2 runs with NULL status, delete older ones after 5 days
      - test-processing-status: null
        keep: 2
        age-days: 5

The list is ordered; each run is evaluated by the first policy whose status matches.
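The first-match rule can be illustrated with a small in-memory sketch. This is not the PR's code: the `Policy` record, the `firstMatch` helper, and the extra shadowed PROCESSED entry are hypothetical and exist only to show that list order decides which policy applies.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

public class PolicyMatchSketch {

    // Hypothetical mirror of one YAML policy entry.
    // status == null models "test-processing-status: null"; ageDays == 0 means "no age filter".
    record Policy(String status, int keep, int ageDays) {}

    // Returns the first policy, in list order, whose status equals the run's status.
    static Optional<Policy> firstMatch(List<Policy> policies, String runStatus) {
        return policies.stream()
                .filter(p -> Objects.equals(p.status(), runStatus))
                .findFirst();
    }

    // The three policies from the YAML above, plus one extra entry (an assumption,
    // not in the PR) that is never reached because an earlier PROCESSED policy matches first.
    static List<Policy> samplePolicies() {
        return List.of(
                new Policy("PROCESSED", 2, 0),
                new Policy("FAILED", 2, 5),
                new Policy(null, 2, 5),
                new Policy("PROCESSED", 10, 30));
    }

    public static void main(String[] args) {
        List<Policy> policies = samplePolicies();
        System.out.println(firstMatch(policies, "PROCESSED").get().keep()); // prints 2
        System.out.println(firstMatch(policies, null).get().ageDays());     // prints 5
        System.out.println(firstMatch(policies, "SKIPPED").isPresent());    // prints false
    }
}
```

With one policy per status, as in the default config, the order has no visible effect; it only matters if two entries name the same status.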

Code

WorkflowRunCleanupProps binds the YAML. Missing age-days → 0 (= “no age filter”).
WorkflowRunCleanupTask runs on a configurable cron, loops over the policies, and calls WorkflowRunRepository.purgeObsoleteRuns() for each one.
purgeObsoleteRuns() is plain SQL:
- Rank the runs.
  - Inside each repository + workflow + branch bucket the query sorts the rows by created_at (newest first) and numbers them 1, 2, 3 … with row_number().
- Protect the newest N.
  - Any row whose rank is ≤ keep is untouchable for that policy. Everything ranked >keep becomes a candidate for deletion.
- Check the age.
  - If age-days is 0 or the key is omitted, every candidate is deleted immediately.
  - If age-days is 5 (for example), a candidate is deleted only once it is at least 5 days old (created_at <= now - 5 days).
- Thanks to the foreign keys, deleting a run also deletes its test data and helios_deployments.
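The selection steps above can be sketched in memory as follows. The real implementation is a single SQL statement using row_number(); this Java version only mirrors the rank / keep / age logic. The `Run` record, `selectForDeletion`, and `sampleRuns` are hypothetical names, and the sketch assumes that ranking is done among runs of the policy's status and that the age cutoff is inclusive.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class PurgeSelectionSketch {

    // Hypothetical, simplified view of a workflow_run row.
    record Run(long id, String repository, String workflow, String branch,
               String status, Instant createdAt) {}

    // Returns the ids one policy would delete, sorted ascending.
    static List<Long> selectForDeletion(List<Run> runs, String status,
                                        int keep, int ageDays, Instant now) {
        // Bucket the runs with the policy's status by (repository, workflow, branch).
        Map<List<String>, List<Run>> buckets = runs.stream()
                .filter(r -> Objects.equals(r.status(), status))
                .collect(Collectors.groupingBy(
                        r -> List.of(r.repository(), r.workflow(), r.branch())));

        Instant cutoff = now.minus(Duration.ofDays(ageDays));
        List<Long> toDelete = new ArrayList<>();
        for (List<Run> bucket : buckets.values()) {
            // Rank newest first; list index i corresponds to rank i + 1.
            List<Run> ranked = bucket.stream()
                    .sorted(Comparator.comparing(Run::createdAt).reversed())
                    .toList();
            // The newest `keep` rows are protected; the rest are candidates.
            for (int i = keep; i < ranked.size(); i++) {
                Run candidate = ranked.get(i);
                // ageDays == 0 means no age filter; otherwise the run must be old enough.
                if (ageDays == 0 || !candidate.createdAt().isAfter(cutoff)) {
                    toDelete.add(candidate.id());
                }
            }
        }
        Collections.sort(toDelete);
        return toDelete;
    }

    // Four FAILED runs in one bucket, aged 1, 2, 3 and 10 days (made-up data).
    static List<Run> sampleRuns(Instant now) {
        return List.of(
                new Run(1, "repo", "ci", "main", "FAILED", now.minus(Duration.ofDays(1))),
                new Run(2, "repo", "ci", "main", "FAILED", now.minus(Duration.ofDays(2))),
                new Run(3, "repo", "ci", "main", "FAILED", now.minus(Duration.ofDays(3))),
                new Run(4, "repo", "ci", "main", "FAILED", now.minus(Duration.ofDays(10))));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-04-28T01:00:00Z");
        List<Run> runs = sampleRuns(now);
        System.out.println(selectForDeletion(runs, "FAILED", 2, 5, now)); // prints [4]
        System.out.println(selectForDeletion(runs, "FAILED", 2, 0, now)); // prints [3, 4]
    }
}
```

In the sample data, runs 1 and 2 are protected by keep: 2. Run 3 is a candidate but is only 3 days old, so age-days: 5 spares it, while age-days: 0 removes it together with run 4.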

Flyway

Drops the old FK between helios_deployment and workflow_run and recreates it with ON DELETE CASCADE.

Behaviour

With this setup, the cleanup keeps the latest two PROCESSED runs for each (repository, workflow, branch) and deletes any older ones right away. For FAILED and NULL runs, we again keep the latest two, but give older ones a 5-day grace period before deleting them.

Why it's safe to remove old helios_deployment rows

Deployments are tied to workflow_run rows in the first place, and because we keep the latest two NULL-status (test_process) workflow runs, we are guaranteed to retain current and in-progress deployment records. These are the only ones needed by UI pages such as the deployment history page, which query the deployment table once the process has completed. Since we always keep the latest N records rather than deleting everything, it is safe to let the database remove older helios_deployment rows when their parent workflow run is deleted.

codacy-production · 2025-04-22T23:53:35Z

Coverage summary from Codacy


Coverage variation	Diff coverage
✅ -0.06% (target: -1.00%)	✅ 0.00%

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`2feef20`)	6842	907	13.26%
Head commit (`50dcae6`)	6875 (+33)	907 (+0)	13.19% (-0.06%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#678)	33	0	0.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


TurkerKoc

Nice that you are dealing with the huge amount of test data we have in our database. Since this is an important change that deletes a lot of data from the DB, can we write some unit and/or integration tests?

Also, it would be better to first add a dry-run mode that only logs what would be deleted instead of actually deleting it. We can observe the behaviour first and then create another PR that really deletes the entries.

TurkerKoc

LGTM! Thanks for adding the dry-run option to this one, since this is an important PR that might delete unwanted entries. Let's observe the logs over the following days. 🙏

feat: cleanup WorkflowRuns everyday

7e92af8

github-actions bot assigned egekocabas Apr 22, 2025

github-actions bot added application-server feature size:L labels Apr 22, 2025

egekocabas linked an issue Apr 22, 2025 that may be closed by this pull request

Create a scheduler for database clean up for test results #651

Closed

fix

d3d72b3

github-actions bot added the migration-script label Apr 23, 2025

fix

147aa1f

egekocabas changed the title ~~feat: cleanup WorkflowRuns everyday~~ feat: purge obsolete workflow_runs Apr 23, 2025

egekocabas changed the title ~~feat: purge obsolete workflow_runs~~ feat: nightly purge of obsolete workflow_runs Apr 23, 2025

egekocabas marked this pull request as ready for review April 23, 2025 17:03

egekocabas requested a review from a team as a code owner April 23, 2025 17:03

github-actions bot added the ready for review label Apr 23, 2025

Merge branch 'staging' into feat/scheduler-for-tests

f2fd6fc

TurkerKoc requested changes Apr 27, 2025


egekocabas and others added 3 commits April 28, 2025 02:23

add dry run

b14b1fe

Merge branch 'staging' into feat/scheduler-for-tests

b3cd231

reversion

fdcb2b8

egekocabas marked this pull request as draft April 28, 2025 00:25

egekocabas removed the ready for review label Apr 28, 2025

egekocabas added 2 commits April 28, 2025 02:29

add log when ApplicationReadyEvent

468746d

log wording

50dcae6

TurkerKoc marked this pull request as ready for review April 28, 2025 12:02

github-actions bot added the ready for review label Apr 28, 2025

TurkerKoc approved these changes Apr 28, 2025


TurkerKoc merged commit 2c2bc8b into staging Apr 28, 2025
17 checks passed

TurkerKoc deleted the feat/scheduler-for-tests branch April 28, 2025 12:04
