feat: nightly purge of obsolete workflow_runs #678

Merged: 9 commits merged on Apr 28, 2025

@egekocabas egekocabas commented Apr 22, 2025

Motivation

We planned to “delete old test rows”, but deleting the parent workflow_run instead is simpler:

  • one DELETE on workflow_run removes the entire test suite and test case hierarchy and the matching rows in helios_deployment
  • much less SQL and no danger of leaving half-orphaned rows.

I also added support for different retention rules per test processing result (PROCESSED, FAILED, NULL).

Description

Config

  • New YAML section cleanup.workflow-run: cron + ordered list of policies.
cleanup:
  workflow-run:
    cron: "0 0 1 * * *"         # default: every night 01:00 server time
    policies:
      # 1️⃣  keep the latest 2 runs, delete all older ones right away
      - test-processing-status: PROCESSED
        keep: 2

      # 2️⃣  keep the latest 2 runs, delete older ones after 5 days
      - test-processing-status: FAILED
        keep: 2
        age-days: 5

      # 3️⃣  keep the latest 2 runs with NULL status, delete older ones after 5 days
      - test-processing-status: null
        keep: 2
        age-days: 5

The list is ordered; each run is evaluated by the first policy whose status matches.
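
The first-match rule can be sketched as follows. This is only an illustration of the ordering semantics, not the code from this PR: `PolicyMatcher`, `Policy`, and `firstMatch` are hypothetical names, and the status is modelled as a nullable `String` rather than the project's own enum.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

public class PolicyMatcher {
    /** Hypothetical mirror of one YAML policy entry; status may be null. */
    record Policy(String status, int keep, int ageDays) {}

    /** Returns the first policy, in list order, whose status equals the run's status. */
    static Optional<Policy> firstMatch(List<Policy> policies, String runStatus) {
        return policies.stream()
                // Objects.equals treats null == null as a match (the NULL-status policy).
                .filter(p -> Objects.equals(p.status(), runStatus))
                .findFirst();
    }

    public static void main(String[] args) {
        // Same three policies as in the YAML above; a missing age-days is modelled as 0.
        List<Policy> policies = List.of(
                new Policy("PROCESSED", 2, 0),
                new Policy("FAILED", 2, 5),
                new Policy(null, 2, 5));

        System.out.println(firstMatch(policies, "PROCESSED").map(Policy::ageDays).orElse(-1)); // 0
        System.out.println(firstMatch(policies, null).map(Policy::ageDays).orElse(-1));        // 5
        System.out.println(firstMatch(policies, "UNKNOWN").isPresent());                       // false
    }
}
```

A run whose status matches no policy is left untouched in this sketch. If two entries share a status, only the earlier one applies.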

Code

  • WorkflowRunCleanupProps binds the YAML. Missing age-days → 0 (= “no age filter”).
  • WorkflowRunCleanupTask runs on a configurable cron, loops over the policies, and calls WorkflowRunRepository.purgeObsoleteRuns().
  • purgeObsoleteRuns() is plain SQL:
    • Rank the runs.
      • Inside each repository + workflow + branch bucket the query sorts the rows by created_at (newest first) and numbers them 1, 2, 3 … with row_number().
    • Protect the newest N.
      • Any row whose rank is ≤ keep is untouchable for that policy. Everything ranked >keep becomes a candidate for deletion.
    • Check the age.
      • If age-days is 0 or the key is omitted, every candidate is deleted regardless of age.
      • If age-days is 5 (for example), a candidate is deleted only once its created_at is at least 5 days in the past.
    • Thanks to the foreign keys, deleting a run also deletes its test data and its helios_deployment rows.
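
The rank, protect, and age steps can be modelled in memory as below. The PR itself does this in SQL with row_number(); this Java version is only a sketch of the selection logic. `PurgeSketch`, `Run`, and `obsoleteIds` are hypothetical names. Treating a run that sits exactly on the cutoff as deletable is an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PurgeSketch {
    /** Hypothetical in-memory stand-in for a workflow_run row (fields assumed non-null). */
    record Run(long id, String repo, String workflow, String branch, Instant createdAt) {}

    /** Returns the sorted ids that one policy (keep, ageDays) would delete. */
    static List<Long> obsoleteIds(List<Run> runs, int keep, int ageDays, Instant now) {
        Instant cutoff = now.minus(Duration.ofDays(ageDays));
        // Bucket by (repository, workflow, branch), like PARTITION BY in the SQL.
        Map<List<String>, List<Run>> buckets = runs.stream()
                .collect(Collectors.groupingBy(r -> List.of(r.repo(), r.workflow(), r.branch())));

        List<Long> doomed = new ArrayList<>();
        for (List<Run> bucket : buckets.values()) {
            // Rank: newest first, so index 0 corresponds to row_number() = 1.
            List<Run> ranked = bucket.stream()
                    .sorted(Comparator.comparing(Run::createdAt).reversed())
                    .toList();
            // Protect the newest `keep` rows; everything after them is a candidate.
            for (int i = Math.max(keep, 0); i < ranked.size(); i++) {
                Run r = ranked.get(i);
                // Age check: ageDays == 0 means "no age filter".
                if (ageDays == 0 || !r.createdAt().isAfter(cutoff)) {
                    doomed.add(r.id());
                }
            }
        }
        doomed.sort(Comparator.naturalOrder());
        return doomed;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-04-28T01:00:00Z");
        List<Run> runs = List.of(
                new Run(1, "repo", "ci", "main", now.minus(Duration.ofDays(10))),
                new Run(2, "repo", "ci", "main", now.minus(Duration.ofDays(3))),
                new Run(3, "repo", "ci", "main", now.minus(Duration.ofDays(2))),
                new Run(4, "repo", "ci", "main", now.minus(Duration.ofDays(1))));

        System.out.println(obsoleteIds(runs, 2, 0, now)); // [1, 2]
        System.out.println(obsoleteIds(runs, 2, 5, now)); // [1]
    }
}
```

With keep: 2 and no age filter, both runs ranked below the newest two are deleted. With age-days: 5, the 3-day-old run survives until it ages past the cutoff.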

Flyway

  • Drops the old FK between helios_deployment and workflow_run and recreates it with ON DELETE CASCADE.

Behaviour

With this setup, the cleanup keeps the latest two PROCESSED runs for each (repository, workflow, branch) and deletes any older ones right away. For FAILED and NULL runs, we again keep the latest two, but give older ones a 5-day grace period before deleting them.

Why it's safe to remove old helios_deployment rows

Deployments are tied to workflow_run rows in the first place, and because we keep the latest two runs whose test processing status is NULL, we’re guaranteed to retain the current/in-progress deployment records. These are the only ones needed by UI pages such as the deployment history page, which query the deployment table once the process has completed. Because we always keep the latest N runs rather than deleting everything, it’s safe to let the database remove older helios_deployment rows when their parent workflow run is deleted.

codacy-production bot commented Apr 22, 2025

Coverage summary from Codacy

| Coverage variation | Diff coverage |
| --- | --- |
| -0.06% (target: -1.00%) | 0.00% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (2feef20) | 6842 | 907 | 13.26% |
| Head commit (50dcae6) | 6875 (+33) | 907 (+0) | 13.19% (-0.06%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#678) | 33 | 0 | 0.00% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


@egekocabas egekocabas changed the title feat: cleanup WorkflowRuns everyday feat: purge obsolete workflow_runs Apr 23, 2025
@egekocabas egekocabas changed the title feat: purge obsolete workflow_runs feat: nightly purge of obsolete workflow_runs Apr 23, 2025
@egekocabas egekocabas marked this pull request as ready for review April 23, 2025 17:03
@egekocabas egekocabas requested a review from a team as a code owner April 23, 2025 17:03
@TurkerKoc TurkerKoc left a comment

Nice that you are dealing with the huge amount of test data we have in our database. Since this is an important change that deletes a lot of data from the DB, can we write some unit and/or integration tests?

It would also be better to start with a dry-run mode that only logs what would be deleted instead of actually deleting it. We can first observe the expected behaviour, and then create another PR that really deletes the entries.

@egekocabas egekocabas marked this pull request as draft April 28, 2025 00:25
@TurkerKoc TurkerKoc marked this pull request as ready for review April 28, 2025 12:02
@TurkerKoc TurkerKoc left a comment

LGTM! Thanks for adding the dry-run option to this one, since this is an important PR that might delete unwanted entries. Let's observe the logs over the following days. 🙏

@TurkerKoc TurkerKoc merged commit 2c2bc8b into staging Apr 28, 2025
17 checks passed
@TurkerKoc TurkerKoc deleted the feat/scheduler-for-tests branch April 28, 2025 12:04
Development

Successfully merging this pull request may close these issues.

Create a scheduler for database clean up for test results