
feat: nightly purge of obsolete workflow_runs #678



Open · wants to merge 4 commits into base: staging

egekocabas (Member) commented Apr 22, 2025

Motivation

We planned to “delete old test rows”, but deleting the parent workflow_run instead is simpler:

  • one DELETE on workflow_run removes the entire test suite/test case hierarchy and the matching rows in helios_deployment;
  • much less SQL, and no danger of leaving half-orphaned rows.

I also added support for different retention rules per test-processing status (PROCESSED, FAILED, NULL).

Description

Config

  • New YAML section cleanup.workflow-run: cron + ordered list of policies.
cleanup:
  workflow-run:
    cron: "0 0 1 * * *"         # default: every night 01:00 server time
    policies:
      # 1️⃣  keep the latest 2 runs, delete all older ones right away
      - test-processing-status: PROCESSED
        keep: 2

      # 2️⃣  keep the latest 2 runs, delete older ones after 5 days
      - test-processing-status: FAILED
        keep: 2
        age-days: 5

      # 3️⃣  keep the latest 2 runs with NULL status, delete older ones after 5 days
      - test-processing-status: null
        keep: 2
        age-days: 5

The list is ordered; each run is evaluated by the first policy whose status matches.
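The first-match rule and the "missing age-days → 0" default can be sketched in a few lines. This is a hypothetical Python model, not the PR's Java WorkflowRunCleanupProps: the names Policy, policy_from_yaml and first_matching_policy are made up for illustration, and the dicts stand in for what a YAML loader returns per list entry (YAML `null` becomes None).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Policy:
    # None models a NULL test_processing_status
    status: Optional[str]
    keep: int
    age_days: int = 0  # missing age-days -> 0, i.e. "no age filter"


def policy_from_yaml(entry: dict) -> Policy:
    """Build a Policy from one already-parsed YAML list entry (hypothetical helper)."""
    return Policy(
        status=entry["test-processing-status"],
        keep=int(entry["keep"]),
        age_days=int(entry.get("age-days", 0)),
    )


def first_matching_policy(policies: Sequence[Policy],
                          status: Optional[str]) -> Optional[Policy]:
    """Walk the ordered list and return the first policy whose status matches."""
    for policy in policies:
        if policy.status == status:
            return policy
    return None  # no policy for this status: the run is left alone


# Mirrors the three entries of the YAML example above.
POLICIES = [
    policy_from_yaml({"test-processing-status": "PROCESSED", "keep": 2}),
    policy_from_yaml({"test-processing-status": "FAILED", "keep": 2, "age-days": 5}),
    policy_from_yaml({"test-processing-status": None, "keep": 2, "age-days": 5}),
]
```

If two entries share a status, only the earlier one is ever applied, which is what makes the list order meaningful.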

Code

  • WorkflowRunCleanupProps binds the YAML. Missing age-days → 0 (= “no age filter”).
  • WorkflowRunCleanupTask runs on a configurable cron, loops over the policies, and calls WorkflowRunRepository.purgeObsoleteRuns() once per policy.
  • purgeObsoleteRuns() is plain SQL:
    • Rank the runs.
      • Inside each repository + workflow + branch bucket the query sorts the rows by created_at (newest first) and numbers them 1, 2, 3 … with row_number().
    • Protect the newest N.
      • Any row whose rank is ≤ keep is untouchable for that policy. Everything ranked > keep becomes a candidate for deletion.
    • Check the age.
      • If age-days is 0 or the key is omitted, every candidate is deleted regardless of age.
      • If age-days is 5 (for example), a candidate is deleted only once it is at least 5 days old, i.e. created_at <= now() - 5 days.
    • Thanks to the foreign keys, deleting a run also deletes its test data and helios_deployments.
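The rank / protect / age steps above can be sketched as a single DELETE and tried out against SQLite through Python's sqlite3 module. This is not the PR's actual purgeObsoleteRuns() query: it assumes a simplified schema whose column names (repository_id, workflow_id, head_branch, test_processing_status, created_at) are guesses, and the production database may differ. SQLite's IS is used as NULL-safe equality so the NULL-status policy works; PostgreSQL would spell that IS NOT DISTINCT FROM.

```python
import sqlite3

# Simplified, hypothetical schema: only the columns the purge query needs.
SCHEMA = """
CREATE TABLE workflow_run (
    id INTEGER PRIMARY KEY,
    repository_id INTEGER NOT NULL,
    workflow_id INTEGER NOT NULL,
    head_branch TEXT NOT NULL,
    test_processing_status TEXT,   -- 'PROCESSED', 'FAILED' or NULL
    created_at TEXT NOT NULL       -- 'YYYY-MM-DD HH:MM:SS'
);
"""

# 1. rank runs per (repository, workflow, branch), newest first;
# 2. protect ranks <= :keep;
# 3. delete the rest, but only if old enough when :age_days > 0.
PURGE_SQL = """
DELETE FROM workflow_run
WHERE id IN (
    SELECT id FROM (
        SELECT id, created_at,
               row_number() OVER (
                   PARTITION BY repository_id, workflow_id, head_branch
                   ORDER BY created_at DESC
               ) AS rn
        FROM workflow_run
        WHERE test_processing_status IS :status
    ) AS ranked
    WHERE rn > :keep
      AND (:age_days = 0
           OR created_at <= datetime(:now, '-' || :age_days || ' days'))
)
"""


def purge_obsolete_runs(conn, status, keep, age_days, now):
    """Apply one policy and return how many runs were deleted."""
    cur = conn.execute(PURGE_SQL, {"status": status, "keep": keep,
                                   "age_days": age_days, "now": now})
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO workflow_run VALUES (?, ?, ?, ?, ?, ?)", [
    # PROCESSED runs on one branch: only the two newest should survive
    (1, 1, 1, "main", "PROCESSED", "2025-04-01 00:00:00"),
    (2, 1, 1, "main", "PROCESSED", "2025-04-02 00:00:00"),
    (3, 1, 1, "main", "PROCESSED", "2025-04-03 00:00:00"),
    (4, 1, 1, "main", "PROCESSED", "2025-04-04 00:00:00"),
    # FAILED runs: ranks 3 and 4 are candidates, but only rank 4 is > 5 days old
    (5, 1, 1, "main", "FAILED", "2025-04-20 00:00:00"),
    (6, 1, 1, "main", "FAILED", "2025-04-19 00:00:00"),
    (7, 1, 1, "main", "FAILED", "2025-04-18 00:00:00"),
    (8, 1, 1, "main", "FAILED", "2025-04-10 00:00:00"),
])

policies = [("PROCESSED", 2, 0), ("FAILED", 2, 5), (None, 2, 5)]
now = "2025-04-22 00:00:00"
deleted = {status: purge_obsolete_runs(conn, status, keep, age, now)
           for status, keep, age in policies}
remaining = [row[0] for row in conn.execute("SELECT id FROM workflow_run ORDER BY id")]
```

With these rows the two oldest PROCESSED runs and the 12-day-old FAILED run are removed; run 7 is also ranked beyond keep but is only four days old, so it survives until a later night.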

Flyway

  • Drops the old FK between helios_deployment and workflow_run and recreates it with ON DELETE CASCADE.
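A small sqlite3 sketch of what the cascade buys: one DELETE on a parent run removes its deployments and its whole test hierarchy. The table layouts are hypothetical (the PR does not name the test suite/test case tables). On a database such as PostgreSQL, changing an existing FK is typically done with ALTER TABLE ... DROP CONSTRAINT followed by ADD CONSTRAINT ... ON DELETE CASCADE; SQLite cannot alter constraints, so the sketch declares the cascade up front, and it only enforces foreign keys at all after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign keys (and therefore cascades) unless this is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE workflow_run (id INTEGER PRIMARY KEY);
CREATE TABLE helios_deployment (
    id INTEGER PRIMARY KEY,
    workflow_run_id INTEGER REFERENCES workflow_run(id) ON DELETE CASCADE
);
CREATE TABLE test_suite (
    id INTEGER PRIMARY KEY,
    workflow_run_id INTEGER NOT NULL REFERENCES workflow_run(id) ON DELETE CASCADE
);
CREATE TABLE test_case (
    id INTEGER PRIMARY KEY,
    test_suite_id INTEGER NOT NULL REFERENCES test_suite(id) ON DELETE CASCADE
);
INSERT INTO workflow_run VALUES (1), (2);
INSERT INTO helios_deployment VALUES (10, 1), (20, 2);
INSERT INTO test_suite VALUES (100, 1), (200, 2);
INSERT INTO test_case VALUES (1000, 100), (1001, 100), (2000, 200);
""")

# One DELETE on the parent; the cascades walk down two levels of children.
conn.execute("DELETE FROM workflow_run WHERE id = 1")


def ids(table):
    """Return the remaining primary keys of a table, sorted."""
    return [row[0] for row in conn.execute(f"SELECT id FROM {table} ORDER BY id")]
```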

Behaviour

With this setup, the cleanup keeps the latest two PROCESSED runs for each (repository, workflow, branch) and deletes any older ones right away. For FAILED and NULL runs, we again keep the latest two, but give older ones a 5-day grace period before deleting them.
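To make the grace period concrete, here is a rough Python sketch of when the nightly 01:00 job would first remove a run that is already ranked beyond keep in its bucket. The function names are hypothetical, times are naive server-local datetimes, and it assumes the run's rank does not change between nights (in practice newer runs only push it further down).

```python
from datetime import datetime, timedelta
from typing import Optional


def is_obsolete(rank: int, created_at: datetime, now: datetime,
                keep: int, age_days: int) -> bool:
    """Deletion rule for one run; rank is 1-based within its bucket, newest = 1."""
    if rank <= keep:
        return False  # the newest `keep` runs are always protected
    if age_days == 0:
        return True  # no age filter: outranked runs go immediately
    return created_at <= now - timedelta(days=age_days)


def first_purge_night(created_at: datetime, rank: int, keep: int,
                      age_days: int, cron_hour: int = 1) -> Optional[datetime]:
    """First nightly run (at cron_hour:00) that deletes the run, or None if protected."""
    if rank <= keep:
        return None
    night = created_at.replace(hour=cron_hour, minute=0, second=0, microsecond=0)
    if night < created_at:
        night += timedelta(days=1)  # today's run already happened
    while not is_obsolete(rank, created_at, night, keep, age_days):
        night += timedelta(days=1)
    return night


# An outranked PROCESSED run (no age filter) vs. an outranked FAILED run (5 days).
print(first_purge_night(datetime(2025, 4, 10, 15), rank=3, keep=2, age_days=0))  # 2025-04-11 01:00:00
print(first_purge_night(datetime(2025, 4, 10, 15), rank=3, keep=2, age_days=5))  # 2025-04-16 01:00:00
```

A run created at 15:00 on Apr 10 is only 4 days and 10 hours old at the 01:00 run on Apr 15, so with age-days: 5 it is first deleted on Apr 16.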

Why it's safe to remove old helios_deployment rows

Deployments are tied to workflow_run rows in the first place. Because we keep the latest two workflow runs whose test-processing status is NULL, we are guaranteed to retain the current/in-progress deployment records. These are the only ones needed by UI pages such as the deployment history page, which query the deployment table once the process has completed. Since we never delete everything and always keep the latest N runs, it is safe to let the database remove older helios_deployment rows when their parent workflow run is deleted.


codacy-production bot commented Apr 22, 2025

Coverage summary from Codacy


| Coverage variation | Diff coverage |
| --- | --- |
| -0.03% (target: -1.00%) | 0.00% |
Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (5f7e9b8) | 6809 | 896 | 13.16% |
| Head commit (f2fd6fc) | 6826 (+17) | 896 (+0) | 13.13% (-0.03%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#678) | 17 | 0 | 0.00% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


@egekocabas egekocabas changed the title feat: cleanup WorkflowRuns everyday feat: purge obsolete workflow_runs Apr 23, 2025
@egekocabas egekocabas changed the title feat: purge obsolete workflow_runs feat: nightly purge of obsolete workflow_runs Apr 23, 2025
@egekocabas egekocabas marked this pull request as ready for review April 23, 2025 17:03
@egekocabas egekocabas requested a review from a team as a code owner April 23, 2025 17:03
Development

Successfully merging this pull request may close these issues.

Create a scheduler for database clean up for test results