Skip to content

fix: consolidate garbage collection for namespaced and cluster-scoped workspaces#124

Merged
ulucinar merged 1 commit intoupbound:mainfrom
erhancagirici:singleton-gc
Feb 24, 2026
Merged

fix: consolidate garbage collection for namespaced and cluster-scoped workspaces#124
ulucinar merged 1 commit intoupbound:mainfrom
erhancagirici:singleton-gc

Conversation

@erhancagirici
Copy link
Copy Markdown
Contributor

Description of your changes

Fixes the bug in the garbage collector that caused unintended workspace directory deletions.

Previously, both cluster-scoped and namespaced workspace controllers started their own GC instances on the same root directory. Each GC only checked their respective Workspace MR instances, causing race and deletions of workspace dirs of each other, e.g. cluster-scoped GC deleting workspace directories and vice versa.

The fix consolidates the GC logic to consider both Workspace MR types and runs a centralized GC controller, per root directory ( e.g. /tofu and /tmp/tofu) It also considers edge cases due to potential usage of SafeStart, where namespaced or cluster-scoped Workspace CRDs might not be available immediately or not used at all.

  • GC logic is now controller-runtime manager.Runnable with proper shutdown context
  • centralized gc.Setup() with ensuring "run once", whichever controller
  • Added tests for CRD gating scenarios

Alternatives considered

Separating root workspace directories could also worked, however this was considered as partially breaking change.
Also, they can still potentially share the same root dir if configured via the env var XP_TF_DIR, which would result in the race again.

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

added new unit tests and existing passes.

Tested manually using the following scenario:

  • Create several workspace MRs, both namespaced and cluster-scoped

  • Wait for GC to kick

  • Ensure no workspace directory is deleted at /tofu dir in pod.

  • Now, delete some workspace MRs with orphaning (their workspace dirs continue to exist)

  • Wait for GC to kick

  • Ensure they are garbage collected (no regressions)

… workspaces

Fixes the bug in the garbage collector that caused unintended workspace
directory deletions.

Previously, both cluster-scoped and namespaced workspace
controllers started their own GC instances on the same root directory.
Each GC only checked their respective Workspace MR instances, causing race and deletions of workspace dirs of each other, e.g. cluster-scoped GC deleting workspace directories and vice versa.

The fix consolidates the GC logic to consider both Workspace MR types and runs a centralized GC controller, per root directory ( e.g. `/tofu` and `/tmp/tofu`)
It also considers edge cases due to potential usage of SafeStart, where namespaced or cluster-scoped Workspace CRDs might not be available immediately or not used at all.

- GC logic is now  controller-runtime `manager.Runnable` with proper shutdown context
- centralized gc.Setup() with ensuring "run once", whichever controller
- Added tests for CRD gating scenarios

Signed-off-by: Erhan Cagirici <erhan@upbound.io>
Copy link
Copy Markdown
Contributor

@ulucinar ulucinar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @erhancagirici for the fix. These changes have been reviewed here.

@ulucinar
Copy link
Copy Markdown
Contributor

/test-examples="examples/cluster/workspace-inline-aws.yaml"

@ulucinar ulucinar merged commit db9d7b1 into upbound:main Feb 24, 2026
9 checks passed
@github-actions
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants