fix: consolidate garbage collection for namespaced and cluster-scoped workspaces #124
Merged
ulucinar merged 1 commit into upbound:main on Feb 24, 2026
… workspaces

Fixes the bug in the garbage collector that caused unintended workspace directory deletions.

Previously, both the cluster-scoped and the namespaced workspace controllers started their own GC instances on the same root directory. Each GC checked only its own Workspace MR type, so the two raced and deleted each other's workspace directories, e.g. the cluster-scoped GC deleting namespaced workspace directories and vice versa.

The fix consolidates the GC logic to consider both Workspace MR types and runs a centralized GC controller per root directory (e.g. `/tofu` and `/tmp/tofu`). It also considers edge cases due to potential usage of SafeStart, where namespaced or cluster-scoped Workspace CRDs might not be available immediately or not used at all.

- GC logic is now a controller-runtime `manager.Runnable` with a proper shutdown context
- Centralized `gc.Setup()` that ensures "run once", whichever controller calls it
- Added tests for CRD gating scenarios

Signed-off-by: Erhan Cagirici <erhan@upbound.io>
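The "run once" guarantee per root directory described in the commit message could be sketched roughly as follows. This is a minimal illustration, not the PR's actual `gc.Setup()`: the `registry` type, its `setup` method, and the `start` callback are hypothetical names, and the sketch assumes that two root directories count as the same when their cleaned paths are equal.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sync"
)

// registry remembers which root directories already have a collector.
// Hypothetical type for illustration; not taken from the PR.
type registry struct {
	mu      sync.Mutex
	started map[string]bool
}

// setup calls start for root only if no collector was registered for the
// same cleaned path before. It reports whether start was called.
func (r *registry) setup(root string, start func(root string)) bool {
	key := filepath.Clean(root)
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.started == nil {
		r.started = make(map[string]bool)
	}
	if r.started[key] {
		return false // another controller already set up GC for this directory
	}
	r.started[key] = true
	start(key)
	return true
}

func main() {
	var r registry
	start := func(root string) { fmt.Println("GC started for", root) }

	r.setup("/tofu", start)     // e.g. called by the cluster-scoped controller
	r.setup("/tofu/", start)    // e.g. called by the namespaced controller: no-op
	r.setup("/tmp/tofu", start) // a different root directory gets its own GC
}
```

Keying on the cleaned path means that whichever controller calls `setup` first wins, and a later call for the same directory does nothing, even when both controllers resolve the directory from the same environment variable.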
f7dfac3 to c900862
ulucinar (Contributor) approved these changes on Feb 23, 2026 and left a comment:
Thanks @erhancagirici for the fix. These changes have been reviewed here.
Contributor
/test-examples="examples/cluster/workspace-inline-aws.yaml"
Successfully created backport PR for
Description of your changes
Fixes the bug in the garbage collector that caused unintended workspace directory deletions.
Previously, both the cluster-scoped and the namespaced workspace controllers started their own GC instances on the same root directory. Each GC checked only its own Workspace MR type, so the two raced and deleted each other's workspace directories, e.g. the cluster-scoped GC deleting namespaced workspace directories and vice versa.
The fix consolidates the GC logic to consider both Workspace MR types and runs a centralized GC controller per root directory (e.g. `/tofu` and `/tmp/tofu`). It also considers edge cases due to potential usage of SafeStart, where namespaced or cluster-scoped Workspace CRDs might not be available immediately or not used at all.
- GC logic is now a controller-runtime `manager.Runnable` with a proper shutdown context
- Centralized `gc.Setup()` that ensures "run once", whichever controller calls it
- Added tests for CRD gating scenarios
Alternatives considered
Separating the root workspace directories could also have worked; however, this was considered a partially breaking change.
Also, the two controllers could still share the same root dir if it is configured via the env var `XP_TF_DIR`, which would result in the race again.
I have:
- Run `make reviewable` to ensure this PR is ready for review.
How has this code been tested
Added new unit tests; the existing tests pass.
Tested manually using the following scenario:
- Create several workspace MRs, both namespaced and cluster-scoped
- Wait for the GC to kick in
- Ensure no workspace directory is deleted in the `/tofu` dir in the pod
- Now, delete some workspace MRs with orphaning (their workspace dirs continue to exist)
- Wait for the GC to kick in
- Ensure they are garbage collected (no regressions)