
fix: workspace directories are being deleted every hour #123

Closed
maratkomarov wants to merge 1 commit into upbound:main from maratkomarov:fix-gc-race


@maratkomarov maratkomarov commented Feb 12, 2026

Description of your changes

It all started when we observed very long reconciliation loops in the OpenTofu workspaces. We have approximately 200 resources, and it takes about 1 hour to reconcile them all. The resources rarely change, so the provider's observation should run tofu plan, see no changes, and finish. Instead, the provider runs tofu init almost every time, because it thinks that the workspace directory checksum has changed.

On closer examination, I found that the workspace directories periodically disappear. We mount a persistent volume at /tofu to preserve the content if the provider pod restarts. The provider pod has an uptime of many hours, yet the workspace directories consistently disappear roughly once an hour.

I dug into the source code and found that the provider creates two workspace controllers: cluster and namespaced. Each controller starts its own garbage collector. Both collectors run the same function; the only difference is the value of the namespaced flag (true or false). The flag determines which resource type the collect() function lists: clusterv1beta1.Workspace or namespacedv1beta1.Workspace. The function then lists the workspace directories and deletes those that are not associated with any existing workspace of that type.

The problem is that both cluster and namespaced controllers store their workspaces in the same place: /tofu.

As a result, the namespaced garbage collector deletes directories owned by cluster workspaces, and vice versa.
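To make the failure mode concrete, here is a minimal sketch of the behaviour described above. It is not the provider's actual code: the `collect` and `demo` functions, the temporary directory, and the UID-like directory names are all hypothetical and exist only for illustration. A collector that knows only one resource type treats every other directory in the shared folder as an orphan.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// collect removes every subdirectory of dir whose name is not in known and
// returns the names it removed. Hypothetical stand-in for the shared
// garbage-collection function, not the provider's real implementation.
func collect(dir string, known map[string]bool) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var removed []string
	for _, e := range entries {
		if !e.IsDir() || known[e.Name()] {
			continue
		}
		if err := os.RemoveAll(filepath.Join(dir, e.Name())); err != nil {
			return removed, err
		}
		removed = append(removed, e.Name())
	}
	return removed, nil
}

// demo builds a shared directory holding one cluster and one namespaced
// workspace, then runs the namespaced collector against it.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "tofu-shared")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	// Illustrative directory names; assumed to be one directory per workspace.
	for _, name := range []string{"cluster-uid", "namespaced-uid"} {
		if err := os.Mkdir(filepath.Join(root, name), 0o700); err != nil {
			return nil, err
		}
	}

	// The namespaced collector lists only namespaced workspaces, so the
	// cluster workspace's directory looks orphaned to it.
	return collect(root, map[string]bool{"namespaced-uid": true})
}

func main() {
	removed, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("namespaced collector removed:", removed)
}
```

In this sketch the namespaced collector removes the cluster workspace's directory even though that workspace still exists, which matches the symptom of tofu init being rerun on the next reconciliation.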


The fix is to give each resource type a separate folder:

  • $XP_TF_DIR/cluster - cluster workspaces
  • $XP_TF_DIR/namespaced - namespaced workspaces
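The layout above could be derived roughly as follows. This is only a sketch: the `scopeDir` helper name and the `/tofu` fallback are assumptions for illustration, and the actual change in this PR may be structured differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// scopeDir returns the directory that holds workspaces of one resource type.
// Hypothetical helper: each collector would list and clean only its own
// subdirectory, so it never sees the other type's workspaces.
func scopeDir(base string, namespaced bool) string {
	if namespaced {
		return filepath.Join(base, "namespaced")
	}
	return filepath.Join(base, "cluster")
}

func main() {
	base := os.Getenv("XP_TF_DIR")
	if base == "" {
		base = "/tofu" // assumed fallback, matching the mount point mentioned above
	}
	fmt.Println("cluster workspaces:   ", scopeDir(base, false))
	fmt.Println("namespaced workspaces:", scopeDir(base, true))
}
```

Because the two collectors no longer share a directory listing, neither can mistake the other's workspaces for orphans. The trade-off is that the on-disk layout changes for existing installations.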

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

Passed the unit test suite.

Built a provider and validated that it works in our test environment as expected.


Upbound-CLA commented Feb 12, 2026

CLA assistant check
All committers have signed the CLA.

@erhancagirici
Contributor

@maratkomarov many thanks for the analysis and the PR! A similar issue was also discovered in provider-terraform. The solution in this PR was considered as well. While it is valid and resolves the issue, we wanted to avoid a potentially breaking change to the directory structure, in case consumers rely on it externally.

#124 keeps the directory structure as-is and has been merged, so I'm closing this in favor of it. Again, thanks for the PR!
