fix: workspace directories are being deleted every hour #123
Closed
maratkomarov wants to merge 1 commit into upbound:main
Contributor
@maratkomarov many thanks for the analysis and the PR! A similar issue was also discovered in provider-terraform. The solution in this PR was also considered. While it is valid and resolves the issue, we wanted to avoid a potential breaking change to the directory structure, in case consumers rely on it externally. #124 keeps the directory structure as-is and got merged, so closing this in favor of it. Again, thanks for the PR anyways!
Description of your changes
It all started with observing very long reconciliation loops in the OpenTofu workspaces. We have approximately 200 resources, and it takes about 1h to reconcile them all. Resources rarely change, so the provider observation should run tofu plan, see no change, and finish. What we see instead is that almost every time the provider runs tofu init, because it thinks that the workspace directory checksum has changed.
On closer examination, I found that the workspace directories periodically disappear. We mount a persistent volume at /tofu to preserve content if the provider pod restarts. Moreover, the provider pod has multi-hour uptime, but workspace directories consistently disappear every 1 hour or so.
I delved into the source code and found that the provider creates 2 workspace controllers: cluster and namespaced. Each controller starts its own garbage collector. Both collectors run the same function, with the only difference being the namespaced value: true | false. The flag determines which resource type the collect() function will list: clusterv1beta1.Workspace or namespacedv1beta1.Workspace. The function then lists the workspace directories and deletes those no longer associated with an existing workspace.
The problem is that both cluster and namespaced controllers store their workspaces in the same place: /tofu.
This causes the namespaced garbage collector to delete directories owned by cluster workspaces, and vice versa.
The fix is to give each resource type a separate folder:
$XP_TF_DIR/cluster - cluster workspaces
$XP_TF_DIR/namespaced - namespaced workspaces
I have:
Run make reviewable to ensure this PR is ready for review.
How has this code been tested
Passed the unit test suite.
Built a provider and validated that it works in our test environment as expected.
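For reference, the directory split proposed in the description can be sketched as a small path helper. The function name workspaceRoot is hypothetical and does not come from the PR, and falling back to /tofu when XP_TF_DIR is unset is an assumption based on the mount point mentioned above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// workspaceRoot returns the directory that holds workspaces of one kind.
// Hypothetical helper: it only illustrates the cluster/namespaced split
// from the description, not the actual change in the PR.
func workspaceRoot(base string, namespaced bool) string {
	if namespaced {
		return filepath.Join(base, "namespaced")
	}
	return filepath.Join(base, "cluster")
}

func main() {
	base := os.Getenv("XP_TF_DIR")
	if base == "" {
		base = "/tofu" // assumed fallback, taken from the mount point in the description
	}
	fmt.Println("cluster workspaces:   ", workspaceRoot(base, false))
	fmt.Println("namespaced workspaces:", workspaceRoot(base, true))
}
```

With a layout like this, each garbage collector only ever lists directories created by its own controller, so neither can remove the other's workspaces.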