-
Notifications
You must be signed in to change notification settings - Fork 253
Change GitRepo deletion. #4361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Change GitRepo deletion. #4361
Conversation
a24d029 to
fcc8657
Compare
This PR introduces changes to how GitRepos are deleted, as well as their dependent resources. Two main changes are introduced: * The function that deletes the GitRepo has been modified to check for any pending resources, especially *Bundles* and *ImageScans*, since previously the *GitRepo* was removed before these resources were fully deleted. * A new controller is introduced for *Content* resources to track the number of *BundleDeployments* that reference the same one. This controller aims to eliminate the conflicts that occurred when updating or deleting *BundleDeployments*, as they were being updated or removed concurrently while modifying a shared *Content* resource. With these changes, the goal is to eliminate both update conflicts (which triggered additional Reconcile calls in the controllers) and potential race conditions when deleting and recreating *GitRepos* that still had dependent resources pending deletion. Signed-off-by: Xavi Garcia <[email protected]>
fcc8657 to
f59b831
Compare
weyfonk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels cleaner than the previous approach, if slightly more "manual".
Finalizers were not the right tool for the job, it seems :)
| // resource, so the downstream agent can deploy them. | ||
| // +nullable | ||
| Content []byte `json:"content,omitempty"` | ||
| Content []byte `json:"content,omitempty"` // Content is a byte array, which contains the manifests of a bundle. The bundle resources are copied into the bundledeployment's content resource, so the downstream agent can deploy them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this second sentence needed here? It already appears on L27-28.
| workersOpts.Schedule = w | ||
| } | ||
|
|
||
| if d := os.Getenv("CONTENT_RECONCILER_WORKERS"); d != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This env var should probably be set in the fleet-controller deployment.
| func GetNamespaceAndNameID(namespace, name string) string { | ||
| return fmt.Sprintf("%s/%s", namespace, name) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we get rid of the commented out code above, then this function is no longer needed.
| return nil // continue iterating no matter what | ||
| }) | ||
|
|
||
| return errors.Join(errs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will format errors with a newline between each error message (skipping nil errors). I'm not sure how well that would work in the Rancher UI.
This may not be critical, but perhaps something to test.
| ShardID: shardID, | ||
| Workers: workersOpts.Content, | ||
| }).SetupWithManager(mgr); err != nil { | ||
| setupLog.Error(err, "unable to content controller", "controller", "Content") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This typo is very relatable :D
| setupLog.Error(err, "unable to content controller", "controller", "Content") | |
| setupLog.Error(err, "unable to create controller", "controller", "Content") |
| bd.Spec.ValuesHash = valuesHash | ||
|
|
||
| // When content resources are stored in etcd, we need to keep track of the content resource so they | ||
| // are properly gargabe collected by the content controller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| // are properly gargabe collected by the content controller. | |
| // are properly garbage-collected by the content controller. |
| // SetupWithManager sets up the controller with the Manager. | ||
| func (r *ContentReconciler) SetupWithManager(mgr ctrl.Manager) error { | ||
| return ctrl.NewControllerManagedBy(mgr). | ||
| Named("content-controller"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs for this state:
By default, controllers are named using the lowercase version of their kind.
Except for a few (drift, gitrepo status and helmops status), we don't name our reconcilers. We may want to harmonise their names for metrics at some point.
In the meantime, for consistency, I'd be tempted to set:
| Named("content-controller"). | |
| Named("content"). // equivalent to the default, as per the docs, so we might as well skip this line |
or
| Named("content-controller"). | |
| Named("content-reconciler"). // same format as in the drift controller |
That being said, I'm not sure how useful metrics for this controller would be, so that's definitely not a major issue ;)
| ) (controllerutil.OperationResult, *fleet.BundleDeployment, error) { | ||
| logger := l.WithValues("deploymentID", bd.Spec.DeploymentID) | ||
|
|
||
| // When content resources are stored in etcd, we need to add finalizers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will want to update unit tests for this controller (e.g. here), to reflect the fact that content management is now owned by another controller.
| } | ||
| } | ||
|
|
||
| return ctrl.Result{}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be nice to (unit-)test this, just to be on the safe side. WDYT?
This PR introduces changes to how GitRepos are deleted, as well as their dependent resources.
Two main changes are introduced:
The function that deletes the GitRepo has been modified to check for any pending resources, especially Bundles and ImageScans, since previously the GitRepo was removed before these resources were fully deleted.
A new controller is introduced for Content resources to track the number of BundleDeployments that reference the same one. This controller aims to eliminate the conflicts that occurred when updating or deleting BundleDeployments, as they were being updated or removed concurrently while modifying a shared Content resource.
With these changes, the goal is to eliminate both update conflicts (which triggered additional Reconcile calls in the controllers) and potential race conditions when deleting and recreating GitRepos that still had dependent resources pending deletion.
Refers to #4192
Refers to #4251
Additional Information
Checklist
- [ ] I have updated the documentation via a pull request in thefleet-docs repository.