-
Notifications
You must be signed in to change notification settings - Fork 274
KREP-004: OwnerReferences and deletion policy for ResourceGraphDefinitions #763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,203 @@ | ||||||||||
| # ResourceGraphDefinition Ownership and Deletion Protection | ||||||||||
|
|
||||||||||
| ## Problem statement | ||||||||||
|
|
||||||||||
| Previously, **CustomResourceDefinitions (CRDs)** generated by **ResourceGraphDefinitions (RGDs)** were only linked to their parent by a label (`metadata.labels["kro.run/resourcegraphdefinition"]`). | ||||||||||
| There was **no ownership relation** (`ownerReferences`) or **controller reference**, leading to several issues: | ||||||||||
|
|
||||||||||
| * The KRO controller required explicit logic to decide when to delete CRDs, controlled by a Helm flag `allowCRDDeletion`. | ||||||||||
| * When this flag is set to `false` (the default), deleting an RGD **did not delete** its generated CRD, leaving orphaned definitions. | ||||||||||
| * Users could still manually delete RGDs without realizing that this orphaned affected CRDs and controller behavior, causing potential **inconsistent API states**. | ||||||||||
|
|
||||||||||
| The lack of Kubernetes-native ownership semantics made RGD–CRD lifecycle management unreliable and opaque, relying on our custom management. | ||||||||||
|
|
||||||||||
| ## Proposal | ||||||||||
|
|
||||||||||
| Introduce proper **Kubernetes ownership semantics** between RGDs and their generated CRDs. | ||||||||||
| Combine this with a **safe-deletion mechanism** for user-initiated RGD removals and **clearer internal deletion control** for KRO. | ||||||||||
|
|
||||||||||
| ### Overview | ||||||||||
|
|
||||||||||
| This proposal introduces three coordinated changes: | ||||||||||
|
|
||||||||||
| 1. **Ownership introduction** — CRDs now include an `OwnerReference` pointing to the creating RGD. | ||||||||||
| 2. **Deletion protection** — A `ValidatingAdmissionPolicy` prevents accidental user deletion of RGDs. | ||||||||||
| 3. **Controller deletion control** — The Helm flag `allowCRDDeletion` now cleanly governs whether KRO is permitted to delete CRDs as part of reconciliation. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this mean the current implementation does not fully honor the flag? What about making the default to There's also a discussion to have about finalizers that plays a different role IMO (the best place is probably below in the discarded section)
Suggested change
|
||||||||||
|
|
||||||||||
| Together, these changes align KRO with Kubernetes-native resource management and improve data safety. | ||||||||||
|
|
||||||||||
| --- | ||||||||||
|
|
||||||||||
| ### Design details | ||||||||||
|
|
||||||||||
| #### 1. Ownership introduction | ||||||||||
|
|
||||||||||
| Each CRD created by an RGD would carry an **OwnerReference** set by the controller: | ||||||||||
|
|
||||||||||
| ```go | ||||||||||
| if err := ctrl.SetControllerReference(rgd, crd, r.Scheme()); err != nil { | ||||||||||
| mark.KindUnready(err.Error()) | ||||||||||
| return nil, nil, fmt.Errorf("failed to set controller reference of CRD: %w", err) | ||||||||||
| } | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| This produces CRDs with metadata similar to: | ||||||||||
|
|
||||||||||
| ```yaml | ||||||||||
| metadata: | ||||||||||
| name: sample.kro.run | ||||||||||
| ownerReferences: | ||||||||||
| - apiVersion: kro.run/v1alpha1 | ||||||||||
| kind: ResourceGraphDefinition | ||||||||||
| name: sample | ||||||||||
| uid: 12345-abcde | ||||||||||
| controller: true | ||||||||||
| blockOwnerDeletion: true | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| **Resulting behavior:** | ||||||||||
|
|
||||||||||
| * When an RGD is deleted, its owned CRD is automatically garbage-collected. This is proper behavior | ||||||||||
| as we can still delete RGDs with policy "orphan" to preserve CRDs. | ||||||||||
| * Manual CRD deletion (while RGD still exists) will trigger re-reconciliation, as ownership implies controller responsibility. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💯 |
||||||||||
| (we can and should think about warning or failing user deletion requests on CRDs managed by KRO) | ||||||||||
| * The original RGD label remains for traceability, but lifecycle is now managed by native Kubernetes GC. | ||||||||||
|
|
||||||||||
| Integration tests confirm the presence and correctness of `ownerReferences` (including `controller` and `blockOwnerDeletion` flags). | ||||||||||
|
|
||||||||||
| #### 2. Deletion protection for RGDs | ||||||||||
|
|
||||||||||
| Because RGD deletion now cascades to delete its CRD, an **admission policy** prevents accidental data loss. | ||||||||||
|
|
||||||||||
| When `config.allowCRDDeletion` is `false` (default), Helm installs a **ValidatingAdmissionPolicy** and **Binding**: | ||||||||||
|
|
||||||||||
| ```yaml | ||||||||||
| apiVersion: admissionregistration.k8s.io/v1 | ||||||||||
| kind: ValidatingAdmissionPolicy | ||||||||||
| metadata: | ||||||||||
| name: {{ include "kro.fullname" . }}-crd-protection-policy | ||||||||||
| spec: | ||||||||||
| matchConstraints: | ||||||||||
| resourceRules: | ||||||||||
| - apiGroups: ["kro.run"] | ||||||||||
| apiVersions: ["v1alpha1"] | ||||||||||
| operations: ["DELETE"] | ||||||||||
| resources: ["resourcegraphdefinitions"] | ||||||||||
| validations: | ||||||||||
| - expression: | | ||||||||||
| has(oldObject.metadata.annotations) && | ||||||||||
| oldObject.metadata.annotations['{{ .Values.validation.admission.policy.rgd.annotation.key }}'] == 'true' | ||||||||||
| reason: Invalid | ||||||||||
| message: | | ||||||||||
| Deletion denied. To proceed, set annotation '{{ .Values.validation.admission.policy.rgd.annotation.key }}: "true"'. | ||||||||||
| Removing an RGD also deletes its CustomResourceDefinition and may cause data loss. | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Users must explicitly annotate an RGD to confirm deletion: | ||||||||||
|
|
||||||||||
| ```yaml | ||||||||||
| metadata: | ||||||||||
| annotations: | ||||||||||
| kro.run/allow-delete: "true" | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| This ensures intentional deletions only. | ||||||||||
|
|
||||||||||
| #### 3. Controller-side deletion control (`allowCRDDeletion`) | ||||||||||
|
|
||||||||||
| Previously, the flag `allowCRDDeletion` determined whether KRO would delete CRDs it had created if the RGD was removed or changed. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm curious to explore the use cases for setting this flag one way or another. If we're certain that all of the instances managed by the RGD are gone, is it really dangerous to remove the Kind? I'm not seeing the upsides of leaking it if we're confident we're not losing data. I can see downsides of not removing the kind, such as attempting to do a fresh installation of a new version where the Kind has changed substantially. |
||||||||||
| Now, with `OwnerReferences` managing lifecycle automatically, the flag’s meaning is refined: | ||||||||||
|
|
||||||||||
| * When `allowCRDDeletion: false`: | ||||||||||
| KRO still sets owner references, but **does not itself delete CRDs**. | ||||||||||
| Deletion is handled by Kubernetes GC only when the RGD is removed. | ||||||||||
| The ValidatingAdmissionPolicy remains active to prevent accidental RGD deletion. | ||||||||||
|
|
||||||||||
| * When `allowCRDDeletion: true`: | ||||||||||
| KRO may directly delete CRDs it manages (e.g., during reconciliation or cleanup). | ||||||||||
| The Helm chart skips installation of the ValidatingAdmissionPolicy, allowing full manual control. | ||||||||||
|
Comment on lines
+111
to
+118
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm unsure I make sense of the semantics. There's also a case that could lead to inconsistent behaviour, when the RGD The proposed behaviour and lifecycle sounds more like |
||||||||||
|
|
||||||||||
| Updated values: | ||||||||||
|
|
||||||||||
| ```yaml | ||||||||||
| config: | ||||||||||
| allowCRDDeletion: false | ||||||||||
|
|
||||||||||
| validation: | ||||||||||
| admission: | ||||||||||
| policy: | ||||||||||
| rgd: | ||||||||||
| annotation: | ||||||||||
| key: kro.run/allow-delete | ||||||||||
| actions: '[Deny]' | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| #### 4. Documentation and tests | ||||||||||
|
|
||||||||||
| **Docs:** | ||||||||||
| A new section “Deletion of ResourceGraphDefinitions” explains: | ||||||||||
|
|
||||||||||
| * Ownership semantics between RGDs and CRDs | ||||||||||
| * The new cascading deletion behavior | ||||||||||
| * The admission-based deletion protection | ||||||||||
| * Helm configuration for disabling or customizing behavior | ||||||||||
|
|
||||||||||
| **Integration tests:** | ||||||||||
|
|
||||||||||
| * Assert CRDs are created with correct owner references | ||||||||||
| * Confirm garbage collection after RGD deletion | ||||||||||
| * Verify protection policy denies unannotated deletions | ||||||||||
| * Ensure disabling the policy (via Helm) removes admission resources | ||||||||||
|
|
||||||||||
| --- | ||||||||||
|
|
||||||||||
| ## Other solutions considered | ||||||||||
|
|
||||||||||
| | Option | Reason Rejected | | ||||||||||
| | ------------------------------------ |-----------------------------------------------------------------------------------------| | ||||||||||
| | Continue using label linkage only | No lifecycle tracking or automatic cleanup, against k8s API semantics | | ||||||||||
| | Finalizer-based blocking | Overcomplicates reconciliation; redundant with OwnerReferences and foreground finalizer | | ||||||||||
jakobmoellerdev marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
| | Webhook validation | Equivalent logic but adds latency and operational overhead | | ||||||||||
| | Hard-coded controller deletion rules | Less flexible than declarative GC + admission policy | | ||||||||||
|
|
||||||||||
| --- | ||||||||||
|
|
||||||||||
| ## Scoping | ||||||||||
|
|
||||||||||
| ### In scope | ||||||||||
|
|
||||||||||
| * Add `OwnerReference` and `Controller` relationship between RGD and CRD | ||||||||||
| * Introduce ValidatingAdmissionPolicy for RGD deletions | ||||||||||
| * Retain `allowCRDDeletion` flag for controller-side cleanup behavior | ||||||||||
| * Documentation and Helm value updates | ||||||||||
|
|
||||||||||
| ### Not in scope | ||||||||||
|
|
||||||||||
| * Migration logic for preexisting orphaned CRDs | ||||||||||
| * Protection for other dependent resources (we only want to look at CRDs) | ||||||||||
| * Multi-owner CRD sharing or advanced garbage-collection rules | ||||||||||
|
|
||||||||||
| ## Testing strategy | ||||||||||
|
|
||||||||||
| ### Requirements | ||||||||||
|
|
||||||||||
| * Kubernetes 1.30+ with `ValidatingAdmissionPolicy` enabled | ||||||||||
| * Kind or cluster-based integration test environment | ||||||||||
|
|
||||||||||
| ### Test plan | ||||||||||
|
|
||||||||||
| 1. Verify CRDs include correct `OwnerReferences` after reconciliation | ||||||||||
| 2. Deleting an RGD triggers automatic CRD deletion via GC | ||||||||||
| 3. Deletion without annotation is rejected with clear error message | ||||||||||
| 4. Deletion with `kro.run/allow-delete: "true"` proceeds successfully | ||||||||||
| 5. Setting `allowCRDDeletion: true` disables admission policy and allows controller-driven and manual cleanup | ||||||||||
|
|
||||||||||
| --- | ||||||||||
|
|
||||||||||
| ## Discussion and notes | ||||||||||
|
|
||||||||||
| This change brings RGD–CRD management in line with native Kubernetes semantics: | ||||||||||
|
|
||||||||||
| * Ownership ensures consistent lifecycle and prevents orphaned CRDs. | ||||||||||
| * Admission control ensures user awareness and intentional deletion. | ||||||||||
| * The `allowCRDDeletion` flag cleanly separates controller behavior from user intent. | ||||||||||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can understand protecting against the deletion of the Kinds managed by RGDs, but what's the thinking behind stopping a user that says
k delete rgd foofrom deleting the rgd foo. Delete means delete, right?It strikes me that this could be something that organizations choose to opt in more broadly (deletion protection for any k8s resource), but odd that KRO would take the opinion for users when other projects don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main reason is that by default kubernetes deletes with background propagation. if we delete a RGD, that would automatically start deletion of the CRD as well during the delete. I only added this after there was an opinion against simply allowing a delete. I personally favor this too but adjusted after community call feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, k8s standard leans towards no protection and delete everything, cascade.
Moreover, once marked for deletion, there is no way back, you must complete deletion before being able to re-use it again (
kubectl deletestarts by setting the deletionTimestamp that can't be undone)As there is a clear 1-1 ownership link RGD <-> CRD I would indeed favour the standard behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the kind of thing where you can easily find voices in both directions. You'll see this debate surface through many AWS APIs, too. Just try deleting an S3 bucket with objects in it.
Given the controversy and the fact that it's broader than just our API, I'd just stick with the existing standard and let a separate solution handle deletion protection holistically.