kubernetes-sigs · jakobmoellerdev · Oct 31, 2025 · ellistarn · Nov 13, 2025 · jakobmoellerdev
diff --git a/docs/design/proposals/rgd-ownership-and-deletion.md b/docs/design/proposals/rgd-ownership-and-deletion.md
@@ -0,0 +1,203 @@
+# ResourceGraphDefinition Ownership and Deletion Protection
+
+## Problem statement
+
+Previously, **CustomResourceDefinitions (CRDs)** generated by **ResourceGraphDefinitions (RGDs)** were only linked to their parent by a label (`metadata.labels["kro.run/resourcegraphdefinition"]`).
+There was **no ownership relation** (`ownerReferences`) or **controller reference**, leading to several issues:
+
+* The KRO controller required explicit logic to decide when to delete CRDs, controlled by a Helm flag `allowCRDDeletion`.
+* When this flag was set to `false` (the default), deleting an RGD **did not delete** its generated CRD, leaving orphaned definitions.
+* Users could still manually delete RGDs without realizing that this orphaned the generated CRDs and affected controller behavior, potentially causing **inconsistent API states**.
+
+The lack of Kubernetes-native ownership semantics made RGD–CRD lifecycle management unreliable and opaque, because it depended entirely on custom controller logic.
+
+## Proposal
+
+Introduce proper **Kubernetes ownership semantics** between RGDs and their generated CRDs.
+Combine this with a **safe-deletion mechanism** for user-initiated RGD removals and **clearer internal deletion control** for KRO.
+
+### Overview
+
+This proposal introduces three coordinated changes:
+
+1. **Ownership introduction** — CRDs now include an `OwnerReference` pointing to the creating RGD.
+2. **Deletion protection** — A `ValidatingAdmissionPolicy` prevents accidental user deletion of RGDs.
+3. **Controller deletion control** — The Helm flag `allowCRDDeletion` now cleanly governs whether KRO is permitted to delete CRDs as part of reconciliation.
+
+Together, these changes align KRO with Kubernetes-native resource management and improve data safety.
+
+---
+
+### Design details
+
+#### 1. Ownership introduction
+
+Each CRD created by an RGD would carry an **OwnerReference** set by the controller:
+
+```go
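+// Make the RGD the controlling owner of the CRD so that Kubernetes GC ties the CRD's lifecycle to it.
+if err := ctrl.SetControllerReference(rgd, crd, r.Scheme()); err != nil {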
+    mark.KindUnready(err.Error())
+    return nil, nil, fmt.Errorf("failed to set controller reference of CRD: %w", err)
+}
+```
+
+This produces CRDs with metadata similar to:
+
+```yaml
+metadata:
+  name: sample.kro.run
+  ownerReferences:
+    - apiVersion: kro.run/v1alpha1
+      kind: ResourceGraphDefinition
+      name: sample
+      uid: 12345-abcde
+      controller: true
+      blockOwnerDeletion: true
+```
+
+**Resulting behavior:**
+
+* When an RGD is deleted, its owned CRD is automatically garbage-collected. This is acceptable
+  because an RGD can still be deleted with the `Orphan` propagation policy to preserve its CRD.
+* Manual CRD deletion (while the RGD still exists) triggers re-reconciliation, as ownership implies controller responsibility.
+  (We can and should consider warning on or rejecting user deletion requests for CRDs managed by KRO.)
+* The original RGD label remains for traceability, but the lifecycle is now managed by native Kubernetes GC.
+
+Integration tests confirm the presence and correctness of `ownerReferences` (including `controller` and `blockOwnerDeletion` flags).
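
The kind of assertion such a test makes can be sketched in plain Go. This is an illustration only, not KRO's test code: `OwnerRef` is a hypothetical, simplified stand-in for `metav1.OwnerReference`, and `controllerOf` and `ownedByRGD` are helper names invented for this example.

```go
package main

import "fmt"

// OwnerRef is a hypothetical, simplified stand-in for metav1.OwnerReference.
type OwnerRef struct {
	APIVersion         string
	Kind               string
	Name               string
	UID                string
	Controller         *bool
	BlockOwnerDeletion *bool
}

// controllerOf returns the reference marked `controller: true`, or nil if none is.
func controllerOf(refs []OwnerRef) *OwnerRef {
	for i := range refs {
		if refs[i].Controller != nil && *refs[i].Controller {
			return &refs[i]
		}
	}
	return nil
}

// ownedByRGD reports whether refs contain a controller reference to the RGD
// with the given UID that also sets blockOwnerDeletion.
func ownedByRGD(refs []OwnerRef, rgdUID string) bool {
	ref := controllerOf(refs)
	if ref == nil {
		return false
	}
	return ref.Kind == "ResourceGraphDefinition" &&
		ref.UID == rgdUID &&
		ref.BlockOwnerDeletion != nil && *ref.BlockOwnerDeletion
}

func main() {
	yes := true
	refs := []OwnerRef{{
		APIVersion: "kro.run/v1alpha1", Kind: "ResourceGraphDefinition",
		Name: "sample", UID: "12345-abcde",
		Controller: &yes, BlockOwnerDeletion: &yes,
	}}
	fmt.Println(ownedByRGD(refs, "12345-abcde")) // true
	fmt.Println(ownedByRGD(nil, "12345-abcde"))  // false
}
```

Matching on the UID rather than only the name guards against a CRD that still points at an earlier RGD of the same name that has since been deleted and recreated.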
+
+#### 2. Deletion protection for RGDs
+
+Because RGD deletion now cascades to delete its CRD, an **admission policy** prevents accidental data loss.
+
+When `config.allowCRDDeletion` is `false` (default), Helm installs a **ValidatingAdmissionPolicy** and **Binding**:
+
+```yaml
+apiVersion: admissionregistration.k8s.io/v1
+kind: ValidatingAdmissionPolicy
+metadata:
+  name: {{ include "kro.fullname" . }}-crd-protection-policy
+spec:
+  matchConstraints:
+    resourceRules:
+      - apiGroups: ["kro.run"]
+        apiVersions: ["v1alpha1"]
+        operations: ["DELETE"]
+        resources: ["resourcegraphdefinitions"]
+  validations:
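+    # Check that the key exists before indexing: looking up a missing map key is a CEL evaluation error.
+    - expression: |
+        has(oldObject.metadata.annotations) &&
+        '{{ .Values.validation.admission.policy.rgd.annotation.key }}' in oldObject.metadata.annotations &&
+        oldObject.metadata.annotations['{{ .Values.validation.admission.policy.rgd.annotation.key }}'] == 'true'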
+      reason: Invalid
+      message: |
+        Deletion denied. To proceed, set annotation '{{ .Values.validation.admission.policy.rgd.annotation.key }}: "true"'.
+        Removing an RGD also deletes its CustomResourceDefinition and may cause data loss.
+```
+
+Users must explicitly annotate an RGD to confirm deletion:
+
+```yaml
+metadata:
+  annotations:
+    kro.run/allow-delete: "true"
+```
+
+This ensures intentional deletions only.
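
The decision the policy encodes can be restated as a small Go function. This is a sketch for reasoning about the rule, not code that runs in the cluster (the API server evaluates the CEL expression itself). `deletionAllowed` is a hypothetical name, and the constant assumes the default annotation key from the Helm values.

```go
package main

import "fmt"

// allowDeleteKey assumes the default value of
// validation.admission.policy.rgd.annotation.key.
const allowDeleteKey = "kro.run/allow-delete"

// deletionAllowed mirrors the policy's intent: a DELETE is admitted only when
// the RGD carries the annotation with the exact string value "true".
func deletionAllowed(annotations map[string]string) bool {
	// Reading from a nil map is safe in Go and yields "", so an RGD
	// without annotations is denied.
	return annotations[allowDeleteKey] == "true"
}

func main() {
	fmt.Println(deletionAllowed(nil))                                       // false
	fmt.Println(deletionAllowed(map[string]string{allowDeleteKey: "true"})) // true
	fmt.Println(deletionAllowed(map[string]string{allowDeleteKey: "yes"}))  // false
}
```

The comparison is against the literal string `"true"`, so values such as `"yes"` or `"True"` do not unlock deletion.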
+
+#### 3. Controller-side deletion control (`allowCRDDeletion`)
+
+Previously, the flag `allowCRDDeletion` determined whether KRO would delete CRDs it had created if the RGD was removed or changed.
+Now, with `OwnerReferences` managing lifecycle automatically, the flag’s meaning is refined:
+
+* When `allowCRDDeletion: false`:
+  KRO still sets owner references, but **does not itself delete CRDs**.
+  Deletion is handled solely by Kubernetes GC, and only once the RGD is removed.
+  The ValidatingAdmissionPolicy remains active to prevent accidental RGD deletion.
+
+* When `allowCRDDeletion: true`:
+  KRO may directly delete CRDs it manages (e.g., during reconciliation or cleanup).
+  The Helm chart skips installation of the ValidatingAdmissionPolicy, allowing full manual control.
+
+Updated values:
+
+```yaml
+config:
+  allowCRDDeletion: false
+
+validation:
+  admission:
+    policy:
+      rgd:
+        annotation:
+          key: kro.run/allow-delete
+        actions: '[Deny]'
+```
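
The two modes of `allowCRDDeletion` described above can be summarized as a lookup. This is a sketch: `Behavior` and `behaviorFor` are hypothetical names, and it assumes owner references are set in both modes, which the proposal states explicitly only for `false`.

```go
package main

import "fmt"

// Behavior summarizes what the chart and controller do for one value of
// config.allowCRDDeletion. The type is illustrative, not part of KRO.
type Behavior struct {
	SetsOwnerReferences      bool // assumed true in both modes
	ControllerDeletesCRDs    bool // KRO may delete CRDs itself
	AdmissionPolicyInstalled bool // Helm installs the ValidatingAdmissionPolicy
}

// behaviorFor maps the Helm flag to the resulting behavior.
func behaviorFor(allowCRDDeletion bool) Behavior {
	return Behavior{
		SetsOwnerReferences:      true,
		ControllerDeletesCRDs:    allowCRDDeletion,
		AdmissionPolicyInstalled: !allowCRDDeletion,
	}
}

func main() {
	fmt.Printf("%+v\n", behaviorFor(false))
	fmt.Printf("%+v\n", behaviorFor(true))
}
```

Seen this way, the flag toggles two things together: whether the controller may delete CRDs itself, and whether the admission guard on RGD deletion is installed.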
+
+#### 4. Documentation and tests
+
+**Docs:**
+A new section “Deletion of ResourceGraphDefinitions” explains:
+
+* Ownership semantics between RGDs and CRDs
+* The new cascading deletion behavior
+* The admission-based deletion protection
+* Helm configuration for disabling or customizing behavior
+
+**Integration tests:**
+
+* Assert CRDs are created with correct owner references
+* Confirm garbage collection after RGD deletion
+* Verify protection policy denies unannotated deletions
+* Ensure disabling the policy (via Helm) removes admission resources
+
+---
+
+## Other solutions considered
+
+| Option                               | Reason Rejected                                                                         |
+| ------------------------------------ |-----------------------------------------------------------------------------------------|
+| Continue using label linkage only    | No lifecycle tracking or automatic cleanup, against k8s API semantics                   |
+| Finalizer-based blocking             | Overcomplicates reconciliation; redundant with OwnerReferences and foreground finalizer |
+| Webhook validation                   | Equivalent logic but adds latency and operational overhead                              |
+| Hard-coded controller deletion rules | Less flexible than declarative GC + admission policy                                    |
+
+---
+
+## Scoping
+
+### In scope
+
+* Add `OwnerReference` and `Controller` relationship between RGD and CRD
+* Introduce ValidatingAdmissionPolicy for RGD deletions
+* Retain `allowCRDDeletion` flag for controller-side cleanup behavior
+* Documentation and Helm value updates
+
+### Not in scope
+
+* Migration logic for preexisting orphaned CRDs
+* Protection for other dependent resources (we only want to look at CRDs)
+* Multi-owner CRD sharing or advanced garbage-collection rules
+
+## Testing strategy
+
+### Requirements
+
+* Kubernetes 1.30+ with `ValidatingAdmissionPolicy` enabled
+* Kind or cluster-based integration test environment
+
+### Test plan
+
+1. Verify CRDs include correct `OwnerReferences` after reconciliation
+2. Deleting an RGD triggers automatic CRD deletion via GC
+3. Deletion without annotation is rejected with clear error message
+4. Deletion with `kro.run/allow-delete: "true"` proceeds successfully
+5. Setting `allowCRDDeletion: true` disables admission policy and allows controller-driven and manual cleanup
+
+---
+
+## Discussion and notes
+
+This change brings RGD–CRD management in line with native Kubernetes semantics:
+
+* Ownership ensures consistent lifecycle and prevents orphaned CRDs.
+* Admission control ensures user awareness and intentional deletion.
+* The `allowCRDDeletion` flag cleanly separates controller behavior from user intent.