Description
Background
I'm developing a tool that systematically explores controller reconciliation ordering, staleness, and fault injection (kamera).
Describe the bug
I observe that deleting an APIExport with active bindings can produce several different outcomes depending on controller ordering:
- APIExport can survive deletion: the APIExport controller (apiexport_reconcile.go:44-157) has no finalizer or deletion-blocking mechanism, yet in some orderings a controller re-creates or prevents the deletion, leaving the APIExport present after the system settles.
- APIBinding ends up in different states: the annotation sync controller (apibindingannotation_controller.go:258-267) patches APIBinding annotations from the APIExport. If the export is already deleted when annotation sync runs, the binding retains stale annotations; if it runs before deletion, the annotations are refreshed. The APIBinding conditions also vary depending on whether the export was present when the binding reconciler processed it.
- LogicalCluster conditions diverge: APIBinderInitializerController (apibinder_initializer_controller.go:330-335) and DefaultAPIBindingLifecycleController (default_apibinding_lifecycle_controller.go:312-317) both commit LogicalCluster status via full-status merge patches. The last writer wins.
- APIExportEndpointSlice diverges: the URLs controller (apiexportendpointsliceurls_reconcile.go:64-71) early-returns if any condition is not True. If the primary controller (apiexportendpointslice_reconcile.go:66-90) hasn't updated conditions before the URLs controller runs, endpoint URLs are not populated.
Other objects (Workspace, WorkspaceType, Shard, Partition, consumer LogicalCluster) are consistent regardless of ordering.
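The last-writer-wins behaviour on LogicalCluster status can be illustrated with a small model. This is only a sketch under stated assumptions, not kcp code: the `Status` and `Condition` types, the condition names `APIBindingsInitialized` and `DefaultBindingsReady`, and the `finalStatus` helper are all hypothetical. It assumes both writers compute their desired status from the same stale snapshot and then replace the stored conditions wholesale, which is how a JSON merge patch treats a list.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a status condition.
type Condition struct {
	Type   string
	Status string
}

// Status models a LogicalCluster-like status that holds only conditions.
type Status struct {
	Conditions []Condition
}

// withCondition returns a copy of s with the given condition set or updated.
func withCondition(s Status, typ, status string) Status {
	out := Status{Conditions: make([]Condition, 0, len(s.Conditions)+1)}
	found := false
	for _, c := range s.Conditions {
		if c.Type == typ {
			c.Status = status
			found = true
		}
		out.Conditions = append(out.Conditions, c)
	}
	if !found {
		out.Conditions = append(out.Conditions, Condition{Type: typ, Status: status})
	}
	return out
}

// finalStatus replays two hypothetical writers in the given order. Each writer
// derives its desired status from the same stale snapshot and then replaces
// the stored status as a whole, so the later write discards the earlier one.
func finalStatus(order []string) Status {
	snapshot := Status{}
	desired := map[string]Status{
		// Hypothetical condition names, chosen only for illustration.
		"initializer": withCondition(snapshot, "APIBindingsInitialized", "True"),
		"lifecycle":   withCondition(snapshot, "DefaultBindingsReady", "False"),
	}
	stored := snapshot
	for _, name := range order {
		stored = desired[name] // full replacement, not a per-condition merge
	}
	return stored
}

func main() {
	fmt.Println(finalStatus([]string{"initializer", "lifecycle"}))
	fmt.Println(finalStatus([]string{"lifecycle", "initializer"}))
}
```

In this model each ordering leaves exactly one condition behind, and which one depends only on who wrote last. A per-condition merge, or a patch computed against the freshest read, would avoid the divergence in this toy case.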
Steps To Reproduce
- Set up a fully initialized workspace with consumer, provider, APIExport, Shard, Partition, EndpointSlice, and APIBinding
- Delete the APIExport
- Observe the final state varies:
  - In some cases, the APIExport is deleted and bindings enter an error state
  - In other cases, the APIExport survives deletion
  - APIBinding conditions and annotations vary
Expected Behaviour
Deleting an APIExport should produce a consistent final state. If the deletion should be blocked (because bindings exist), it should be blocked consistently. If the deletion should proceed, the binding and endpoint cleanup should be deterministic.
Proposed Fix
APIBinding already has a dedicated deletion finalizer controller (apibinding_deletion_controller.go), but APIExport has no equivalent — no finalizer, no admission webhook for Delete (admission.go:59 only handles Create/Update), and no cleanup orchestration. There's an unused APIBindingsByAPIExport index (indexers/apibinding.go:118-133) that could look up active bindings for a given export.
Adding a deletion finalizer controller for APIExport (mirroring the APIBinding pattern) would make this deterministic. On deletion, the finalizer controller would query active bindings via the existing index, release resource locks on LogicalCluster (there's a TODO acknowledging this gap at apibinding_reconcile.go:277-279), clean up the EndpointSlice, and only remove the finalizer once cleanup is complete.
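The proposed flow could look roughly like the sketch below. It is an in-memory model rather than a kcp controller, and every name in it is an assumption made for illustration: the `cleanupFinalizer` string, the `APIExport` and `Cluster` types, and the `reconcile` function. The `BindingsByExport` map stands in for a lookup through the APIBindingsByAPIExport index.

```go
package main

import "fmt"

// cleanupFinalizer is a hypothetical finalizer name, not one defined by kcp.
const cleanupFinalizer = "example.dev/apiexport-cleanup"

// APIExport is a simplified stand-in; Deleting models a set deletionTimestamp.
type APIExport struct {
	Name       string
	Deleting   bool
	Finalizers []string
}

// Cluster is a toy model of the state the finalizer has to clean up.
type Cluster struct {
	BindingsByExport map[string][]string // export name -> binding names
	ResourceLocks    map[string]string   // resource -> binding holding the lock
	EndpointSlices   map[string]bool     // export name -> slice exists
}

func hasFinalizer(e *APIExport) bool {
	for _, f := range e.Finalizers {
		if f == cleanupFinalizer {
			return true
		}
	}
	return false
}

func removeFinalizer(e *APIExport) {
	kept := e.Finalizers[:0]
	for _, f := range e.Finalizers {
		if f != cleanupFinalizer {
			kept = append(kept, f)
		}
	}
	e.Finalizers = kept
}

// reconcile runs one idempotent pass and reports whether deletion may proceed.
func reconcile(e *APIExport, c *Cluster) bool {
	if !e.Deleting {
		// Live object: make sure the finalizer is in place before any delete.
		if !hasFinalizer(e) {
			e.Finalizers = append(e.Finalizers, cleanupFinalizer)
		}
		return false
	}
	if !hasFinalizer(e) {
		return true // nothing left for this controller to do
	}
	// Release resource locks held by bindings that reference this export.
	for _, binding := range c.BindingsByExport[e.Name] {
		for resource, holder := range c.ResourceLocks {
			if holder == binding {
				delete(c.ResourceLocks, resource)
			}
		}
	}
	// Clean up the endpoint slice that belongs to this export.
	delete(c.EndpointSlices, e.Name)
	// Remove the finalizer only after the cleanup steps above have run.
	removeFinalizer(e)
	return true
}

func main() {
	export := &APIExport{Name: "widgets"}
	cluster := &Cluster{
		BindingsByExport: map[string][]string{"widgets": {"consumer-a"}},
		ResourceLocks: map[string]string{
			"widgets.example.dev": "consumer-a",
			"gadgets.example.dev": "consumer-b",
		},
		EndpointSlices: map[string]bool{"widgets": true},
	}
	reconcile(export, cluster) // first pass adds the finalizer
	export.Deleting = true     // models a delete request
	done := reconcile(export, cluster)
	fmt.Println(done, export.Finalizers, cluster.ResourceLocks, cluster.EndpointSlices)
}
```

Because the finalizer is added while the export is still live, a delete request can only complete after a cleanup pass has run, which removes the dependence on controller ordering in this model. A real controller would also need to handle partial failures and requeue, which the sketch omits.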
Versions
- kcp: v0.30.0 (commit 7952f476d)
- Kubernetes: simulated via kamera (based on k8s.io/client-go v0.35.0 / Kubernetes 1.35)