
bug: Deleting APIExport with active bindings produces many divergent final states — APIExport can survive deletion #3925

@tgoodwin


Background

I'm developing a tool that systematically explores controller reconciliation ordering, staleness, and fault injection (kamera).

Describe the bug

I observe that deleting an APIExport with active bindings can produce several different outcomes depending on controller ordering:

  1. APIExport can survive deletion: the APIExport controller (apiexport_reconcile.go:44-157) has no finalizer or other deletion-blocking mechanism, yet in some orderings a controller re-creates the APIExport or otherwise prevents its deletion, so the APIExport is still present after the system settles.
  2. APIBinding ends up in different states: the annotation sync controller (apibindingannotation_controller.go:258-267) patches the APIBinding's annotations based on the APIExport. If the export is already deleted when annotation sync runs, the binding keeps stale annotations; if sync runs before the deletion, the annotations are refreshed. The APIBinding conditions also vary depending on whether the export still existed when the binding reconciler processed the binding.
  3. LogicalCluster conditions diverge: APIBinderInitializerController (apibinder_initializer_controller.go:330-335) and DefaultAPIBindingLifecycleController (default_apibinding_lifecycle_controller.go:312-317) both commit LogicalCluster status via full-status merge patches, so whichever controller writes last determines the final conditions.
  4. APIExportEndpointSlice diverges: the URLs controller (apiexportendpointsliceurls_reconcile.go:64-71) returns early if any condition is not True. If the primary controller (apiexportendpointslice_reconcile.go:66-90) has not yet updated the conditions when the URLs controller runs, the endpoint URLs are not populated.
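The last-writer-wins behaviour in item 3 can be illustrated with a toy model. This is a minimal sketch, assuming both controllers start from the same snapshot and that each patch carries the writer's full conditions list, so the stored list is replaced wholesale (as JSON merge patch does for arrays). The `Status`/`Condition` types, the condition names, and the `run` helper are hypothetical and do not correspond to kcp's actual types.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a status condition.
type Condition struct {
	Type   string
	Status string
}

// Status is a toy LogicalCluster status holding only conditions.
type Status struct {
	Conditions []Condition
}

// withCondition returns a copy of conds with c inserted or replaced by Type.
func withCondition(conds []Condition, c Condition) []Condition {
	out := append([]Condition(nil), conds...)
	for i := range out {
		if out[i].Type == c.Type {
			out[i] = c
			return out
		}
	}
	return append(out, c)
}

// commitFullStatus models a merge patch carrying the writer's entire
// conditions list: the stored list is replaced, not merged.
func commitFullStatus(stored *Status, desired Status) {
	stored.Conditions = append([]Condition(nil), desired.Conditions...)
}

// run lets two controllers read the same snapshot, each set its own
// (hypothetical) condition, and then commit in the given order.
func run(order ...string) Status {
	var stored Status
	snapshot := stored.Conditions // both controllers see the same, soon-stale view
	desired := map[string]Status{
		"initializer": {Conditions: withCondition(snapshot, Condition{"InitializerCondition", "True"})},
		"lifecycle":   {Conditions: withCondition(snapshot, Condition{"LifecycleCondition", "True"})},
	}
	for _, name := range order {
		commitFullStatus(&stored, desired[name])
	}
	return stored
}

func main() {
	fmt.Println(run("initializer", "lifecycle").Conditions)
	fmt.Println(run("lifecycle", "initializer").Conditions)
}
```

In each ordering only the last committer's condition survives, even though both controllers "succeeded". Re-reading the object before patching, or patching only the condition a controller owns, would avoid the clobbering in this toy model.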

Other objects (Workspace, WorkspaceType, Shard, Partition, consumer LogicalCluster) are consistent regardless of ordering.

Steps To Reproduce

  1. Set up a fully initialized workspace with consumer, provider, APIExport, Shard, Partition, EndpointSlice, and APIBinding
  2. Delete the APIExport
  3. Observe the final state varies:
    • In some cases, the APIExport is deleted and bindings enter an error state
    • In other cases, the APIExport survives deletion
    • APIBinding conditions and annotations vary
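The ordering dependence in step 3 can be reproduced in miniature by enumerating every ordering of a few toy reconcilers, which is roughly what an ordering-exploration tool does at much larger scale. This sketch models only items 2 and 4 from the list above and assumes each reconciler runs exactly once with no requeues (a deliberate simplification); the `world` type and step names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// world is a toy snapshot of the objects touched by APIExport deletion.
type world struct {
	exportExists      bool
	bindingAnnotation string // refreshed from the export by annotation sync
	sliceReady        bool   // endpoint slice conditions are all True
	sliceURLs         bool   // endpoint URLs are populated
}

// step is one reconciler (or the user's delete) acting on the world.
type step struct {
	name string
	fn   func(*world)
}

var steps = []step{
	{"delete-export", func(w *world) { w.exportExists = false }},
	{"annotation-sync", func(w *world) {
		if w.exportExists { // no-op once the export is gone: stale annotation stays
			w.bindingAnnotation = "fresh"
		}
	}},
	{"slice-conditions", func(w *world) { w.sliceReady = true }},
	{"slice-urls", func(w *world) {
		if w.sliceReady { // returns early while conditions are not yet True
			w.sliceURLs = true
		}
	}},
}

// permutations returns every ordering of the indices 0..n-1.
func permutations(n int) [][]int {
	if n == 0 {
		return [][]int{{}}
	}
	var out [][]int
	for _, p := range permutations(n - 1) {
		for i := 0; i <= len(p); i++ {
			q := make([]int, 0, n)
			q = append(q, p[:i]...)
			q = append(q, n-1)
			q = append(q, p[i:]...)
			out = append(out, q)
		}
	}
	return out
}

// finalStates runs every ordering from the same initial state and returns
// the sorted set of distinct final states.
func finalStates() []string {
	seen := map[string]bool{}
	for _, order := range permutations(len(steps)) {
		w := world{exportExists: true, bindingAnnotation: "stale"}
		for _, i := range order {
			steps[i].fn(&w)
		}
		seen[fmt.Sprintf("%+v", w)] = true
	}
	keys := make([]string, 0, len(seen))
	for k := range seen {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	for _, s := range finalStates() {
		fmt.Println(s)
	}
}
```

The 24 orderings collapse to four distinct final states (stale or fresh annotation, URLs populated or not). A deterministic deletion path would collapse them to one.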

Expected Behaviour

Deleting an APIExport should produce a consistent final state. If the deletion should be blocked (because bindings exist), it should be blocked consistently. If the deletion should proceed, the binding and endpoint cleanup should be deterministic.

Proposed Fix

APIBinding already has a dedicated deletion finalizer controller (apibinding_deletion_controller.go), but APIExport has no equivalent — no finalizer, no admission webhook for Delete (admission.go:59 only handles Create/Update), and no cleanup orchestration. There's an unused APIBindingsByAPIExport index (indexers/apibinding.go:118-133) that could look up active bindings for a given export.
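A lookup over such an index might look like the following minimal sketch. The `Binding` type, its fields, and the `cluster|name` key format are assumptions for illustration; they are not the actual shape of the `APIBindingsByAPIExport` indexer in indexers/apibinding.go. In a real controller, a key function like `exportKey` would typically be registered as an informer index instead of being held in a plain map.

```go
package main

import (
	"fmt"
	"sort"
)

// Binding is a toy APIBinding: its own name plus a reference to an export.
// Field names are hypothetical.
type Binding struct {
	Name          string
	ExportCluster string // logical cluster holding the referenced APIExport
	ExportName    string
}

// exportKey builds the index key for an export. The "cluster|name" format
// is an assumption, not kcp's actual key format.
func exportKey(cluster, name string) string {
	return cluster + "|" + name
}

// BindingsByExport is a hypothetical in-memory index from export key to
// the names of bindings that reference that export.
type BindingsByExport map[string][]string

// buildIndex groups bindings by the export they reference.
func buildIndex(bindings []Binding) BindingsByExport {
	idx := BindingsByExport{}
	for _, b := range bindings {
		k := exportKey(b.ExportCluster, b.ExportName)
		idx[k] = append(idx[k], b.Name)
	}
	for k := range idx {
		sort.Strings(idx[k]) // stable order for deterministic cleanup
	}
	return idx
}

// ActiveBindings returns the bindings that reference the given export.
func (idx BindingsByExport) ActiveBindings(cluster, name string) []string {
	return idx[exportKey(cluster, name)]
}

func main() {
	idx := buildIndex([]Binding{
		{Name: "b2", ExportCluster: "root:provider", ExportName: "widgets"},
		{Name: "b1", ExportCluster: "root:provider", ExportName: "widgets"},
		{Name: "b3", ExportCluster: "root:other", ExportName: "widgets"},
	})
	fmt.Println(idx.ActiveBindings("root:provider", "widgets"))
}
```

Keying on both the export's logical cluster and its name matters: two providers can publish exports with the same name, and only bindings to the export being deleted should be touched.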

Adding a deletion finalizer controller for APIExport (mirroring the APIBinding pattern) would make this deterministic. On deletion, the finalizer controller would query active bindings via the existing index, release resource locks on LogicalCluster (there's a TODO acknowledging this gap at apibinding_reconcile.go:277-279), clean up the EndpointSlice, and only remove the finalizer once cleanup is complete.
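The proposed flow can be sketched as a single idempotent reconcile function. This is a toy model, not kcp code: the finalizer name, the `State` fields, and the one-action-per-pass behaviour are hypothetical, and `Deleting` stands in for a set `deletionTimestamp`. It illustrates that the finalizer is removed only after every lock held by an active binding is released and the endpoint slice is gone, so the final state does not depend on how many passes it takes or on map iteration order.

```go
package main

import "fmt"

// exportFinalizer is a hypothetical finalizer name used only in this sketch.
const exportFinalizer = "example.dev/apiexport-cleanup"

// Export is a toy APIExport; Deleting stands in for a set deletionTimestamp.
type Export struct {
	Deleting   bool
	Finalizers []string
}

// State is a toy view of everything the cleanup touches.
type State struct {
	Export        *Export           // nil once the object is really gone
	Bindings      []string          // active bindings found via the index
	Locks         map[string]string // resource -> binding holding the lock
	EndpointSlice bool              // whether the endpoint slice still exists
}

func hasFinalizer(e *Export) bool {
	for _, f := range e.Finalizers {
		if f == exportFinalizer {
			return true
		}
	}
	return false
}

// reconcile does one idempotent pass and reports whether to requeue.
// Each pass performs at most one cleanup action to mimic partial progress.
func reconcile(s *State) (requeue bool) {
	e := s.Export
	if e == nil {
		return false
	}
	if !e.Deleting {
		if !hasFinalizer(e) {
			e.Finalizers = append(e.Finalizers, exportFinalizer)
		}
		return false
	}
	// Release one lock held by an active binding of this export.
	for _, b := range s.Bindings {
		for res, holder := range s.Locks {
			if holder == b {
				delete(s.Locks, res)
				return true
			}
		}
	}
	if s.EndpointSlice {
		s.EndpointSlice = false
		return true
	}
	// Cleanup complete: drop our finalizer. With no finalizers left the API
	// server would remove the object, modeled here by clearing Export.
	kept := e.Finalizers[:0]
	for _, f := range e.Finalizers {
		if f != exportFinalizer {
			kept = append(kept, f)
		}
	}
	e.Finalizers = kept
	if len(e.Finalizers) == 0 {
		s.Export = nil
	}
	return false
}

// runToCompletion reconciles until no requeue is requested and returns
// the number of passes that asked for a requeue.
func runToCompletion(s *State) int {
	passes := 0
	for reconcile(s) {
		passes++
	}
	return passes
}

func main() {
	s := &State{
		Export:        &Export{},
		Bindings:      []string{"b1", "b2"},
		Locks:         map[string]string{"widgets": "b1", "gadgets": "b2", "other": "b9"},
		EndpointSlice: true,
	}
	reconcile(s)             // first pass adds the finalizer
	s.Export.Deleting = true // the user deletes the APIExport
	fmt.Println(runToCompletion(s), s.Export == nil, len(s.Locks), s.EndpointSlice)
}
```

Because each pass does at most one cleanup action and requeues, a crash or conflict partway through only delays finalizer removal instead of changing the final state. Locks held by bindings of other exports (`b9` above) are left alone.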

Versions

  • kcp: v0.30.0 (commit 7952f476d)
  • Kubernetes: simulated via kamera (based on k8s.io/client-go v0.35.0 / Kubernetes 1.35)
