bug: APIBinding reconciler crash after first write produces ordering-dependent LogicalCluster recovery state #3926

**Background**

I'm developing a tool that systematically explores controller reconciliation ordering, staleness, and fault injection ([kamera](https://github.com/tgoodwin/kamera)).

**Describe the bug**

When the APIBinding reconciler crashes after its first write effect, the LogicalCluster's conditions diverge during recovery, depending on the order in which controllers reconcile.

The APIBinding reconciler's write sequence per reconcile:
1. [`apibinding_reconcile.go:310`](https://github.com/kcp-dev/kcp/blob/main/pkg/reconciler/apis/apibinding/apibinding_reconcile.go#L310) — `updateLogicalCluster()`: writes resource locks to LogicalCluster
2. [`apibinding_reconcile.go:417-445`](https://github.com/kcp-dev/kcp/blob/main/pkg/reconciler/apis/apibinding/apibinding_reconcile.go#L417-L445) — CRD creation in `system:bound-crds`
3. [`apibinding_reconcile.go:593-595`](https://github.com/kcp-dev/kcp/blob/main/pkg/reconciler/apis/apibinding/apibinding_reconcile.go#L593-L595) — sets `InitialBindingCompleted=True`, `BindingUpToDate=True`, `Phase=Bound` (in-memory)
4. [`apibinding_controller.go:497`](https://github.com/kcp-dev/kcp/blob/main/pkg/reconciler/apis/apibinding/apibinding_controller.go#L497) — `commit()`: patches APIBinding status to API server
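The crash window can be made concrete with a minimal, hypothetical Go model of the four steps above. `World`, `reconcileAPIBinding`, and the `crashAfter` knob are illustrative names, not kcp or kamera APIs, and the model tracks only the effects listed. Crashing after the first durable write leaves the LogicalCluster locked while the APIBinding carries no `InitialBindingCompleted` condition.

```go
package main

import "fmt"

// World is a hypothetical, heavily simplified model of the objects the
// APIBinding reconciler touches. It is not a kcp type.
type World struct {
	LogicalClusterLocked bool            // effect of step 1: resource locks on the LogicalCluster
	BoundCRDExists       bool            // effect of step 2: CRD in system:bound-crds
	BindingConditions    map[string]bool // APIBinding status as persisted by step 4
	BindingPhase         string          // APIBinding phase as persisted by step 4
}

// reconcileAPIBinding replays the write sequence in order. crashAfter is the
// number of durable writes to perform before simulating a process crash
// (0 means run to completion). It reports whether the crash fired.
func reconcileAPIBinding(w *World, crashAfter int) bool {
	writes := 0
	crashNow := func() bool {
		writes++
		return crashAfter > 0 && writes >= crashAfter
	}

	// 1. updateLogicalCluster(): durable write to the LogicalCluster.
	w.LogicalClusterLocked = true
	if crashNow() {
		return true
	}

	// 2. CRD creation in system:bound-crds: durable write.
	w.BoundCRDExists = true
	if crashNow() {
		return true
	}

	// 3. Conditions and phase are set in memory only; a crash here loses them.
	conditions := map[string]bool{"InitialBindingCompleted": true, "BindingUpToDate": true}
	phase := "Bound"

	// 4. commit(): the status patch is what finally persists step 3.
	for k, v := range conditions {
		w.BindingConditions[k] = v
	}
	w.BindingPhase = phase
	return false
}

func main() {
	w := &World{BindingConditions: map[string]bool{}}
	crashed := reconcileAPIBinding(w, 1)
	fmt.Println(crashed, w.LogicalClusterLocked, w.BoundCRDExists, w.BindingConditions["InitialBindingCompleted"])
	// prints: true true false false
}
```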

A crash after write 1 (the LogicalCluster update) but before write 4 (the APIBinding status commit) leaves the system in an intermediate state: the LogicalCluster already carries the resource locks, but `InitialBindingCompleted` is not set on the APIBinding. On recovery, the LogicalClusterController evaluates its conditions against whatever it observes at that moment, and what it observes depends on how far the other controllers have progressed before it reconciles.

Because the LogicalClusterController reads different intermediate states under different recovery orderings, it writes different LogicalCluster conditions.

**Expected Behaviour**

After a crash and recovery, the LogicalCluster should converge to the same state regardless of which controller reconciles first.
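One way to state this expectation as a checkable property: run every ordering of a set of reconcile steps from the same post-crash snapshot and require that all orderings reach the same final state. The sketch below is hypothetical; `State`, `Controller`, and `orderIndependent` are illustrative names, not part of kcp or kamera, and each controller runs exactly once per ordering, which simplifies real requeue behaviour.

```go
package main

import (
	"fmt"
	"reflect"
)

// State is a hypothetical snapshot of LogicalCluster conditions (type -> status).
type State map[string]string

// Controller is a hypothetical reconcile step: it reads a snapshot and returns the next one.
type Controller func(State) State

func clone(s State) State {
	out := make(State, len(s))
	for k, v := range s {
		out[k] = v
	}
	return out
}

// permutations returns every ordering of the indices 0..n-1.
func permutations(n int) [][]int {
	if n == 0 {
		return [][]int{{}}
	}
	var out [][]int
	for _, p := range permutations(n - 1) {
		for i := 0; i <= len(p); i++ {
			q := make([]int, 0, n)
			q = append(q, p[:i]...)
			q = append(q, n-1)
			q = append(q, p[i:]...)
			out = append(out, q)
		}
	}
	return out
}

// orderIndependent runs every ordering of ctrls from the same start state and
// reports whether all orderings end in the same final state.
func orderIndependent(start State, ctrls []Controller) bool {
	var first State
	for i, order := range permutations(len(ctrls)) {
		s := clone(start)
		for _, idx := range order {
			s = ctrls[idx](clone(s))
		}
		if i == 0 {
			first = s
		} else if !reflect.DeepEqual(first, s) {
			return false
		}
	}
	return true
}

func main() {
	setA := func(s State) State { s["A"] = "True"; return s }
	setB := func(s State) State { s["B"] = "True"; return s }
	// copyA derives B from whatever A it happens to observe, so its result depends on ordering.
	copyA := func(s State) State { s["B"] = s["A"]; return s }

	fmt.Println(orderIndependent(State{}, []Controller{setA, setB}))  // true
	fmt.Println(orderIndependent(State{}, []Controller{setA, copyA})) // false
}
```

Controllers that each write disjoint fields pass the check; a controller whose output depends on a value another controller may or may not have written yet fails it.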

**Proposed Fix**

`APIBinderInitializerController` and `DefaultAPIBindingLifecycleController` both write to `LogicalCluster.Status` concurrently, and the kcp committer ([`committer.go:129`](https://github.com/kcp-dev/kcp/blob/main/pkg/reconciler/committer/committer.go#L129)) patches the *entire* status object. When both controllers read the LogicalCluster, modify different conditions, and commit, the second commit's merge patch overwrites the first controller's condition changes, because it carries the full status as that controller saw it (a read-modify-write race). Under JSON merge patch semantics (RFC 7386), arrays are replaced wholesale, so the second writer's `conditions` array replaces the first writer's instead of being merged with it.

I think the fix would be to use server-side apply (SSA) with a unique field manager per controller, instead of a merge patch via the committer. With SSA, `APIBinderInitializerController` (field manager `apibinder-initializer`) would own `WorkspaceAPIBindingsInitialized` and `Status.Initializers`, while `DefaultAPIBindingLifecycleController` (field manager `default-apibinding-lifecycle`) would own `WorkspaceAPIBindingsReconciled`. Concurrent applies would not conflict because each controller owns only its own fields. This relies on `status.conditions` being declared as a list-map keyed by `type` (`x-kubernetes-list-type: map`); if the list is atomic, SSA tracks ownership of the whole array and the two managers would conflict.

**Additional Context**

The divergent LogicalCluster conditions persist as stable end states: the system converges to one of two distinct final states depending on recovery ordering, and no further reconciliation corrects the difference.

**Versions**

- kcp: v0.30.0 (commit `7952f476d`)
- Kubernetes: simulated via [kamera](https://github.com/tgoodwin/kamera) (based on k8s.io/client-go v0.35.0 / Kubernetes 1.35)
