Background
I'm developing kamera, a tool that systematically explores controller reconciliation ordering, staleness, and fault injection, and I found some issues in Kratix.
Observed behavior:
The Work and WorkPlacement controllers produce ordering-dependent status conditions. Depending on which controller reconciles first, the status conditions on both objects can end up in different states:
- If the WorkController reconciles first and sets up the Work status before the WorkPlacementController runs, both objects get their full set of status conditions (ScheduleSucceeded, WriteSucceeded, Ready, etc.).
- If the WorkPlacementController reconciles first, the Work object may never get its status updated (missing finalizers or conditions), because the WorkPlacementController's writes don't trigger the WorkController to re-evaluate.
- If the WorkPlacementController runs but the WorkController hasn't set ScheduleSucceeded yet, the WorkPlacement can end up with Ready=True set prematurely (before scheduling has actually completed).
- Conversely, the WorkPlacement may never get its WriteSucceeded or Ready conditions set at all if the WorkController completes before the WorkPlacementController has a chance to write.
With two Work objects targeting the same Destination, these per-object status variations compound — producing even more divergent status condition outcomes.
Root cause:
The Work and WorkPlacement controllers' status update logic lacks sufficient idempotency guards. When one controller runs before the other has completed its initial setup, the second controller may skip status writes that were expected to happen.
Expected behavior:
Each controller should set its status conditions regardless of which controller ran first. The status update logic should be idempotent — running the same reconcile multiple times should converge to the same status.
Proposed Fix
The specific issue is in SetWorkplacementReadyStatus() at workplacement_types.go:110, called from workplacement_controller.go:234. This method checks whether ScheduleSucceeded is not False before setting Ready=True. But if the Scheduler hasn't run yet (because the WorkController hasn't reconciled), the ScheduleSucceeded condition doesn't exist at all — and a missing condition is not False, so the check passes. This means Ready=True gets set before the Work has actually been scheduled.
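A minimal sketch of this failure mode, not the actual Kratix code: the Condition type, findCondition, and readyIfNotFalse are hypothetical stand-ins that only model the "not False" check described above.

```go
package main

// Condition is a hypothetical stand-in for a Kubernetes status condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// findCondition returns the condition of the given type, or nil if it is absent.
func findCondition(conds []Condition, condType string) *Condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// readyIfNotFalse models a "ScheduleSucceeded is not False" check.
// A missing condition is not False, so it is treated the same as True.
func readyIfNotFalse(conds []Condition) bool {
	c := findCondition(conds, "ScheduleSucceeded")
	return c == nil || c.Status != "False"
}
```

With no conditions at all (the WorkController has not reconciled yet), readyIfNotFalse(nil) returns true, which corresponds to Ready=True being set before scheduling has happened.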
Similarly, WorkController.updateWorkStatus() at work_controller.go:164-203 reads WorkPlacement.WriteSucceeded. If the WorkPlacementController hasn't run yet, this condition is missing, so the Work controller doesn't set its status at all and returns early.
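The early return can be sketched roughly as follows. This is a simplified model based on the description above, not a copy of work_controller.go; lookup and updateWorkStatusSketch are hypothetical names, and mirroring WriteSucceeded into Ready is an assumption made only to keep the example short.

```go
package main

// Condition is a hypothetical stand-in for a Kubernetes status condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// lookup returns the status of condType and whether the condition exists.
func lookup(conds []Condition, condType string) (string, bool) {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status, true
		}
	}
	return "", false
}

// updateWorkStatusSketch models the described behavior: if the WorkPlacement
// has no WriteSucceeded condition yet, it returns before writing anything.
// The second return value reports whether the Work status was written.
func updateWorkStatusSketch(work, placement []Condition) ([]Condition, bool) {
	status, found := lookup(placement, "WriteSucceeded")
	if !found {
		return work, false // early return: Work status left untouched
	}
	return append(work, Condition{Type: "Ready", Status: status}), true
}
```

If nothing later re-triggers the Work reconcile, the Work stays in the "untouched" state indefinitely.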
Possible fix for both: treat a missing condition as "not yet determined" rather than "not failed." Specifically:
- SetWorkplacementReadyStatus() should require ScheduleSucceeded to be explicitly True (not just not-False) before setting Ready=True.
- updateWorkStatus() should unconditionally set initial status conditions (e.g., ScheduleSucceeded=Unknown) on the Work object during its first reconcile, rather than depending on WorkPlacement state being present.
This ensures both controllers always write their status regardless of ordering, and subsequent reconciles converge to the correct final state.
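A rough sketch of what the proposed behavior could look like, again using hypothetical stand-in types and helper names (setCondition, seedUnknown, readyStatus) rather than Kratix's real ones. Reporting Ready=Unknown while ScheduleSucceeded is missing or Unknown is my assumption about a reasonable intermediate value.

```go
package main

// Condition is a hypothetical stand-in for a Kubernetes status condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// statusOf returns the status of condType, or "" if the condition is missing.
func statusOf(conds []Condition, condType string) string {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status
		}
	}
	return ""
}

// setCondition upserts a condition by type, so repeated calls with the
// same input leave the list unchanged (idempotent).
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i].Status = c.Status
			return conds
		}
	}
	return append(conds, c)
}

// seedUnknown records condType as Unknown only when it is missing, so a
// first reconcile always writes something without clobbering later results.
func seedUnknown(conds []Condition, condType string) []Condition {
	if statusOf(conds, condType) == "" {
		return setCondition(conds, Condition{Type: condType, Status: "Unknown"})
	}
	return conds
}

// readyStatus reports True only when ScheduleSucceeded is explicitly True;
// a missing or Unknown condition yields Unknown instead of True.
func readyStatus(conds []Condition) string {
	switch statusOf(conds, "ScheduleSucceeded") {
	case "True":
		return "True"
	case "False":
		return "False"
	default:
		return "Unknown"
	}
}
```

Under this sketch, the result no longer depends on which controller ran first: a missing condition yields Unknown, and a later reconcile that observes ScheduleSucceeded=True moves Ready to True.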
I'm happy to put up a PR for this if it would be helpful.
Version tested: latest github.com/syntasso/kratix (k8s.io/client-go v0.34.1 / Kubernetes 1.34)