
bug: Work and WorkPlacement status conditions are ordering-dependent #740

@tgoodwin

Description

Background

I'm developing kamera, a tool that systematically explores controller reconciliation ordering, staleness, and fault injection. While running it against Kratix, I found the following issue.

Observed behavior:

The Work and WorkPlacement controllers produce ordering-dependent status conditions. Depending on which controller reconciles first, the status conditions on both objects can end up in different states:

  • If the WorkController reconciles first and sets up the Work status before the WorkPlacementController runs, both objects get their full set of status conditions (ScheduleSucceeded, WriteSucceeded, Ready, etc.).
  • If the WorkPlacementController reconciles first, the Work object may never get its status updated — missing finalizers or conditions — because the WorkPlacementController's writes don't trigger the WorkController to re-evaluate.
  • If the WorkPlacementController runs but the WorkController hasn't set ScheduleSucceeded yet, the WorkPlacement can end up with Ready=True set prematurely (before scheduling has actually completed).
  • Conversely, the WorkPlacement may never get its WriteSucceeded or Ready conditions set at all if the WorkController completes before the WorkPlacementController has a chance to write.

With two Work objects targeting the same Destination, these per-object status variations compound — producing even more divergent status condition outcomes.

Root cause:

The Work and WorkPlacement controllers' status update logic lacks sufficient idempotency guards. When one controller runs before the other has completed its initial setup, the second controller may skip status writes that were expected to happen.

Expected behavior:

Each controller should set its status conditions regardless of which controller ran first. The status update logic should be idempotent — running the same reconcile multiple times should converge to the same status.

Proposed Fix

The specific issue is in SetWorkplacementReadyStatus() at workplacement_types.go:110, called from workplacement_controller.go:234. This method checks whether ScheduleSucceeded is not False before setting Ready=True. But if the Scheduler hasn't run yet (because the WorkController hasn't reconciled), the ScheduleSucceeded condition doesn't exist at all — and a missing condition is not False, so the check passes. This means Ready=True gets set before the Work has actually been scheduled.

Similarly, WorkController.updateWorkStatus() at work_controller.go:164-203 reads WorkPlacement.WriteSucceeded. If the WorkPlacementController hasn't run yet, this condition is missing, so the Work controller doesn't set its status at all and returns early.
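To make the missing-condition gap in the Ready check concrete, here is a minimal, self-contained sketch. It does not use the Kratix types: `Condition` is a simplified stand-in for `metav1.Condition`, and `readyIfNotFalse` / `readyIfExplicitlyTrue` are hypothetical names for the two styles of check. The `isFalse` helper follows the same semantics as `meta.IsStatusConditionFalse` in `k8s.io/apimachinery`, which reports true only when the condition is present and set to False.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// isFalse reports whether the condition is present AND set to "False".
// An absent condition is therefore not False.
func isFalse(conds []Condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "False"
		}
	}
	return false
}

// isTrue reports whether the condition is present AND set to "True".
func isTrue(conds []Condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

// readyIfNotFalse models the check described in this issue (hypothetical name).
func readyIfNotFalse(conds []Condition) bool {
	return !isFalse(conds, "ScheduleSucceeded")
}

// readyIfExplicitlyTrue models the stricter check (hypothetical name).
func readyIfExplicitlyTrue(conds []Condition) bool {
	return isTrue(conds, "ScheduleSucceeded")
}

func main() {
	// The Scheduler has not run yet: ScheduleSucceeded does not exist at all.
	var notScheduledYet []Condition

	fmt.Println(readyIfNotFalse(notScheduledYet))       // true: Ready would be set prematurely
	fmt.Println(readyIfExplicitlyTrue(notScheduledYet)) // false
}
```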

Possible fix for both: treat a missing condition as "not yet determined" rather than "not failed." Specifically:

  1. SetWorkplacementReadyStatus() should require ScheduleSucceeded to be explicitly True (not just not-False) before setting Ready=True.
  2. updateWorkStatus() should unconditionally set initial status conditions (e.g., ScheduleSucceeded=Unknown) on the Work object during its first reconcile, rather than depending on WorkPlacement state being present.

This ensures both controllers always write their status regardless of ordering, and subsequent reconciles converge to the correct final state.
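A rough sketch of the two changes, written against a simplified condition type rather than the real Kratix API. All names here (`Condition`, `setCondition`, `statusOf`, `setWorkPlacementReady`, `initWorkStatus`) are hypothetical, and what Ready should be when ScheduleSucceeded is False or missing is my assumption, not something the current code defines.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// setCondition inserts or updates a condition, so repeated calls are idempotent.
func setCondition(conds []Condition, condType, status string) []Condition {
	for i := range conds {
		if conds[i].Type == condType {
			conds[i].Status = status
			return conds
		}
	}
	return append(conds, Condition{Type: condType, Status: status})
}

// statusOf returns the status of a condition, or "" when it is missing.
func statusOf(conds []Condition, condType string) string {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status
		}
	}
	return ""
}

// setWorkPlacementReady sketches item 1: Ready=True only when
// ScheduleSucceeded is explicitly True. Missing or Unknown maps to
// Ready=Unknown (an assumption made for this sketch).
func setWorkPlacementReady(conds []Condition) []Condition {
	switch statusOf(conds, "ScheduleSucceeded") {
	case "True":
		return setCondition(conds, "Ready", "True")
	case "False":
		return setCondition(conds, "Ready", "False")
	default:
		return setCondition(conds, "Ready", "Unknown")
	}
}

// initWorkStatus sketches item 2: seed ScheduleSucceeded=Unknown when it is
// absent, without overwriting a value that a later reconcile has already set.
func initWorkStatus(conds []Condition) []Condition {
	if statusOf(conds, "ScheduleSucceeded") == "" {
		return setCondition(conds, "ScheduleSucceeded", "Unknown")
	}
	return conds
}

func main() {
	var wp []Condition
	wp = setWorkPlacementReady(wp)
	fmt.Println(statusOf(wp, "Ready")) // Unknown

	wp = setCondition(wp, "ScheduleSucceeded", "True")
	wp = setWorkPlacementReady(wp)
	fmt.Println(statusOf(wp, "Ready")) // True

	var work []Condition
	work = initWorkStatus(work)
	work = initWorkStatus(work) // second call is a no-op
	fmt.Println(len(work), statusOf(work, "ScheduleSucceeded")) // 1 Unknown
}
```

Seeding an explicit Unknown means a later reader of the status can distinguish "not yet evaluated" from "evaluated and failed", which is the distinction the current checks lose.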

I'm happy to put up a PR for this if it would be helpful.

Version tested: latest github.com/syntasso/kratix (k8s.io/client-go v0.34.1 / Kubernetes 1.34)
