Bug Report
When a canary release is cancelled because the workload is rolled back directly (image reverted to the stable revision), the controller drains the BatchRelease and finalises traffic routing correctly, but it does not clear the canary sub-status. The rollout ends up in phase: Healthy with Progressing: False / Completed, yet canaryStatus still holds the step-machine fields from the moment the rollback was triggered (currentStepIndex, currentStepState: StepPaused, podTemplateHash of the failed revision, canaryReplicas, canaryReadyReplicas, …).
Two visible symptoms:
- The rollout looks "stuck at step N paused" forever in tooling that gates on canaryStatus.currentStepState.
- kubectl kruise rollout approve is a no-op: the dispatch in reconcileRollout only runs reconcileRolloutProgressing while phase == Progressing, so patching NextStepIndex has no effect.
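The second symptom can be illustrated with a minimal Go sketch of the phase-gated dispatch. This is a simplified model, not the controller's code: the status struct and the reconcile and advanceStep functions are hypothetical stand-ins, and the step machine is reduced to a single transition.

```go
package main

import "fmt"

// status is a simplified stand-in for the rollout status; the real
// openkruise types carry many more fields.
type status struct {
	Phase            string
	CurrentStepIndex int32
	NextStepIndex    int32
	CurrentStepState string
}

// advanceStep models the step machine (hypothetical): it consumes an
// approval by moving a paused rollout on to the requested next step.
func advanceStep(s *status) {
	if s.CurrentStepState == "StepPaused" && s.NextStepIndex > s.CurrentStepIndex {
		s.CurrentStepIndex = s.NextStepIndex
		s.CurrentStepState = "StepUpgrade"
	}
}

// reconcile models the dispatch described above: the step machine only
// runs while the rollout is in the Progressing phase.
func reconcile(s *status) {
	switch s.Phase {
	case "Progressing":
		advanceStep(s)
	default:
		// Healthy and other phases: the step fields are never consulted,
		// so a patched NextStepIndex is ignored.
	}
}

func main() {
	// Stale status left behind by the rollback, then "approved".
	stale := &status{Phase: "Healthy", CurrentStepIndex: 1, NextStepIndex: 2, CurrentStepState: "StepPaused"}
	reconcile(stale)
	fmt.Println(stale.CurrentStepIndex, stale.CurrentStepState) // prints "1 StepPaused"
}
```

In this model the approval is silently dropped because the phase gate is checked before any step field is read, which matches the behaviour reported here.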
Reproduce
- Canary rollout with at least 2 steps, e.g. step 1 = 1 replica + traffic split, step 2 = 100%.
- Deploy a bad version → batch 1 brings up 1 canary pod that fails its health check (or any external bake mechanism).
- Revert the workload's pod template to the stable revision (rollback-directly path).
- Wait for the canary RS to scale to 0 and the deployment to settle on the stable RS.
- Inspect:
  kubectl get rollout.rollouts.kruise.io <name> -o yaml
Observed:
status:
  phase: Healthy
  canaryStatus:
    currentStepIndex: 1
    currentStepState: StepPaused
    podTemplateHash: <hash-of-failed-revision>
    canaryReplicas: 1
    canaryReadyReplicas: 1
  conditions:
  - type: Progressing
    status: "False"
    reason: Completed
    message: Rollout progressing has been cancelled
kubectl kruise rollout approve rollout.rollouts.kruise.io/<name> does nothing.
Root cause
reconcileRolloutProgressing (pkg/controller/rollout/rollout_progressing.go) handles the cancelling case by calling doFinalising and, when it returns done, transitioning the Progressing condition to Completed and the Succeeded condition to False. It never clears the sub-status.
case v1alpha1.ProgressingReasonCancelling:
	rolloutContext.FinalizeReason = v1beta1.FinaliseReasonRollback
	done, err = r.doFinalising(rolloutContext)
	if err != nil {
		return nil, err
	} else if done {
		progressingStateTransition(newStatus, corev1.ConditionFalse, v1alpha1.ProgressingReasonCompleted, "Rollout progressing has been canceled")
		setRolloutSucceededCondition(newStatus, corev1.ConditionFalse)
	}
Compare to handleContinuousRelease, which calls c.NewStatus.Clear() after the same kind of reset. That path leaves a clean status; the rollback path does not.
Expected
After cancellation finalising completes, canaryStatus/blueGreenStatus should be cleared (same treatment as continuous release), so subsequent inspection and downstream tooling see the rollout as idle, not "step paused".
Fix
Call newStatus.Clear() in the cancelling → completed branch. PR coming.
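A rough sketch of the intended change, in Go. It uses simplified stand-in types rather than the real ones: canaryStatus, rolloutStatus and finishCancelling are hypothetical names, and this Clear() only models the effect this report expects (dropping the sub-status); the real method may reset more.

```go
package main

import "fmt"

// canaryStatus and rolloutStatus are simplified stand-ins for the real
// status types; only the fields relevant to this report are modelled.
type canaryStatus struct {
	CurrentStepIndex int32
	CurrentStepState string
	PodTemplateHash  string
	CanaryReplicas   int32
}

type rolloutStatus struct {
	Phase             string
	ProgressingReason string
	CanaryStatus      *canaryStatus
}

// Clear drops the release sub-status (assumed behaviour for this sketch).
func (s *rolloutStatus) Clear() {
	s.CanaryStatus = nil
}

// finishCancelling models the cancelling -> completed branch once
// finalising has reported done, with the proposed Clear() call added.
func finishCancelling(s *rolloutStatus, done bool) {
	if !done {
		return // finalising still in progress; keep the status for the next reconcile
	}
	s.ProgressingReason = "Completed"
	s.Clear() // proposed addition
}

func main() {
	s := &rolloutStatus{
		Phase:             "Healthy",
		ProgressingReason: "Cancelling",
		CanaryStatus: &canaryStatus{
			CurrentStepIndex: 1,
			CurrentStepState: "StepPaused",
			PodTemplateHash:  "failed-revision-hash", // placeholder value
			CanaryReplicas:   1,
		},
	}
	finishCancelling(s, true)
	fmt.Println(s.ProgressingReason, s.CanaryStatus == nil) // prints "Completed true"
}
```

The sub-status is only cleared once finalising reports done, so intermediate reconciles during the drain would still see the step fields.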
Environment
- openkruise/rollouts master (v0.6.2 / commit b2600e9), Kubernetes 1.x, Deployment workload.