Description
What happened:
When I deploy a LeaderWorkerSet (LWS) with only one replica, I want to leverage maxSurge to perform a zero-downtime rollout. However, our current implementation does not seem to support this scenario.
What you expected to happen:
With maxSurge set, the rollout should bring up a surge replica running the new template before the existing replica is replaced; zero-downtime rolling updates are vital for online services.
How to reproduce it (as minimally and precisely as possible):
- Deploy LWS with maxSurge = 2 and replicas = 1:
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: leaderworkerset-rollout
spec:
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdateConfiguration:
      maxUnavailable: 1
      maxSurge: 2
  replicas: 1
  leaderWorkerTemplate:
    size: 1
    workerTemplate:
      spec:
        containers:
          - name: nginx
            image: nginxinc/nginx-unprivileged:1.27
            resources:
              limits:
                cpu: "100m"
              requests:
                cpu: "50m"
            ports:
              - containerPort: 8080
- Once the LWS is ready, redeploy with an invalid image:
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: leaderworkerset-rollout
spec:
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdateConfiguration:
      maxUnavailable: 1
      maxSurge: 2
  replicas: 1
  leaderWorkerTemplate:
    size: 1
    workerTemplate:
      spec:
        containers:
          - name: nginx
            image: nginxinc/nginx-unprivileged:1.27-not-exist
            resources:
              limits:
                cpu: "100m"
              requests:
                cpu: "50m"
            ports:
              - containerPort: 8080
- There is only one Pod, and it is stuck in ImagePullBackOff:
k get po -l leaderworkerset.sigs.k8s.io/name=leaderworkerset-rollout
NAME                        READY   STATUS             RESTARTS   AGE
leaderworkerset-rollout-0   0/1     ImagePullBackOff   0          48s
Anything else we need to know?:
I suspect this issue is related to the following code snippet:
- In this section, the code scales the surge capacity back down too early while trying to release surge replicas gradually: with a single replica, the number of unready replicas can never exceed maxSurge, so finalReplicas falls straight back to lws.spec.replicas = 1 (see the worked example after the snippet).
// wantReplicas calculates the final replicas if needed.
wantReplicas := func(unreadyReplicas int32) int32 {
    if unreadyReplicas <= int32(maxSurge) {
        // When we have n unready replicas and n bursted replicas, we should
        // start to release the burst replica gradually for the accommodation of
        // the unready ones.
        finalReplicas := lwsReplicas + utils.NonZeroValue(int32(unreadyReplicas)-1)
        r.Record.Eventf(lws, corev1.EventTypeNormal, GroupsProgressing,
            fmt.Sprintf("deleting surge replica %s-%d", lws.Name, finalReplicas))
        return finalReplicas
    }
    return burstReplicas
}
Environment:
- Kubernetes version (use kubectl version):
  Client Version: v1.34.0
  Kustomize Version: v5.7.1
  Server Version: v1.34.1-aliyun.1
- LWS version (use git describe --tags --dirty --always): v0.7.0
- Cloud provider or hardware configuration:
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others: