Description
In the course of chasing down #51, I noticed that sometimes changing the ceph-csi version with juju doesn't lead to the new version rolling out.
Concretely, we start with three nodes and three provisioner replicas on ceph-csi version 3.9.0.
$ juju config ceph-csi release
v3.9.0
and
$ kubectl get pods
NAME                                         READY   STATUS    RESTARTS   AGE
csi-rbdplugin-bk78x                          2/2     Running   0          5m15s
csi-rbdplugin-mgzlw                          2/2     Running   0          5m17s
csi-rbdplugin-provisioner-78b57445cc-4hb5n   6/6     Running   0          5m
csi-rbdplugin-provisioner-78b57445cc-dkpql   6/6     Running   0          5m17s
csi-rbdplugin-provisioner-78b57445cc-zs6cn   6/6     Running   0          5m2s
csi-rbdplugin-tvf6m                          2/2     Running   0          5m14s
Then we update the release with $ juju config ceph-csi release=v3.13.0.
The csi-rbdplugin pods are updated, but the csi-rbdplugin-provisioner rollout gets stuck
$ kubectl get pods
NAME                                         READY   STATUS    RESTARTS   AGE
csi-rbdplugin-7skx6                          2/2     Running   0          22s
csi-rbdplugin-cbg27                          2/2     Running   0          23s
csi-rbdplugin-kcqw4                          2/2     Running   0          24s
csi-rbdplugin-provisioner-7468594bbd-mb2bp   0/6     Pending   0          24s
csi-rbdplugin-provisioner-78b57445cc-4hb5n   6/6     Running   0          8m55s
csi-rbdplugin-provisioner-78b57445cc-dkpql   6/6     Running   0          9m12s
csi-rbdplugin-provisioner-78b57445cc-zs6cn   6/6     Running   0          8m57s
$ kubectl get pod/csi-rbdplugin-provisioner-78b57445cc-4hb5n \
-o=jsonpath='{.status.containerStatuses[?(@.name=="csi-rbdplugin-controller")].image}'
rocks.canonical.com:443/cdk/cephcsi/cephcsi:v3.9.0
The issue here is that the ceph-csi operator leaves maxUnavailable at its default value of 25% while at the same time setting a podAntiAffinity that allows at most one csi-rbdplugin-provisioner pod per node.
Since every node already has a csi-rbdplugin-provisioner, the new surge pod can't be scheduled anywhere and sits in Pending. At the same time, maxUnavailable is rounded down, so 25% of 3 replicas works out to 0 and we aren't allowed to terminate any of the old pods either. We're stuck.
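(For reference, both settings can be inspected directly on the live deployment; the jsonpath below is my own addition, but the field paths are standard Deployment API fields.)
$ kubectl get deployment/csi-rbdplugin-provisioner \
-o=jsonpath='{.spec.strategy.rollingUpdate}{"\n"}{.spec.template.spec.affinity.podAntiAffinity}{"\n"}'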
The simplest fix is just to patch maxUnavailable to 1
$ kubectl patch deployment.apps/csi-rbdplugin-provisioner \
-p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":1}}}}'
deployment.apps/csi-rbdplugin-provisioner patched
in which case the deployment immediately rolls out the update
$ kubectl get pods
NAME                                         READY   STATUS    RESTARTS   AGE
csi-rbdplugin-7skx6                          2/2     Running   0          6m25s
csi-rbdplugin-cbg27                          2/2     Running   0          6m26s
csi-rbdplugin-kcqw4                          2/2     Running   0          6m27s
csi-rbdplugin-provisioner-7468594bbd-cnllt   6/6     Running   0          9s
csi-rbdplugin-provisioner-7468594bbd-jhpc7   6/6     Running   0          7s
csi-rbdplugin-provisioner-7468594bbd-mb2bp   6/6     Running   0          6m27s
$ kubectl get pod/csi-rbdplugin-provisioner-7468594bbd-mb2bp \
-o=jsonpath='{.status.containerStatuses[?(@.name=="csi-rbdplugin-controller")].image}'
rocks.canonical.com:443/cdk/cephcsi/cephcsi:v3.13.0
Since the default value of provisioner-replicas is three, and many small k8s deployments will have three nodes, would it be worthwhile to add a check, e.g. here, that detects this scenario and sets maxUnavailable to 1? Or perhaps just give the user a warning?
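A rough sketch of what such a check might look like, purely as illustration: the function and variable names below are hypothetical rather than the charm's actual internals, and it only assumes the charm knows the cluster's node count and assembles the provisioner Deployment spec as a dict.

# Hypothetical sketch, not the charm's real code: assumes the charm knows the
# node count and builds the provisioner Deployment spec as a plain dict.
def tune_rollout_strategy(deployment_spec: dict, provisioner_replicas: int, node_count: int) -> None:
    """With podAntiAffinity pinning one provisioner per node, a rolling update can
    only proceed if at least one old pod may be terminated. When every node is
    already occupied, the default maxUnavailable of 25% rounds down to 0 and the
    rollout deadlocks, so force maxUnavailable to 1."""
    if provisioner_replicas >= node_count:
        strategy = deployment_spec.setdefault("strategy", {"type": "RollingUpdate"})
        strategy.setdefault("rollingUpdate", {})["maxUnavailable"] = 1

Alternatively, the same condition could simply surface a warning to the user instead of silently overriding the strategy.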