Description
In the course of chasing down #51, I noticed that sometimes changing the ceph-csi version with juju doesn't lead to the new version rolling out.
Concretely, we start with three nodes and three provisioner replicas on ceph-csi version 3.9.0.
$ juju config ceph-csi release
v3.9.0
and
$ kubectl get pods                                                                                                       
NAME                                         READY   STATUS    RESTARTS   AGE
csi-rbdplugin-bk78x                          2/2     Running   0          5m15s
csi-rbdplugin-mgzlw                          2/2     Running   0          5m17s
csi-rbdplugin-provisioner-78b57445cc-4hb5n   6/6     Running   0          5m
csi-rbdplugin-provisioner-78b57445cc-dkpql   6/6     Running   0          5m17s
csi-rbdplugin-provisioner-78b57445cc-zs6cn   6/6     Running   0          5m2s
csi-rbdplugin-tvf6m                          2/2     Running   0          5m14s
Then we update to v3.13.0
$ juju config ceph-csi release=v3.13.0
The csi-rbdplugin pods are updated, but the csi-rbdplugin-provisioner rollout gets stuck
$ kubectl get pods                                                                                                   
NAME                                         READY   STATUS    RESTARTS   AGE
csi-rbdplugin-7skx6                          2/2     Running   0          22s
csi-rbdplugin-cbg27                          2/2     Running   0          23s
csi-rbdplugin-kcqw4                          2/2     Running   0          24s
csi-rbdplugin-provisioner-7468594bbd-mb2bp   0/6     Pending   0          24s
csi-rbdplugin-provisioner-78b57445cc-4hb5n   6/6     Running   0          8m55s
csi-rbdplugin-provisioner-78b57445cc-dkpql   6/6     Running   0          9m12s
csi-rbdplugin-provisioner-78b57445cc-zs6cn   6/6     Running   0          8m57s
$ kubectl get pod/csi-rbdplugin-provisioner-78b57445cc-4hb5n \
   -o=jsonpath='{.status.containerStatuses[?(@.name=="csi-rbdplugin-controller")].image}'
rocks.canonical.com:443/cdk/cephcsi/cephcsi:v3.9.0
The issue here is that the ceph-csi operator leaves maxUnavailable at its default value of 25% while also setting a podAntiAffinity that allows only one csi-rbdplugin-provisioner per node.
Since every node already has a csi-rbdplugin-provisioner, the new pod created by the rollout has nowhere to schedule and stays Pending until an old one is killed. However, the maximum number of unavailable replicas is 0.25 * 3 = 0.75, which Kubernetes rounds down to 0, so the deployment isn't allowed to terminate any old pod either. We're stuck.
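To make the arithmetic concrete, here is a minimal sketch of how the percentages resolve to pod counts. It assumes the documented Kubernetes behaviour that a percentage maxUnavailable is rounded down and a percentage maxSurge is rounded up. The function name resolve_rolling_update is made up for illustration, and the sketch ignores any other adjustments the deployment controller may apply.

```python
def resolve_rolling_update(replicas, max_unavailable="25%", max_surge="25%"):
    """Return (max_unavailable, max_surge) as absolute pod counts.

    Hypothetical helper for illustration only: it models just the
    percentage rounding (maxUnavailable down, maxSurge up).
    """
    def to_count(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            scaled = replicas * int(value[:-1])
            # Integer arithmetic avoids floating point rounding surprises.
            return -(-scaled // 100) if round_up else scaled // 100
        return int(value)

    return to_count(max_unavailable, False), to_count(max_surge, True)


# Three replicas with the defaults: no old pod may go away, one new pod may be added.
print(resolve_rolling_update(3))

# After patching maxUnavailable to 1, one old pod may be terminated.
print(resolve_rolling_update(3, max_unavailable=1))
```

With three replicas the defaults resolve to zero unavailable pods and one surge pod, which matches the single Pending csi-rbdplugin-provisioner pod in the listing above.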
The simplest fix is just to patch maxUnavailable to 1
$ kubectl patch deployment.apps/csi-rbdplugin-provisioner \
   -p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":1}}}}'  
deployment.apps/csi-rbdplugin-provisioner patched
in which case the deployment immediately rolls out the update
$ kubectl get pods                                                                                                        
NAME                                         READY   STATUS    RESTARTS   AGE
csi-rbdplugin-7skx6                          2/2     Running   0          6m25s
csi-rbdplugin-cbg27                          2/2     Running   0          6m26s
csi-rbdplugin-kcqw4                          2/2     Running   0          6m27s
csi-rbdplugin-provisioner-7468594bbd-cnllt   6/6     Running   0          9s
csi-rbdplugin-provisioner-7468594bbd-jhpc7   6/6     Running   0          7s
csi-rbdplugin-provisioner-7468594bbd-mb2bp   6/6     Running   0          6m27s
$ kubectl get pod/csi-rbdplugin-provisioner-7468594bbd-mb2bp \
   -o=jsonpath='{.status.containerStatuses[?(@.name=="csi-rbdplugin-controller")].image}'
rocks.canonical.com:443/cdk/cephcsi/cephcsi:v3.13.0
Since the default value of provisioner-replicas is three, and many small k8s deployments will have three nodes, would it be worthwhile to add a check (e.g. here) that detects this scenario and sets maxUnavailable to 1? Or perhaps give the user a warning?
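For illustration, such a check could look roughly like the sketch below. All names here (rollout_would_deadlock, strategy_patch, schedulable_nodes) are hypothetical and not taken from the charm. The sketch assumes the anti-affinity is a hard requirement of one provisioner per node, that all old replicas are running, and that a percentage maxUnavailable is rounded down.

```python
def resolved_max_unavailable(replicas, max_unavailable="25%"):
    """Absolute maxUnavailable, assuming percentages are rounded down."""
    if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
        return replicas * int(max_unavailable[:-1]) // 100
    return int(max_unavailable)


def rollout_would_deadlock(replicas, schedulable_nodes, max_unavailable="25%"):
    """True if a rolling update can neither add a new pod nor remove an old one.

    Assumes one provisioner per node (required anti-affinity), so a new pod
    only fits if some node has no provisioner yet.
    """
    no_free_node = replicas >= schedulable_nodes
    may_not_terminate = resolved_max_unavailable(replicas, max_unavailable) == 0
    return no_free_node and may_not_terminate


def strategy_patch(replicas, schedulable_nodes, max_unavailable="25%"):
    """Return the same patch body as the kubectl command above, or None."""
    if rollout_would_deadlock(replicas, schedulable_nodes, max_unavailable):
        return {"spec": {"strategy": {"rollingUpdate": {"maxUnavailable": 1}}}}
    return None


# The scenario from this issue: three replicas on three nodes.
print(strategy_patch(replicas=3, schedulable_nodes=3))
# A fourth node leaves room for the surge pod, so no patch is needed.
print(strategy_patch(replicas=3, schedulable_nodes=4))
```

The same predicate could drive a warning in the charm status instead of changing the deployment, if silently overriding maxUnavailable is considered too intrusive.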