Rolling update of provisioner stalls when provisioner-replicas equals the number of nodes and is < 4 #60

@raineszm

Description

In the course of chasing down #51, I noticed that sometimes changing the ceph-csi version with juju doesn't lead to the new version rolling out.

Concretely, we start with three nodes and three provisioner replicas on ceph-csi version 3.9.0.

$ juju config ceph-csi release
v3.9.0

and

$ kubectl get pods                                                                                                       
NAME                                         READY   STATUS    RESTARTS   AGE
csi-rbdplugin-bk78x                          2/2     Running   0          5m15s
csi-rbdplugin-mgzlw                          2/2     Running   0          5m17s
csi-rbdplugin-provisioner-78b57445cc-4hb5n   6/6     Running   0          5m
csi-rbdplugin-provisioner-78b57445cc-dkpql   6/6     Running   0          5m17s
csi-rbdplugin-provisioner-78b57445cc-zs6cn   6/6     Running   0          5m2s
csi-rbdplugin-tvf6m                          2/2     Running   0          5m14s

Then we update with $ juju config ceph-csi release=v3.13.0.
The csi-rbdplugin pods are updated, but the csi-rbdplugin-provisioner rollout gets stuck:

$ kubectl get pods                                                                                                   
NAME                                         READY   STATUS    RESTARTS   AGE
csi-rbdplugin-7skx6                          2/2     Running   0          22s
csi-rbdplugin-cbg27                          2/2     Running   0          23s
csi-rbdplugin-kcqw4                          2/2     Running   0          24s
csi-rbdplugin-provisioner-7468594bbd-mb2bp   0/6     Pending   0          24s
csi-rbdplugin-provisioner-78b57445cc-4hb5n   6/6     Running   0          8m55s
csi-rbdplugin-provisioner-78b57445cc-dkpql   6/6     Running   0          9m12s
csi-rbdplugin-provisioner-78b57445cc-zs6cn   6/6     Running   0          8m57s

$ kubectl get pod/csi-rbdplugin-provisioner-78b57445cc-4hb5n \
   -o=jsonpath='{.status.containerStatuses[?(@.name=="csi-rbdplugin-controller")].image}'
rocks.canonical.com:443/cdk/cephcsi/cephcsi:v3.9.0
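
For what it's worth, describing the Pending pod from the new ReplicaSet should show why the scheduler can't place it (the exact event wording varies by Kubernetes version, but it points at the pod anti-affinity rules):

$ kubectl describe pod csi-rbdplugin-provisioner-7468594bbd-mb2bp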

The issue here is that the ceph-csi operator leaves maxUnavailable at its default value of 25% while also setting a podAntiAffinity rule that allows only one csi-rbdplugin-provisioner pod per node.

Since every node already runs a csi-rbdplugin-provisioner, we can't schedule a new pod before killing an old one. At the same time, Kubernetes rounds a percentage maxUnavailable down, so with three replicas the budget is floor(0.25 * 3) = 0 and we aren't allowed to terminate any of the old pods either. We're stuck.
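
Both settings can be read straight off the live deployment (queries only, output not shown here):

$ kubectl get deployment/csi-rbdplugin-provisioner \
   -o=jsonpath='{.spec.strategy.rollingUpdate}'
$ kubectl get deployment/csi-rbdplugin-provisioner \
   -o=jsonpath='{.spec.template.spec.affinity.podAntiAffinity}'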

The simplest fix is just to patch maxUnavailable to 1

$ kubectl patch deployment.apps/csi-rbdplugin-provisioner \
   -p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":1}}}}'  
deployment.apps/csi-rbdplugin-provisioner patched

in which case the deployment immediately rolls out the update

$ kubectl get pods                                                                                                        
NAME                                         READY   STATUS    RESTARTS   AGE
csi-rbdplugin-7skx6                          2/2     Running   0          6m25s
csi-rbdplugin-cbg27                          2/2     Running   0          6m26s
csi-rbdplugin-kcqw4                          2/2     Running   0          6m27s
csi-rbdplugin-provisioner-7468594bbd-cnllt   6/6     Running   0          9s
csi-rbdplugin-provisioner-7468594bbd-jhpc7   6/6     Running   0          7s
csi-rbdplugin-provisioner-7468594bbd-mb2bp   6/6     Running   0          6m27s

$ kubectl get pod/csi-rbdplugin-provisioner-7468594bbd-mb2bp \
   -o=jsonpath='{.status.containerStatuses[?(@.name=="csi-rbdplugin-controller")].image}'
rocks.canonical.com:443/cdk/cephcsi/cephcsi:v3.13.0
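
At this point $ kubectl rollout status deployment/csi-rbdplugin-provisioner should also report the rollout as complete (not captured in the run above).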

Since the default value of provisioner-replicas is three, and many small k8s deployments will have three nodes, is it worthwhile to add a check, e.g. here, that detects this scenario and sets maxUnavailable to 1? Or perhaps give the user a warning?
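
For illustration only, here is roughly the logic such a check would need, sketched with kubectl from outside the charm (the charm itself would of course do this through its Kubernetes client rather than shelling out):

# only needed when every node already holds a provisioner pod and
# 25% of the replica count rounds down to zero, i.e. replicas < 4
NODES=$(kubectl get nodes --no-headers | wc -l)
REPLICAS=$(kubectl get deployment/csi-rbdplugin-provisioner -o=jsonpath='{.spec.replicas}')
if [ "$REPLICAS" -ge "$NODES" ] && [ "$REPLICAS" -lt 4 ]; then
    kubectl patch deployment.apps/csi-rbdplugin-provisioner \
        -p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":1}}}}'
fi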
