
Redeployed pod can't mount a csi-rbd-sc PVC - rbd image is still being used #4375

Description

@stibi

Describe the bug

We have a Proxmox cluster with Ceph, and Kubernetes running on top of it.
We are using the csi-rbd driver to provision PVCs.

It all seems to work: a pod is started, the PVC is created, an rbd image is created, everything is mounted, and the app inside the pod can use it. Until the pod is deleted and recreated. From that moment on it won't start, because the PVC can't be mounted for the pod:

Events:
  Type     Reason                  Age               From                     Message
  ----     ------                  ----              ----                     -------
  Normal   Scheduled               9m51s             default-scheduler        Successfully assigned monitoring/kube-prometheus-grafana-74bfb88f64-tctbf to k8s-dev-worker0
  Normal   SuccessfulAttachVolume  9m51s             attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-7e9588ad-19d1-4099-bd49-30fd2b9f0d42"
  Warning  FailedMount             17s (x9 over 9m)  kubelet                  MountVolume.MountDevice failed for volume "pvc-7e9588ad-19d1-4099-bd49-30fd2b9f0d42" : rpc error: code = Internal desc = rbd image k8s-dev/csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825 is still being used

I think I was able to verify that the volume really isn't mounted anywhere. I got a shell inside each ceph-csi-rbd-nodeplugin pod, on each worker node, and from the csi-rbdplugin container looked for the rbd device with something like rbd device ls | grep csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825. I double-checked directly on the worker node OS by looking at all mount entries. Nothing found.
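For reference, roughly the checks I ran (pool and image names as above; if I read the error right, "still being used" means ceph-csi found a watcher on the image, which rbd status should show):

```shell
# Inside the csi-rbdplugin container: is the image mapped on this node?
rbd device ls | grep csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825

# On the worker node itself: any mount referencing the volume?
mount | grep csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825

# Ask Ceph who is currently watching the image (credential flags omitted)
rbd status k8s-dev/csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825
```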

I also tried to mount the device manually. I picked a ceph-csi-rbd-nodeplugin container, prepared ceph.conf and keyring files, and was able to map the image with rbd -c /tmp/ceph.conf -k /tmp/ceph.keyring --id k8s-dev --pool k8s-dev map csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825, and then mount the block device on the host with mount /dev/rbd7 /mnt/. That worked. Of course, now the rbd image really is being used :)
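For completeness, the full sequence, including the cleanup so the test itself doesn't leave the image mapped (the /tmp/ceph.conf and /tmp/ceph.keyring files were assembled by hand from the cluster's mon addresses and the k8s-dev user key):

```shell
# Map the image using the hand-built config and keyring
rbd -c /tmp/ceph.conf -k /tmp/ceph.keyring --id k8s-dev --pool k8s-dev \
    map csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825

# Mount the resulting block device on the host
mount /dev/rbd7 /mnt/

# Undo the experiment, otherwise the mapping itself keeps the image "in use"
umount /mnt/
rbd -c /tmp/ceph.conf -k /tmp/ceph.keyring --id k8s-dev unmap /dev/rbd7
```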

But I can't figure out why Kubernetes thinks the image can't be mounted. I need some help here, please.

Is there a chance that storage.kubernetes.io/csiProvisionerIdentity plays a role in any of this? Does it change when the ceph-csi-rbd-provisioner is redeployed?
I ask because I noticed that the csiProvisionerIdentity differs among persistent volumes, and I did a couple of redeploys of the provisioner.
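For anyone wanting to compare, this is roughly how I dumped the value per PV (the jsonpath needs the dots in the attribute key escaped; grepping the YAML output works just as well):

```shell
# Print each PV's recorded provisioner identity
kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.csi.volumeAttributes.storage\.kubernetes\.io/csiProvisionerIdentity}{"\n"}{end}'

# Or, simpler, for a single PV:
kubectl get pv pvc-7e9588ad-19d1-4099-bd49-30fd2b9f0d42 -o yaml | grep csiProvisionerIdentity
```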

Environment details

  • Image/version of Ceph CSI driver :
  • Helm chart version : 3.10.1
  • Kernel version : 5.15.0-87-generic
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's krbd or rbd-nbd) :
  • Kubernetes cluster version : 1.28.2
  • Ceph cluster version : 17.2.6

Steps to reproduce

  • create a deployment backed by an RBD PVC (minimal manifests below)
  • everything works
  • delete the pod
  • the new pod gets stuck in the Init phase because of FailedMount - MountVolume.MountDevice failed for volume "pvc-7e9588ad-19d1-4099-bd49-30fd2b9f0d42" : rpc error: code = Internal desc = rbd image k8s-dev/csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825 is still being used
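A minimal reproducer, assuming only the csi-rbd-sc storage class from the title (all other names are arbitrary):

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-repro
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: csi-rbd-sc
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbd-repro
spec:
  replicas: 1
  selector:
    matchLabels: {app: rbd-repro}
  template:
    metadata:
      labels: {app: rbd-repro}
    spec:
      containers:
      - name: app
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - {name: data, mountPath: /data}
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: rbd-repro
EOF

# Pod starts and the volume mounts fine; now delete it and watch the
# replacement get stuck with FailedMount
kubectl delete pod -l app=rbd-repro
kubectl get pods -l app=rbd-repro -w
```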

Nothing interesting in the logs unfortunately, or at least nothing I noticed.
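For the record, this is roughly where I looked (namespace and resource names follow the Helm chart defaults, so adjust for your release):

```shell
# Nodeplugin logs on the node where the pod was scheduled
kubectl -n ceph-csi-rbd logs ds/ceph-csi-rbd-nodeplugin -c csi-rbdplugin --since=1h

# Provisioner logs
kubectl -n ceph-csi-rbd logs deploy/ceph-csi-rbd-provisioner -c csi-rbdplugin --since=1h

# kubelet messages on the worker node
journalctl -u kubelet --since "1 hour ago" | grep pvc-7e9588ad
```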
