Describe the bug
We have a Proxmox cluster with Ceph, and k8s running on it.
We use the csi-rbd provisioner to provision PVCs.
Everything seems to work: a pod is started, the PVC is created, an rbd image is created, everything is mounted, and the app inside the pod can use it. Until the pod is deleted and recreated, that is. From that moment it won't start, because the PVC is not ready for the pod:
```
Events:
  Type     Reason                  Age               From                     Message
  ----     ------                  ----              ----                     -------
  Normal   Scheduled               9m51s             default-scheduler        Successfully assigned monitoring/kube-prometheus-grafana-74bfb88f64-tctbf to k8s-dev-worker0
  Normal   SuccessfulAttachVolume  9m51s             attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-7e9588ad-19d1-4099-bd49-30fd2b9f0d42"
  Warning  FailedMount             17s (x9 over 9m)  kubelet                  MountVolume.MountDevice failed for volume "pvc-7e9588ad-19d1-4099-bd49-30fd2b9f0d42" : rpc error: code = Internal desc = rbd image k8s-dev/csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825 is still being used
```
I think I was able to verify that the volume really is not mounted anywhere. I got a shell inside each ceph-csi-rbd-nodeplugin pod, on each worker node, and from the csi-rbdplugin container I looked for the rbd device with something like `rbd device ls | grep csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825`. I double-checked directly on the worker node OS by looking at all mount entries. Nothing found.
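In case it's useful, this is roughly the check I ran on each worker node / nodeplugin container (a sketch; the kernel-rbd paths are an assumption based on using the default krbd mounter):

```bash
# Look for the image among the kernel-mapped rbd devices.
rbd device ls | grep csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825 \
  || echo "image not mapped on this node"

# Cross-check on the worker node OS itself.
grep rbd /proc/mounts || echo "no rbd mounts"
ls /sys/bus/rbd/devices/ 2>/dev/null   # kernel-level list of mapped rbd devices
```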
I also tried to map and mount the device manually. I picked a ceph-csi-rbd-nodeplugin container, prepared ceph and keyring config files, and I was able to map the image with `rbd -c /tmp/ceph.conf -k /tmp/ceph.keyring --id k8s-dev --pool k8s-dev map csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825`, and then mount the block device on the host with something like `mount /dev/rbd7 /mnt/`. That worked. Now the rbd image actually is being used :)
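For reference, the whole manual sequence, including the unmount/unmap that is needed afterwards so the image is released again (a sketch; `/dev/rbd7` is just the device I happened to get):

```bash
# Map the image using the config/keyring I prepared in the container.
rbd -c /tmp/ceph.conf -k /tmp/ceph.keyring --id k8s-dev --pool k8s-dev \
  map csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825

# Mount it on the host, then clean up so the image is released again.
mount /dev/rbd7 /mnt/
umount /mnt/
rbd -c /tmp/ceph.conf -k /tmp/ceph.keyring unmap /dev/rbd7
```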
But I can't find out why k8s thinks it can't be mounted. I need some help here, please.
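One more check I can run from the nodeplugin container, in case it helps (a sketch; my assumption is that the "still being used" error is derived from the image having active watchers):

```bash
# List the image's watchers; a leftover watcher would explain the error.
rbd -c /tmp/ceph.conf -k /tmp/ceph.keyring --id k8s-dev \
  status k8s-dev/csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825
```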
Is there a chance that `storage.kubernetes.io/csiProvisionerIdentity` could play any role in all of this? Does it change when the ceph-csi-rbd-provisioner is redeployed?
I ask because I noticed that the csiProvisionerIdentity differs among my persistent volumes, and I did a couple of redeploys of the provisioner.
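This is roughly how I compared the value across volumes (a sketch):

```bash
# csiProvisionerIdentity is stored under spec.csi.volumeAttributes on each PV.
for pv in $(kubectl get pv -o name); do
  echo -n "$pv: "
  kubectl get "$pv" -o yaml | grep csiProvisionerIdentity
done
```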
Environment details
- Image/version of Ceph CSI driver :
- Helm chart version : 3.10.1
- Kernel version : 5.15.0-87-generic
- Mounter used for mounting PVC (for cephFS it's fuse or kernel; for rbd it's krbd or rbd-nbd) :
- Kubernetes cluster version : 1.28.2
- Ceph cluster version : 17.2.6
Steps to reproduce
- create a deployment backed by an rbd PVC (a minimal sketch of the manifests is below, after this list)
- everything works
- delete the pod
- the replacement pod gets stuck in the Init phase because of FailedMount: `MountVolume.MountDevice failed for volume "pvc-7e9588ad-19d1-4099-bd49-30fd2b9f0d42" : rpc error: code = Internal desc = rbd image k8s-dev/csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825 is still being used`
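A minimal sketch of what I mean (all names are hypothetical; `csi-rbd-sc` stands in for the actual ceph-csi rbd StorageClass):

```bash
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repro-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-rbd-sc
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: repro
spec:
  replicas: 1
  selector:
    matchLabels: {app: repro}
  template:
    metadata:
      labels: {app: repro}
    spec:
      containers:
      - name: app
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - {name: data, mountPath: /data}
      volumes:
      - name: data
        persistentVolumeClaim: {claimName: repro-pvc}
EOF

# Delete the pod; the ReplicaSet recreates it, and the new pod gets stuck
# on FailedMount with "rbd image ... is still being used".
kubectl delete pod -l app=repro
kubectl describe pod -l app=repro
```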
Nothing interesting in the logs, unfortunately, or at least nothing I have noticed.