Describe the bug
We have a Proxmox cluster with Ceph, and k8s running on it.
We use the csi-rbd provisioner to provision PVCs.
Everything seems to work: a pod is started, the PVC is created, an rbd image is created, everything is mounted, and the app inside the pod can use it. Until the pod is deleted and recreated, that is. From that moment it won't start, because the PVC is not ready for the pod:
```
Events:
  Type     Reason                  Age               From                     Message
  ----     ------                  ----              ----                     -------
  Normal   Scheduled               9m51s             default-scheduler        Successfully assigned monitoring/kube-prometheus-grafana-74bfb88f64-tctbf to k8s-dev-worker0
  Normal   SuccessfulAttachVolume  9m51s             attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-7e9588ad-19d1-4099-bd49-30fd2b9f0d42"
  Warning  FailedMount             17s (x9 over 9m)  kubelet                  MountVolume.MountDevice failed for volume "pvc-7e9588ad-19d1-4099-bd49-30fd2b9f0d42" : rpc error: code = Internal desc = rbd image k8s-dev/csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825 is still being used
```
I think I was able to verify that the volume really is not mounted anywhere. I got a shell inside each ceph-csi-rbd-nodeplugin pod, on each worker node, and from the csi-rbdplugin container I looked for the rbd device with something like `rbd device ls | grep csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825`. I double-checked directly on the worker node OS by looking at all mount entries. Nothing found.
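In case it's useful, this is roughly the check I ran on each worker node / nodeplugin container (a sketch; the kernel-rbd paths are an assumption based on using the default krbd mounter):

```bash
# Look for the image among the kernel-mapped rbd devices.
rbd device ls | grep csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825 \
  || echo "image not mapped on this node"

# Cross-check on the worker node OS itself.
grep rbd /proc/mounts || echo "no rbd mounts"
ls /sys/bus/rbd/devices/ 2>/dev/null   # kernel-level list of mapped rbd devices
```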
I also tried to map and mount the device manually. I picked a ceph-csi-rbd-nodeplugin container, prepared ceph and keyring config files, and I was able to map the image with `rbd -c /tmp/ceph.conf -k /tmp/ceph.keyring --id k8s-dev --pool k8s-dev map csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825`, and then mount the block device on the host with something like `mount /dev/rbd7 /mnt/`. That worked. Now the rbd image actually is being used :)
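For reference, the whole manual sequence, including the unmount/unmap that is needed afterwards so the image is released again (a sketch; `/dev/rbd7` is just the device I happened to get):

```bash
# Map the image using the config/keyring I prepared in the container.
rbd -c /tmp/ceph.conf -k /tmp/ceph.keyring --id k8s-dev --pool k8s-dev \
  map csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825

# Mount it on the host, then clean up so the image is released again.
mount /dev/rbd7 /mnt/
umount /mnt/
rbd -c /tmp/ceph.conf -k /tmp/ceph.keyring unmap /dev/rbd7
```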
But I can't find out why k8s thinks it can't be mounted. I need some help here, please.
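One more check I can run from the nodeplugin container, in case it helps (a sketch; my assumption is that the "still being used" error is derived from the image having active watchers):

```bash
# List the image's watchers; a leftover watcher would explain the error.
rbd -c /tmp/ceph.conf -k /tmp/ceph.keyring --id k8s-dev \
  status k8s-dev/csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825
```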
Is there a chance that `storage.kubernetes.io/csiProvisionerIdentity` could play any role in all of this? Does it change when the ceph-csi-rbd-provisioner is redeployed?
I ask because I noticed that the csiProvisionerIdentity differs among my persistent volumes, and I did a couple of redeploys of the provisioner.
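This is roughly how I compared the value across volumes (a sketch):

```bash
# csiProvisionerIdentity is stored under spec.csi.volumeAttributes on each PV.
for pv in $(kubectl get pv -o name); do
  echo -n "$pv: "
  kubectl get "$pv" -o yaml | grep csiProvisionerIdentity
done
```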
Environment details
- Image/version of Ceph CSI driver :
- Helm chart version : 3.10.1
- Kernel version : 5.15.0-87-generic
- Mounter used for mounting PVC (for cephFS it's fuse or kernel; for rbd it's krbd or rbd-nbd) :
- Kubernetes cluster version : 1.28.2
- Ceph cluster version : 17.2.6
Steps to reproduce
- create a deployment backed by an rbd PVC (a minimal sketch of the manifests is below, after this list)
- everything works
- delete the pod
- the replacement pod gets stuck in the Init phase because of FailedMount: `MountVolume.MountDevice failed for volume "pvc-7e9588ad-19d1-4099-bd49-30fd2b9f0d42" : rpc error: code = Internal desc = rbd image k8s-dev/csi-vol-dea80a4e-85b6-46e2-9d08-ffba10ef7825 is still being used`
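A minimal sketch of what I mean (all names are hypothetical; `csi-rbd-sc` stands in for the actual ceph-csi rbd StorageClass):

```bash
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repro-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-rbd-sc
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: repro
spec:
  replicas: 1
  selector:
    matchLabels: {app: repro}
  template:
    metadata:
      labels: {app: repro}
    spec:
      containers:
      - name: app
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - {name: data, mountPath: /data}
      volumes:
      - name: data
        persistentVolumeClaim: {claimName: repro-pvc}
EOF

# Delete the pod; the ReplicaSet recreates it, and the new pod gets stuck
# on FailedMount with "rbd image ... is still being used".
kubectl delete pod -l app=repro
kubectl describe pod -l app=repro
```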
Nothing interesting in the logs, unfortunately, or at least nothing I have noticed.