Description
Describe the bug
When a Kubernetes cluster runs an application that mounts the host directory /var/lib in read-only mode, the Mayastor remount process breaks. The Mayastor CSI driver can't unmount the disk because the operating system reports that some processes still hold /var/lib/kubelet/plugins/kubernetes.io/csi/io.openebs.csi-mayastor/{volume_id}/globalmount. On the host, globalmount looks unmounted because the directory is empty, but the ext4 journaling process and the NVMe device still exist.
If you delete the /var/lib/kubelet/plugins/kubernetes.io/csi/io.openebs.csi-mayastor/{volume_id} directory manually with the rm -r command, the ext4 journaling process stops and the CSI driver can continue unmounting.
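A minimal sketch of how to check which processes still see such a mount, assuming the leftover mount shows up in the per-process mounts files under /proc. The function name list_pids_with_mount and its optional second argument are hypothetical, introduced only for illustration; the second argument exists so the function can be tried against a test directory instead of the real /proc.

```shell
#!/bin/sh
# list_pids_with_mount PATTERN [PROC_ROOT]   (hypothetical helper)
# Prints the PID of every process whose mounts file contains PATTERN,
# matched as a fixed string. PROC_ROOT defaults to /proc.
list_pids_with_mount() {
    pattern=$1
    proc_root=${2:-/proc}
    for mounts in "$proc_root"/[0-9]*/mounts; do
        # Skip processes that have exited or that we may not read.
        [ -r "$mounts" ] || continue
        if grep -qF -- "$pattern" "$mounts" 2>/dev/null; then
            pid=${mounts%/mounts}
            echo "${pid##*/}"
        fi
    done
}
```

For example, running `list_pids_with_mount io.openebs.csi-mayastor` as root on the affected node should list the processes whose mount namespace still references a Mayastor volume path, even when globalmount is already empty on the host.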
To Reproduce
- Install the Vector log aggregator into the Kubernetes cluster via its Helm chart. (By default, Vector mounts /var/lib for some reason and perhaps tries to read the entries there.)
  https://github.com/vectordotdev/helm-charts/blob/23f60fec2332b20a301796c80bf7c5b49b383045/charts/vector/values.yaml#L353
- Create pods with volumes and restart them several times, for example 30 pods with volumes, restarted three times.
Some of the pods (around 20% or more) will be stuck in the Pending phase, and you will see the error "proc entry still exists" in the CSI driver logs.
Script to find empty directories on the node:
find /var/lib/kubelet/plugins/kubernetes.io/csi/io.openebs.csi-mayastor -type d -name "globalmount" -empty -exec dirname {} \;
Script to find out which process is still trying to use those empty directories by NVMe device ID:
DEVICE='nvme0n1'
# Walk every PID and report those whose mounts file still references the device.
for pid in $(ls /proc | grep '^[0-9]\+$'); do
    if [ -r "/proc/$pid/mounts" ]; then
        if grep -q "$DEVICE" "/proc/$pid/mounts"; then
            echo "Mounted in PID $pid:"
            grep "$DEVICE" "/proc/$pid/mounts"
        fi
    fi
done
Expected behaviour
This looks like understandable behaviour, but it can be tricky to investigate. For comparison, I can't reproduce it with the AWS CSI driver for EBS volumes. So I'm not sure, but maybe it is possible to mitigate this at the Mayastor code level.
OS info:
- Distro: Ubuntu 24.04
- Kernel version: 6.14.0-29-generic
- MayaStor revision or container image: 2.9.2