Description
/kind bug
On Snowball Edge EC2 instances, after a cluster has been up for several hours or days and pods have been rescheduled, the EBS CSI driver can no longer provision new PVCs on the node; we get events like the one below. The workaround is to cordon and drain the node, then reboot it. The block devices *are* being created on the instances, but they show up under a different device path than the one the driver looks for; in the example below, /dev/vdi was created on the instance and was the appropriate size.
```
Warning  FailedMount  15s (x7 over 47s)  kubelet  MountVolume.MountDevice failed for volume "pvc-dcb8d5a7-2c7b-4d56-9ad5-38af06a8c38e" : rpc error: code = Internal desc = could not format "/dev/vdg" and mount it at "/var/lib/kubelet/plugins/kubernetes.io/csi/ebs.csi.aws.com/78bad98f12b3707d05490eafe2f1515e9e0c5eb0fa49800d184f11afd6899b93/globalmount": format of disk "/dev/vdg" failed: type:("ext4") target:("/var/lib/kubelet/plugins/kubernetes.io/csi/ebs.csi.aws.com/78bad98f12b3707d05490eafe2f1515e9e0c5eb0fa49800d184f11afd6899b93/globalmount") options:("defaults") errcode:(exit status 1) output:(mke2fs 1.46.5 (30-Dec-2021)
The file /dev/vdg does not exist and no size was specified.
```
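The mismatch can be confirmed on an affected node with a quick check; this is a sketch, with the device names taken from the event above (`/dev/vdg` is the path the driver computed, `/dev/vdi` is what the kernel actually created):

```shell
# Minimal sketch of the mismatch: check whether the path the driver
# expects actually exists as a block device on the node.
expected=/dev/vdg
if [ -b "$expected" ]; then status=present; else status=missing; fi
echo "driver expected $expected: $status"
# List what the kernel actually enumerated; in the report this showed
# /dev/vdi with the requested size instead of /dev/vdg.
ls -l /dev/vd* 2>/dev/null || echo "no virtio block devices on this machine"
```

Until the node can be rebooted, it can be taken out of service with `kubectl cordon <node>` followed by `kubectl drain <node> --ignore-daemonsets`.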
Expected: the driver formats and mounts the volume successfully.
Additional info:
This is the EC2 instance metadata that is available.
https://docs.aws.amazon.com/snowball/latest/developer-guide/edge-compute-instance-metadata.html
There is no block device information available there. Looking at the driver README, it appears that block device metadata may be required, and the Kubernetes metadata source does not provide it either. This may simply be life on Snowball Edge; I'm writing this to find out which tree is the right one to bark up, and to document the behavior for other Snowball Edge users hitting similar issues.
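On standard EC2, block device mappings are exposed through the `block-device-mapping` IMDS category; per the linked Snowball Edge doc, that category is not available there. A quick way to confirm from a node is below. This is a sketch that assumes IMDSv1 is reachable at the standard endpoint; on IMDSv2-only instances you would need to fetch a session token first.

```shell
# Query IMDS for block device mappings. On Snowball Edge this category is
# absent, so the response is empty (2s timeout so the check also fails
# fast when run off-instance).
mapping=$(curl -s -m 2 http://169.254.169.254/latest/meta-data/block-device-mapping/ 2>/dev/null || true)
if [ -n "$mapping" ]; then
  msg="block-device-mapping available: $mapping"
else
  msg="no block-device-mapping metadata from IMDS"
fi
echo "$msg"
```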
Environment
- Kubernetes version (use `kubectl version`): v1.31.3
- Driver version: 2.36.0