Description
/kind bug
What happened?
The `allocatable.count` reported for the `ebs.csi.aws.com` driver in the CSINode object on EKS nodes does not reflect the number of ENIs actually attached to the node. The value is computed only from the ENIs attached at node bootstrap and ignores ENIs that the VPC CNI attaches later. On most Nitro instance types, ENIs and EBS volumes share the same pool of attachment slots, so every extra ENI reduces the number of volumes the node can attach. Even so, `allocatable.count` stays fixed as the VPC CNI attaches more ENIs to accommodate new workloads.
What you expected to happen?
The `allocatable.count` should reflect the number of ENIs currently attached to the node. That includes both the ENIs present at bootstrap and those attached later by the VPC CNI, so that the CSINode object accurately represents how many EBS volumes the node can attach.
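The gap between the current and the expected behaviour can be sketched as a value cached at startup versus one recomputed from the live ENI count on every read. This is a hypothetical Python illustration, not the driver's code: `StaticLimit`, `DynamicLimit` and the `current_enis` callback are invented names, the 28-slot shared limit is an assumption based on what AWS documents for most Nitro instance types, and the ENI counts are illustrative rather than taken from the affected node.

```python
from typing import Callable

# ASSUMPTION: 28 attachment slots shared by ENIs and EBS volumes (most Nitro types).
SHARED_ATTACHMENT_SLOTS = 28


class StaticLimit:
    """Behaviour described in this issue: computed once from the bootstrap ENI count."""

    def __init__(self, enis_at_bootstrap: int, reserved_volumes: int) -> None:
        self._value = max(SHARED_ATTACHMENT_SLOTS - enis_at_bootstrap - reserved_volumes, 0)

    def allocatable(self) -> int:
        return self._value


class DynamicLimit:
    """Expected behaviour: recomputed from the ENIs attached right now."""

    def __init__(self, current_enis: Callable[[], int], reserved_volumes: int) -> None:
        self._current_enis = current_enis  # hypothetical source of the live ENI count
        self._reserved_volumes = reserved_volumes

    def allocatable(self) -> int:
        return max(SHARED_ATTACHMENT_SLOTS - self._current_enis() - self._reserved_volumes, 0)


# Illustrative scenario: 1 ENI at bootstrap, 1 reserved root volume.
attached_enis = [1]
static = StaticLimit(enis_at_bootstrap=attached_enis[0], reserved_volumes=1)
dynamic = DynamicLimit(current_enis=lambda: attached_enis[0], reserved_volumes=1)

attached_enis[0] = 4  # the VPC CNI attaches 3 more ENIs
print(static.allocatable(), dynamic.allocatable())  # prints: 26 23
```

The static object keeps reporting 26 after the extra ENIs appear, while the dynamic one drops to 23.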
```yaml
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  annotations:
    storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd,kubernetes.io/vsphere-volume
  creationTimestamp: "2024-10-21T09:02:47Z"
  name: xxxxx
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: xxxxx
    uid: 2da9c59d-1ac2-42bd-9e06-4ec01127153e
spec:
  drivers:
  - allocatable:
      count: 25
    name: ebs.csi.aws.com
    nodeID: i-xxxx
    topologyKeys:
    - kubernetes.io/os
    - topology.ebs.csi.aws.com/zone
    - topology.kubernetes.io/zone
```
How to reproduce it (as minimally and precisely as possible)?
- Create an EKS cluster with nodes of a Nitro-based instance type (e.g., r6 instances).
- Observe the initial `allocatable.count` reported on the CSINode resource. This value is based on the instance's maximum attachment limit minus the ENIs and EBS volumes attached at bootstrap.
- Deploy workloads that require the VPC CNI to attach additional ENIs to the nodes.
- Observe that `allocatable.count` remains unchanged even though the number of attached ENIs has increased.
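The arithmetic behind these steps can be walked through with a small Python model. It is a simplified sketch, not the driver's implementation: `allocatable_count` is a hypothetical helper, the 28 shared slots are an assumption based on the documented limit for most Nitro instance types, and the starting values (1 ENI, 1 root EBS volume) are illustrative.

```python
# ASSUMPTION: 28 attachment slots shared by ENIs, EBS volumes and
# NVMe instance store volumes (documented limit for most Nitro types).
SHARED_ATTACHMENT_SLOTS = 28


def allocatable_count(attached_enis: int, attached_ebs_volumes: int,
                      nvme_instance_store: int = 0) -> int:
    """Hypothetical helper: slots left for EBS volumes, never below zero."""
    free = (SHARED_ATTACHMENT_SLOTS - attached_enis
            - attached_ebs_volumes - nvme_instance_store)
    return max(free, 0)


# Step 2: value captured at bootstrap (illustrative: 1 ENI, 1 root EBS volume).
reported = allocatable_count(attached_enis=1, attached_ebs_volumes=1)

# Steps 3-4: the VPC CNI attaches ENIs one at a time; the reported value stays put.
for enis in range(1, 5):
    actual = allocatable_count(attached_enis=enis, attached_ebs_volumes=1)
    print(f"ENIs={enis} reported={reported} actual={actual}")
# ENIs=1 reported=26 actual=26
# ENIs=2 reported=26 actual=25
# ENIs=3 reported=26 actual=24
# ENIs=4 reported=26 actual=23
```

Under these assumptions, each ENI added after bootstrap makes the reported value overstate the real capacity by one more volume.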
Anything else we need to know?:
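This issue leads to inaccurate resource reporting and makes workloads difficult to manage. Because the scheduler enforces per-node EBS volume limits using the reported `allocatable.count`, it can place more volumes on a node than the node can actually attach, which results in errors such as the following on workloads: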
```
Warning FailedAttachVolume 80s (x12 over 56m) attachdetach-controller (combined from similar events): AttachVolume.Attach failed for volume "pvc-0c2a501a-bb06-4c9b-95aa-4cda4fb6aac2" : rpc error: code = Internal desc = Could not attach volume "vol-07b6e18e94978a87f" to node "i-0240e6b849f452539": WaitForAttachmentState AttachVolume error, expected device but be attached but was attaching, volumeID="vol-07b6e18e94978a87f", instanceID="i-0240e6b849f452539", Device="/dev/xvdam", err=operation error EC2: AttachVolume, https response error StatusCode: 400, RequestID: 7edb421b-9dc2-4001-af1e-73d628fabfb5, api error VolumeInUse: vol-07b6e18e94978a87f is already attached to an instance
```
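A rough model of why a stale count can surface as attach failures: a limit check against the reported value admits volumes up to that number, but only the real capacity can be used, so the difference is attachments that cannot complete. This is a hypothetical, simplified Python sketch with illustrative numbers; `failed_attachments` is an invented helper, and it does not model the specific `VolumeInUse` error path in the event above.

```python
def failed_attachments(reported_allocatable: int, actual_capacity: int,
                       volumes_requested: int) -> int:
    """Hypothetical helper: volumes admitted by a check against the reported
    count that exceed the node's real attachment capacity."""
    admitted = min(volumes_requested, reported_allocatable)
    return max(admitted - actual_capacity, 0)


# Illustrative: CSINode reports 26, but extra ENIs leave capacity for only 23.
for requested in (20, 23, 25, 30):
    print(requested, failed_attachments(26, 23, requested))
# 20 0
# 23 0
# 25 2
# 30 3
```

Failures only appear once the number of volumes on the node passes the real capacity, which matches the report that errors show up after the VPC CNI has attached additional ENIs.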
Environment
- Kubernetes version (use `kubectl version`):
  - Client Version: v1.31.2
  - Kustomize Version: v5.4.2
  - Server Version: v1.29.10-eks-7f9249a
- Driver version: Amazon EBS CSI Driver version: v1.34.0-eksbuild.1