CSI driver does not take into account Dynamically provisioned VPC CNIs for allocatable count calculation. #2249

Open
@prad9192

/kind bug

What happened?

The allocatableCount (allocatable.count) reported in CSINode on EKS clusters does not reflect the actual number of ENIs attached to the node. The calculation only considers the ENIs attached at node bootstrap and ignores ENIs that the VPC CNI attaches dynamically afterwards. As a result, the allocatableCount stays static and does not update as the VPC CNI attaches more ENIs to accommodate new workloads.

What you expected to happen?

The allocatableCount should dynamically take into account the actual number of ENIs attached to the node, including both static ENIs and those dynamically provisioned by the VPC CNI. This would make allocatable.count in CSINode accurate. The current CSINode object looks like this:

    apiVersion: storage.k8s.io/v1
    kind: CSINode
    metadata:
      annotations:
        storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd,kubernetes.io/vsphere-volume
      creationTimestamp: "2024-10-21T09:02:47Z"
      name: xxxxx
      ownerReferences:
        - apiVersion: v1
          kind: Node
          name: xxxxx
          uid: 2da9c59d-1ac2-42bd-9e06-4ec01127153e
    spec:
      drivers:
        - allocatable:
            count: 25
          name: ebs.csi.aws.com
          nodeID: i-xxxx
          topologyKeys:
            - kubernetes.io/os
            - topology.ebs.csi.aws.com/zone
            - topology.kubernetes.io/zone
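
For anyone comparing this value across nodes, here is a minimal sketch of pulling allocatable.count for the EBS driver out of a CSINode object. The helper name `ebs_allocatable_count` and the trimmed-down sample are illustrative assumptions, not part of the driver or of any Kubernetes client library.

```python
import json

EBS_DRIVER_NAME = "ebs.csi.aws.com"


def ebs_allocatable_count(csinode: dict, driver_name: str = EBS_DRIVER_NAME):
    """Return allocatable.count for the given driver, or None if absent.

    Hypothetical helper for illustration only.
    """
    drivers = csinode.get("spec", {}).get("drivers") or []
    for driver in drivers:
        if driver.get("name") == driver_name:
            return (driver.get("allocatable") or {}).get("count")
    return None


# Trimmed-down stand-in for the CSINode object shown above.
sample = json.loads("""
{
  "kind": "CSINode",
  "spec": {
    "drivers": [
      {"name": "ebs.csi.aws.com", "allocatable": {"count": 25}}
    ]
  }
}
""")

print(ebs_allocatable_count(sample))  # prints 25
```

The same function should work on the output of `kubectl get csinode <node-name> -o json` once it is loaded with `json.load`.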

How to reproduce it (as minimally and precisely as possible)?

  1. Create an EKS cluster with nodes of a given instance type (e.g., r6 instances).
  2. Observe the initial allocatableCount reported on the CSINode resource. This value is based on the instance's maximum attachment limit minus the ENIs and EBS volumes attached at bootstrap.
  3. Deploy workloads that require the VPC CNI to attach additional ENIs to the nodes.
  4. Observe that the allocatableCount remains static even though the actual number of attached ENIs has increased.
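
The drift described in the steps above can be sketched with a simplified model. This is not the driver's actual code: the shared attachment limit of 28, the two ENIs at bootstrap, and the three extra ENIs are assumptions chosen only so that the bootstrap value matches the count of 25 shown earlier, and `allocatable_volumes` is a hypothetical name.

```python
# Assumed shared attachment limit for the instance type (illustrative value;
# the real limit depends on the instance type).
SHARED_ATTACHMENT_LIMIT = 28


def allocatable_volumes(attached_enis: int, root_volumes: int = 1,
                        limit: int = SHARED_ATTACHMENT_LIMIT) -> int:
    """Volume slots left after ENIs and the root volume take their share."""
    return max(limit - attached_enis - root_volumes, 0)


# Value computed once at bootstrap, with 2 ENIs attached (assumption).
reported = allocatable_volumes(attached_enis=2)

# Value that would be correct after the VPC CNI attaches 3 more ENIs.
actual = allocatable_volumes(attached_enis=5)

# Slots the scheduler believes exist but the instance cannot provide.
overcommit = reported - actual
print(reported, actual, overcommit)  # prints: 25 22 3
```

Under these assumptions the scheduler could place up to three more volumes on the node than the instance can attach, which is consistent with attach failures appearing only after the VPC CNI has added ENIs.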

Anything else we need to know?:

This issue can lead to inaccurate resource reporting and difficulties in managing workloads. It results in errors like the following on workloads:

  Warning  FailedAttachVolume  80s (x12 over 56m)  attachdetach-controller  (combined from similar events): AttachVolume.Attach failed for volume "pvc-0c2a501a-bb06-4c9b-95aa-4cda4fb6aac2" : rpc error: code = Internal desc = Could not attach volume "vol-07b6e18e94978a87f" to node "i-0240e6b849f452539": WaitForAttachmentState AttachVolume error, expected device but be attached but was attaching, volumeID="vol-07b6e18e94978a87f", instanceID="i-0240e6b849f452539", Device="/dev/xvdam", err=operation error EC2: AttachVolume, https response error StatusCode: 400, RequestID: 7edb421b-9dc2-4001-af1e-73d628fabfb5, api error VolumeInUse: vol-07b6e18e94978a87f is already attached to an instance

Environment

  • Kubernetes version (use kubectl version):
    • Client Version: v1.31.2
    • Kustomize Version: v5.4.2
    • Server Version: v1.29.10-eks-7f9249a
  • Driver version: Amazon EBS CSI Driver v1.34.0-eksbuild.1
