Occasional retrieving IMDS metadata failed on AL2023 #2262

@brianrowlett

Description

/kind bug

We currently have AL2 nodes and have never had a problem with this.

After switching to AL2023 nodes, the ebs-csi-node pod occasionally fails to retrieve metadata from IMDS. This only appears to happen at node startup; if we restart the ebs-csi-node daemonset, it retrieves metadata from IMDS reliably.

It does appear to fall back successfully to getting metadata from Kubernetes, but we think IMDS should not be failing like this.

What happened?

I1211 20:07:09.634316       1 main.go:157] "Initializing metadata"
E1211 20:07:14.635517       1 metadata.go:51] "Retrieving IMDS metadata failed, falling back to Kubernetes metadata" err="could not get EC2 instance identity metadata: operation error ec2imds: GetInstanceIdentityDocument, request canceled, context deadline exceeded"
I1211 20:07:14.645753       1 metadata.go:55] "Retrieved metadata from Kubernetes"
I1211 20:07:14.646110       1 driver.go:69] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.34.0"
I1211 20:07:16.167040       1 node.go:941] "CSINode Allocatable value is set" nodeName="ip-100-64-153-121.ec2.internal" count=31

What you expected to happen?

I1211 20:24:41.226237       1 main.go:157] "Initializing metadata"
I1211 20:24:42.479940       1 metadata.go:48] "Retrieved metadata from IMDS"
I1211 20:24:42.480783       1 driver.go:69] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.34.0"
I1211 20:24:43.497952       1 node.go:941] "CSINode Allocatable value is set" nodeName="ip-100-64-251-153.ec2.internal" count=31

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?:

Our launch template looks like:

  NodeLaunchTemplate2023:
    Type: AWS::EC2::LaunchTemplate
    Condition: CreateManagedNodegroup2023
    DependsOn:
    - Cluster
    Properties:
      LaunchTemplateData:
        BlockDeviceMappings:
        - DeviceName: /dev/xvda
          Ebs:
            DeleteOnTermination: true
            Encrypted: true
            VolumeSize: !Ref WorkerVolumeSize
            VolumeType: gp3
        MetadataOptions:
          HttpEndpoint: enabled
          HttpPutResponseHopLimit: 2
          HttpTokens: required
          InstanceMetadataTags: disabled
        NetworkInterfaces:
        - DeviceIndex: 0
          Groups:
          - !GetAtt Cluster.ClusterSecurityGroupId

And our managed nodegroup looks like:

  ManagedNodegroup2023a:
    Type: AWS::EKS::Nodegroup
    Condition: CreateManagedNodegroup2023
    DependsOn:
    - Cluster
    - NodeInstanceRole
    - NodeLaunchTemplate2023
    Properties:
      AmiType: AL2023_x86_64_STANDARD
      CapacityType: ON_DEMAND
      ClusterName: !Ref Cluster
      InstanceTypes:
      - !Ref WorkerInstanceType
      LaunchTemplate:
        Id: !Ref NodeLaunchTemplate2023
        Version: !GetAtt NodeLaunchTemplate2023.LatestVersionNumber
      NodeRole: !GetAtt NodeInstanceRole.Arn
      ScalingConfig:
        DesiredSize: !Ref NodegroupSizeDesired
        MaxSize: !Ref NodegroupSizeMaximum
        MinSize: !Ref NodegroupSizeMinimum
      Subnets:
      - Fn::ImportValue:
          !Sub "${VpcName}-private-a"
      UpdateConfig:
        MaxUnavailable: 1

Environment

  • Kubernetes version (use kubectl version): v1.30.6-eks-7f9249a
  • Driver version: v1.34.0

Metadata

Labels

kind/bug: Categorizes issue or PR as related to a bug.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
