aws-eks: spot interrupt handler for ASG capacity is incompatible with latest/supported EKS versions #33108

Open
@shapirov103

Describe the bug

Cluster provisioning fails when the spot interrupt handler is enabled and ASG capacity is used with the latest Kubernetes versions.

I created a simple cluster with EKS Blueprints for CDK using the ASG capacity provider.
The CDK code calls cluster.addAutoScalingGroupCapacity with spotInterruptHandler set to true (the default setting).

Getting the following exception:

Received response status [FAILED] from custom resource. Message returned: Error: b'Release "asgtestchartspotinterrupthandler88cd0a56" does not exist. Installing it now.\nError: unable to build kubernetes objects from release manifest: resource mapping not found for name: "asgtestchartspotinterrupthandler88cd0a56-aws-node-termination-h" namespace: "" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"\nensure CRDs are installed first\n'
Logs: /aws/lambda/asg-test-awscdkawseksKubectlProvid-Handler886CB40B-VjmYzuKObYxM
at invokeUserFunction (/var/task/framework.js:2:6)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async onEvent (/var/task/framework.js:1:369)
at async Runtime.handler (/var/task/cfn-response.js:1:1837) (RequestId: d7f295fe-9046-45b5-bbe3-fcc72f9cc84d)

The same failure occurs with the Kubernetes version set to 1.29 and to 1.30.

I narrowed it down to this code: https://github.com/aws/aws-cdk/blob/main/packages/aws-cdk-lib/aws-eks/lib/cluster.ts#L1163-L1176

Why is the Helm chart version hardcoded?

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

The cluster is provisioned with ASG capacity and the spot interrupt handler is installed.

Current Behavior

CloudFormation provisioning fails with the exception shown under "Describe the bug".

Reproduction Steps

Create a simple cluster with EKS Blueprints for CDK using the ASG capacity provider.
The CDK code calls cluster.addAutoScalingGroupCapacity with spotInterruptHandler set to true (the default setting).

Possible Solution

Maintain a map of node termination handler chart versions that are supported by the latest Kubernetes/EKS versions, or allow customers to pass the chart version themselves (less preferred).
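
The version map could look roughly like the sketch below. It is only an illustration, not the CDK implementation: `ChartVersionRule`, `resolveChartVersion`, and the `"legacy-chart-version"` fallback are hypothetical names and values. The 1.25 threshold is assumed because PodSecurityPolicy was removed in Kubernetes 1.25, and 0.25.1 is the chart version suggested in the workaround in this issue.

```typescript
// Hypothetical rule: from Kubernetes 1.<minMinor> onward, use chartVersion.
interface ChartVersionRule {
  minMinor: number;
  chartVersion: string;
}

// Extract the minor number from a "1.NN" Kubernetes version string.
function parseMinor(k8sVersion: string): number {
  const match = /^1\.(\d+)$/.exec(k8sVersion.trim());
  if (!match) {
    throw new Error(`Unsupported Kubernetes version format: ${k8sVersion}`);
  }
  return Number(match[1]);
}

// Pick the rule with the highest minMinor that does not exceed the cluster's
// minor version; fall back to the caller-supplied version otherwise.
function resolveChartVersion(
  k8sVersion: string,
  rules: ChartVersionRule[],
  fallback: string,
): string {
  const minor = parseMinor(k8sVersion);
  const sorted = [...rules].sort((a, b) => b.minMinor - a.minMinor);
  const rule = sorted.find((r) => minor >= r.minMinor);
  return rule ? rule.chartVersion : fallback;
}

// Assumed table: a single entry taken from the workaround in this issue.
const rules: ChartVersionRule[] = [{ minMinor: 25, chartVersion: "0.25.1" }];

// "legacy-chart-version" is a placeholder for whatever version is hardcoded today.
console.log(resolveChartVersion("1.30", rules, "legacy-chart-version"));
console.log(resolveChartVersion("1.24", rules, "legacy-chart-version"));
```

Keeping the table as data would let new Kubernetes releases be supported by adding an entry rather than changing the lookup logic.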

Additional Information/Context

A potential workaround is to disable the spot interrupt handler and install the node termination handler Helm chart yourself with a compatible chart version, e.g.

version: "0.25.1",
repository: 'oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler',
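
A minimal sketch of that workaround follows, under stated assumptions. The `NthChartOptions` interface is a hypothetical local stand-in for the subset of the aws-eks `HelmChartOptions` fields used here, so the snippet compiles without aws-cdk-lib. The chart name, release name, and `kube-system` namespace are assumptions; only the version and repository come from this issue.

```typescript
// Hypothetical stand-in for the HelmChartOptions fields this sketch needs.
interface NthChartOptions {
  chart: string;
  release: string;
  version: string;
  repository: string;
  namespace: string;
}

// Build the options for installing the node termination handler manually.
// The default chart version is the one suggested in this issue.
function nodeTerminationHandlerChart(version = "0.25.1"): NthChartOptions {
  return {
    chart: "aws-node-termination-handler", // assumed chart name
    release: "aws-node-termination-handler", // assumed release name
    version,
    repository: "oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler",
    namespace: "kube-system", // assumed target namespace
  };
}

console.log(nodeTerminationHandlerChart());

// Intended use inside a CDK stack (not executed here; the construct IDs,
// instance type, and spot price are illustrative):
//
//   cluster.addAutoScalingGroupCapacity("SpotNodes", {
//     instanceType: new ec2.InstanceType("t3.large"),
//     spotPrice: "0.10",
//     spotInterruptHandler: false, // skip the built-in chart install
//   });
//   cluster.addHelmChart("NodeTerminationHandler", nodeTerminationHandlerChart());
```

Any Helm values the built-in handler sets (for example node selectors for spot nodes) are not reproduced here and may need to be supplied separately.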

CDK CLI Version

2.173.4

Framework Version

No response

Node.js Version

20.10

OS

MacOS

Language

TypeScript

Language Version

No response

Other information

No response

Labels

@aws-cdk/aws-eks, bug, effort/medium, p1