aws-eks: Updating KubernetesManifest deletes it instead #33406

Open
@esun74

Describe the bug

Updating a KubernetesManifest resource through CDK can delete its Kubernetes resources from the cluster instead of updating them.

During a resource replacement, if overwrite: true is set and the previous manifest overlaps with the new manifest, the overlapping resources are lost. When the manifest is unchanged, the entire resource is deleted. The issue cannot be mitigated by a rollback or a code revert, and it repeats on any subsequent update.

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

Replacing a KubernetesManifest should at most delete and re-create the underlying EKS manifest resources. A minimal update to a KubernetesManifest should not cause a loss of cluster functionality due to missing Kubernetes resources.

Current Behavior

Updates to a KubernetesManifest that are applied as a replacement wipe the manifest's resources from the cluster. Rollbacks and reverts do not bring the cluster back to a healthy state.

Given that Manifest A (previous) and Manifest B (new) are based on the same YAML, replacing the KubernetesManifest resource proceeds as follows:

  1. CloudFormation first applies Manifest B, which overwrites Manifest A in EKS. Because the contents are identical, nothing changes in the cluster.
  2. Manifest A and Manifest B now both exist in the CloudFormation stack, and the manifest contents are correctly configured in EKS.
  3. CloudFormation deletes Manifest A, which deletes the manifest's resources from EKS.
  4. CloudFormation reports the update to Manifest B as complete, but none of the resources remain in EKS.
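
The sequence above can be modeled with a small in-memory simulation. This is only a sketch under stated assumptions: it assumes cluster objects are identified by kind, namespace, and name, and that deleting a manifest removes every object it lists regardless of which CloudFormation resource applied it last. `FakeCluster`, `applyManifest`, and `deleteManifest` are hypothetical names for illustration, not CDK or kubectl APIs.

```typescript
// Hypothetical model of the replacement flow; not the actual CDK handler.
interface K8sObject {
  kind: string;
  metadata: { name: string; namespace?: string };
}

// Cluster state keyed by kind/namespace/name (assumed object identity).
type FakeCluster = Map<string, K8sObject>;

function objectKey(obj: K8sObject): string {
  return `${obj.kind}/${obj.metadata.namespace ?? 'default'}/${obj.metadata.name}`;
}

// Models an apply with overwrite: existing objects are replaced in place.
function applyManifest(cluster: FakeCluster, manifest: K8sObject[]): void {
  for (const obj of manifest) cluster.set(objectKey(obj), obj);
}

// Models a delete: every object named in the manifest is removed.
function deleteManifest(cluster: FakeCluster, manifest: K8sObject[]): void {
  for (const obj of manifest) cluster.delete(objectKey(obj));
}

const sleeperPod: K8sObject = { kind: 'Pod', metadata: { name: 'test-sleeper' } };
const manifestA: K8sObject[] = [sleeperPod]; // logical ID "Sleeper"
const manifestB: K8sObject[] = [sleeperPod]; // logical ID "Sleeper1", same content

const cluster: FakeCluster = new Map();
applyManifest(cluster, manifestA);  // initial deployment
applyManifest(cluster, manifestB);  // step 1: the replacement is created first
const podsAfterCreate = cluster.size;
deleteManifest(cluster, manifestA); // step 3: the old resource is cleaned up
const podsAfterCleanup = cluster.size;

console.log({ podsAfterCreate, podsAfterCleanup });
```

In this model the pod count is 1 after the replacement is created and 0 after cleanup, which matches the kubectl output in the reproduction steps.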

Reproduction Steps

Setup

new eks.KubernetesManifest(cluster, 'Sleeper', {
  manifest: [
    {
      apiVersion: 'v1',
      kind: 'Pod',
      metadata: {
        name: 'test-sleeper',
      },
      spec: {
        containers: [
          {
            name: 'sleeper',
            image: 'alpine:latest',
            imagePullPolicy: 'Always',
            command: ['/bin/sleep', 'infinity'],
          },
        ],
      },
    },
  ],
  cluster,
  overwrite: true,
});

> kubectl get pods

NAME           READY   STATUS    RESTARTS   AGE
test-sleeper   1/1     Running   0          40s

Minimal Change

- new eks.KubernetesManifest(cluster, 'Sleeper', {
+ new eks.KubernetesManifest(cluster, 'Sleeper1', {

CloudFormation Events

Timestamp                     Logical ID                Status
2025-02-11 13:49:18 UTC-0800  ClusterSleeper0E1728F7    DELETE_COMPLETE
2025-02-11 13:48:38 UTC-0800  ClusterSleeper0E1728F7    DELETE_IN_PROGRESS
2025-02-11 13:48:37 UTC-0800  <stack>                   UPDATE_COMPLETE_CLEANUP_IN_PROGRESS
2025-02-11 13:48:17 UTC-0800  ClusterSleeper1A9127B4A   CREATE_COMPLETE
2025-02-11 13:48:17 UTC-0800  ClusterSleeper1A9127B4A   CREATE_IN_PROGRESS (Resource creation Initiated)
2025-02-11 13:48:05 UTC-0800  ClusterSleeper1A9127B4A   CREATE_IN_PROGRESS

> kubectl get pods

No resources found in default namespace.

Reverts Are Ineffective

- new eks.KubernetesManifest(cluster, 'Sleeper1', {
+ new eks.KubernetesManifest(cluster, 'Sleeper', {

The events are similar to those above: the sleeper pod is created, then deleted again.

> kubectl get pods

No resources found in default namespace.

Possible Solution

Immediate Mitigating Options:

  • Trigger a minimal replacement and set the manifest's deletion policy to RETAIN
    (manifest.node.defaultChild as CfnResource).applyRemovalPolicy(RemovalPolicy.RETAIN);
  • Remove the manifest from CDK entirely, deploy, then add it back

Note: Using RemovalPolicy.RETAIN has the downside that dangling resources must be cleaned up manually.

Additional Information/Context

Additional Risks:

If a manifest is updated and the original and new manifests overlap, CloudFormation might silently delete parts of the manifest. For example, if manifest version 1.0 is deployed and then replaced with manifest version 2.0, the intersecting resources (1.0 ∩ 2.0) will be deleted when 1.0 is cleaned up.
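
As a rough illustration of the intersection described above, the following sketch lists the objects that two manifest versions have in common, that is, the objects that would be removed when the old version is cleaned up. It assumes an object is identified by apiVersion, kind, namespace, and name. `ManifestObject`, `identityOf`, and `overlappingObjects` are hypothetical helpers, and the sample manifests are invented for the example.

```typescript
// Hypothetical helper; not part of the CDK.
interface ManifestObject {
  apiVersion: string;
  kind: string;
  metadata: { name: string; namespace?: string };
}

// Assumed identity of an object: apiVersion/kind/namespace/name.
function identityOf(obj: ManifestObject): string {
  const namespace = obj.metadata.namespace ?? 'default';
  return `${obj.apiVersion}/${obj.kind}/${namespace}/${obj.metadata.name}`;
}

// Returns the identities present in both the previous and the next manifest.
function overlappingObjects(previous: ManifestObject[], next: ManifestObject[]): string[] {
  const nextIds = new Set(next.map(identityOf));
  return previous.map(identityOf).filter((id) => nextIds.has(id));
}

// Invented sample data standing in for manifest versions 1.0 and 2.0.
const version1: ManifestObject[] = [
  { apiVersion: 'v1', kind: 'ConfigMap', metadata: { name: 'app-config' } },
  { apiVersion: 'apps/v1', kind: 'Deployment', metadata: { name: 'app' } },
];
const version2: ManifestObject[] = [
  { apiVersion: 'apps/v1', kind: 'Deployment', metadata: { name: 'app' } },
  { apiVersion: 'v1', kind: 'Service', metadata: { name: 'app' } },
];

const atRisk = overlappingObjects(version1, version2);
console.log(atRisk);
```

With this sample data only the Deployment appears in both versions, so it is the one object at risk. A check like this could run before deployment to flag replacements that would lose resources.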

CDK CLI Version

2.160.0

Framework Version

No response

Node.js Version

18

OS

Amazon Linux 2 x86_64

Language

TypeScript

Language Version

5.0.4

Other information

Sev2: P199049085
Tracking: P200043360
Case ID 173931643600782

Metadata

Assignees

No one assigned

    Labels

    @aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service), bug (This issue is a bug.), effort/medium (Medium work item – several days of effort), p2
