Description
Describe the bug
Updating a KubernetesManifest resource through CDK can cause the Kubernetes resources it manages to be deleted.
During a resource replacement, if overwrite: true is set and the previous manifest overlaps with the new manifest, the overlapping resources are lost. When the manifest is unchanged, every resource in it is deleted. The issue cannot be mitigated by a rollback or code revert and will repeat on any subsequent update.
Regression Issue
- Select this option if this issue appears to be a regression.
Last Known Working CDK Version
No response
Expected Behavior
Replacing a KubernetesManifest should at most delete and re-create the underlying EKS manifest resources. A minimal update to a KubernetesManifest should not result in a loss of cluster functionality resulting from missing Kubernetes resources.
Current Behavior
Updates to a KubernetesManifest which are applied as a replacement cause cluster resources to be wiped. Rollbacks and reverts do not bring the cluster back to a healthy state.
Given that Manifest A (previous) and Manifest B (new) are based on the same YAML, replacing the KubernetesManifest resource looks like this:
- CloudFormation first creates Manifest B, which overwrites Manifest A in EKS; because the contents are identical, nothing changes in the cluster
- Manifest A and B both exist in the CloudFormation stack, and the manifest contents are correctly configured in EKS
- CloudFormation deletes Manifest A, which deletes the manifest resources from EKS
- CloudFormation has now "updated" to Manifest B, but nothing is left in EKS
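The sequence above can be sketched with a small in-memory model. This is only an illustration of the ordering problem, not the actual CDK custom resource handler: `FakeCluster`, `keyOf`, and the `Kind/namespace/name` key format are hypothetical names introduced here, and `apply` stands in for what creating a manifest with overwrite: true does.

```typescript
interface K8sObject {
  kind: string;
  metadata: { name: string; namespace?: string };
}

// Identify an object the way the cluster does: by kind, namespace and name,
// not by which CloudFormation resource created it.
function keyOf(obj: K8sObject): string {
  return `${obj.kind}/${obj.metadata.namespace || 'default'}/${obj.metadata.name}`;
}

// Hypothetical stand-in for the cluster; not a real EKS or kubectl API.
class FakeCluster {
  private objects: Record<string, K8sObject> = {};

  // Models creating a manifest with overwrite: true (existing objects are replaced).
  apply(manifest: K8sObject[]): void {
    for (const obj of manifest) {
      this.objects[keyOf(obj)] = obj;
    }
  }

  // Models deleting a manifest: every object it names is removed,
  // even if another manifest applied the same object afterwards.
  remove(manifest: K8sObject[]): void {
    for (const obj of manifest) {
      delete this.objects[keyOf(obj)];
    }
  }

  list(): string[] {
    return Object.keys(this.objects).sort();
  }
}

const sleeperPod: K8sObject = { kind: 'Pod', metadata: { name: 'test-sleeper' } };
const manifestA: K8sObject[] = [sleeperPod]; // previous logical ID
const manifestB: K8sObject[] = [sleeperPod]; // new logical ID, same contents

const cluster = new FakeCluster();
cluster.apply(manifestA);            // initial deployment
cluster.apply(manifestB);            // replacement: the new resource is created first
const afterCreate = cluster.list();  // the pod is still present
cluster.remove(manifestA);           // cleanup phase: the old resource is deleted
const afterCleanup = cluster.list(); // the pod is gone although Manifest B "exists"
```

In this model the deletion in the cleanup phase has no way to know that Manifest B now owns the same object, which matches the empty kubectl output shown in the reproduction steps.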
Reproduction Steps
Setup
new eks.KubernetesManifest(cluster, 'Sleeper', {
  manifest: [
    {
      apiVersion: 'v1',
      kind: 'Pod',
      metadata: {
        name: 'test-sleeper',
      },
      spec: {
        containers: [
          {
            name: 'sleeper',
            image: 'alpine:latest',
            imagePullPolicy: 'Always',
            command: ['/bin/sleep', 'infinity'],
          },
        ],
      },
    },
  ],
  cluster,
  overwrite: true,
});
> kubectl get pods
NAME READY STATUS RESTARTS AGE
test-sleeper 1/1 Running 0 40s
Minimal Change
- new eks.KubernetesManifest(cluster, 'Sleeper', {
+ new eks.KubernetesManifest(cluster, 'Sleeper1', {
CloudFormation Events
Timestamp | Logical ID | Status |
---|---|---|
2025-02-11 13:49:18 UTC-0800 | ClusterSleeper0E1728F7 | DELETE_COMPLETE |
2025-02-11 13:48:38 UTC-0800 | ClusterSleeper0E1728F7 | DELETE_IN_PROGRESS |
2025-02-11 13:48:37 UTC-0800 | <stack> | UPDATE_COMPLETE_CLEANUP_IN_PROGRESS |
2025-02-11 13:48:17 UTC-0800 | ClusterSleeper1A9127B4A | CREATE_COMPLETE |
2025-02-11 13:48:17 UTC-0800 | ClusterSleeper1A9127B4A | CREATE_IN_PROGRESS (Resource creation Initiated) |
2025-02-11 13:48:05 UTC-0800 | ClusterSleeper1A9127B4A | CREATE_IN_PROGRESS |
> kubectl get pods
No resources found in default namespace.
Reverts Are Ineffective
- new eks.KubernetesManifest(cluster, 'Sleeper1', {
+ new eks.KubernetesManifest(cluster, 'Sleeper', {
The events are similar to those above: the sleeper pod is created and then deleted again.
> kubectl get pods
No resources found in default namespace.
Possible Solution
Immediate Mitigating Options:
- Trigger a minimal replacement and set the manifest's deletion policy to RETAIN
(manifest.node.defaultChild as CfnResource).applyRemovalPolicy(RemovalPolicy.RETAIN);
- Remove the manifest from CDK entirely, deploy, then add it back
Note: Using RemovalPolicy.RETAIN comes with the natural downside of having to clean up dangling resources manually.
Additional Information/Context
Additional Risks:
If a manifest update is applied as a replacement and there is any overlap between the original and subsequent manifests, CloudFormation might silently delete parts of the new manifest. For example, if manifest version 1.0 is deployed and replaced with manifest version 2.0, the intersecting resources (1.0 ∩ 2.0) will be deleted when cleaning up 1.0.
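To make the 1.0 ∩ 2.0 case concrete, the following sketch predicts which objects of the new manifest would be lost in a replacement, assuming the cleanup removes every object named in the old manifest. The function name `predictReplacement`, the `Kind/name` identifiers, and the example resource lists are hypothetical and chosen only for illustration.

```typescript
// Objects present in both lists.
function intersect(a: string[], b: string[]): string[] {
  return a.filter((key) => b.indexOf(key) !== -1);
}

// Objects present in `a` but not in `b`.
function difference(a: string[], b: string[]): string[] {
  return a.filter((key) => b.indexOf(key) === -1);
}

interface ReplacementOutcome {
  lostFromNew: string[];      // applied by the new manifest, then deleted by cleanup of the old one
  survivingFromNew: string[]; // applied by the new manifest and never named by the old one
}

function predictReplacement(oldKeys: string[], newKeys: string[]): ReplacementOutcome {
  return {
    lostFromNew: intersect(newKeys, oldKeys),
    survivingFromNew: difference(newKeys, oldKeys),
  };
}

// Hypothetical manifest contents for versions 1.0 and 2.0.
const v1Keys = ['Deployment/app', 'Service/app', 'ConfigMap/legacy'];
const v2Keys = ['Deployment/app', 'Service/app', 'Ingress/app'];

const outcome = predictReplacement(v1Keys, v2Keys);
// outcome.lostFromNew      -> ['Deployment/app', 'Service/app']
// outcome.survivingFromNew -> ['Ingress/app']
```

Under this assumption only the objects that are new in 2.0 remain in the cluster, so the stack reports a successful update while the Deployment and Service are missing.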
CDK CLI Version
2.160.0
Framework Version
No response
Node.js Version
18
OS
Amazon Linux 2 x86_64
Language
TypeScript
Language Version
5.0.4
Other information
Sev2: P199049085
Tracking: P200043360
Case ID 173931643600782