/kind bug
What steps did you take and what happened:
- Provisioned a CAPA-managed EKS cluster with a dynamically created VPC.
- CAPA created the VPC, and AWS automatically created its default Network ACL (NACL).
- After cluster provisioning, a manual change was made in the AWS console to the default NACL:
  - Modified inbound rules to allow TCP-only traffic.
  - This effectively blocked UDP traffic.
- As a result:
  - DNS resolution failed (UDP/53 blocked).
  - Cluster components experienced STS/proxy timeouts.
  - General cluster networking was disrupted.
- CAPA did not detect or reconcile the NACL changes.
- The cluster remained in a broken networking state until manual correction.
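
To make the failure mode concrete, here is a minimal sketch of how NACL rules are evaluated: entries are checked in ascending rule-number order, the first match wins, and an implicit final rule denies everything else. The `Entry` type and `evaluate` function are hypothetical names used only for illustration. CIDR matching is deliberately left out, and the rule sets are assumptions modeled on the steps above, not a dump of the actual NACL.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Protocol identifiers as AWS reports them: "-1" means all protocols.
ALL, TCP, UDP = "-1", "6", "17"


@dataclass(frozen=True)
class Entry:
    """Simplified NACL entry (hypothetical type; CIDR matching omitted)."""
    rule_number: int
    protocol: str
    action: str                              # "allow" or "deny"
    ports: Optional[Tuple[int, int]] = None  # inclusive range; None = all ports


def evaluate(entries, protocol, port):
    """Return the action for a packet: the lowest-numbered matching rule wins."""
    for e in sorted(entries, key=lambda e: e.rule_number):
        if e.protocol not in (ALL, protocol):
            continue  # rule is for a different protocol
        if e.ports is not None and not (e.ports[0] <= port <= e.ports[1]):
            continue  # port is outside the rule's range
        return e.action
    return "deny"  # the implicit final "*" rule


# Assumed inbound rules before the manual change (default NACL allows all).
default_inbound = [Entry(100, ALL, "allow")]
# Assumed inbound rules after the manual change (TCP only).
tcp_only_inbound = [Entry(100, TCP, "allow", (0, 65535))]

dns_before = evaluate(default_inbound, UDP, 53)
dns_after = evaluate(tcp_only_inbound, UDP, 53)
https_after = evaluate(tcp_only_inbound, TCP, 443)
print(dns_before, dns_after, https_after)
```

Under these assumptions, UDP/53 goes from allowed to denied while TCP/443 stays allowed, which matches the observed symptoms: DNS breaks first and other components fail as a consequence.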
Current behavior:
- CAPA does not manage or reconcile default NACL rules.
- Manual console modifications introduced infrastructure drift.
- CAPA did not revert the changes (appears to be expected behavior today).
What did you expect to happen:
It is unclear whether this falls within CAPA’s intended scope, so this issue seeks clarification on the expected behavior:
Either:
- CAPA should reconcile default NACL rules for dynamically created VPCs to prevent destructive drift,
OR
- NACL configuration is considered out-of-scope (too low-level / primitive networking), and governance/IAM controls should prevent such manual changes.
The main question is whether this is:
- A CAPA reconciliation gap, or
- An infrastructure governance issue (IAM restrictions / external IaC enforcement).
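
If this is treated as a reconciliation gap, the detection half could look roughly like the sketch below: compare a desired rule set against the observed one and report the differences. This is not CAPA code or an existing CAPA API. The function name `nacl_drift`, the `(egress, rule_number)` key, and the `(protocol, action)` value shape are all assumptions chosen for illustration.

```python
def nacl_drift(desired, observed):
    """Compare two rule sets keyed by (egress, rule_number).

    Hypothetical helper: returns (missing, unexpected, changed), where
    `changed` maps a key to its (desired, observed) pair of values.
    """
    missing = {k: v for k, v in desired.items() if k not in observed}
    unexpected = {k: v for k, v in observed.items() if k not in desired}
    changed = {
        k: (desired[k], observed[k])
        for k in desired.keys() & observed.keys()
        if desired[k] != observed[k]
    }
    return missing, unexpected, changed


# Assumed desired state: allow all protocols ("-1") in both directions.
desired = {
    (False, 100): ("-1", "allow"),  # inbound
    (True, 100): ("-1", "allow"),   # outbound
}
# Assumed observed state after the manual console change: inbound is TCP ("6") only.
observed = {
    (False, 100): ("6", "allow"),
    (True, 100): ("-1", "allow"),
}

missing, unexpected, changed = nacl_drift(desired, observed)
print(changed)
```

Here the only drift reported is the inbound rule 100, whose protocol changed from all to TCP. Whether a controller should then revert the change or only surface it as a condition or event is the design question this issue raises.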
Anything else you would like to add:
Context:
- The VPC was created dynamically by CAPA.
- The default NACL was automatically created by AWS.
- CAPA currently does not appear to manage NACL resources.
- Manual modification of the default NACL caused cluster-wide networking failure (DNS outage → cascading failures).
Requesting guidance from maintainers on intended design boundaries:
- Should CAPA reconcile networking primitives like NACLs for VPCs it creates?
- Or are NACLs intentionally considered outside CAPA’s reconciliation scope?
Understanding this boundary will help determine whether to:
- Propose a feature enhancement, or
- Treat this strictly as governance/IAM enforcement responsibility.
Environment:
- Cluster-api-provider-aws version:
- Kubernetes version: (use `kubectl version`):
- OS (e.g. from `/etc/os-release`):