CAPA does not reconcile default NACL drift leading to cluster networking failure


/kind bug

**What steps did you take and what happened:**

1. Provisioned a CAPA-managed EKS cluster with a dynamically created VPC.
2. CAPA created the VPC, and AWS automatically created the VPC's default Network ACL (NACL).
3. After cluster provisioning, a manual change was made in the AWS console to the default NACL:

   * Modified inbound rules to allow **TCP-only** traffic.
   * This effectively blocked UDP traffic.
4. As a result:

   * DNS resolution failed (UDP/53 blocked).
   * Cluster components experienced STS/proxy timeouts.
   * General cluster networking was disrupted.
5. CAPA did not detect or reconcile the NACL changes.
6. The cluster remained in a broken networking state until manual correction.
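
To illustrate why a TCP-only inbound rule breaks DNS, here is a minimal sketch of NACL evaluation in Go. It relies on AWS evaluating NACL entries in ascending rule-number order, with the first match deciding and the catch-all `*` rule denying everything else. The `Entry` type and `Allowed` function are hypothetical names for this illustration, not CAPA or AWS SDK APIs. Ports and CIDR ranges are deliberately left out, so this is a simplified model rather than a full evaluator.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a simplified, hypothetical model of one inbound NACL rule.
// Protocol uses the IANA numbers EC2 uses: "-1" = all, "6" = TCP, "17" = UDP.
type Entry struct {
	RuleNumber int
	Protocol   string
	Allow      bool
}

// Allowed reports whether inbound traffic of the given protocol is permitted.
// Entries are checked in ascending rule-number order and the first match wins.
// If nothing matches, the traffic is denied (the catch-all "*" rule).
func Allowed(entries []Entry, protocol string) bool {
	sorted := append([]Entry(nil), entries...) // copy so the caller's slice is not reordered
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].RuleNumber < sorted[j].RuleNumber
	})
	for _, e := range sorted {
		if e.Protocol == "-1" || e.Protocol == protocol {
			return e.Allow
		}
	}
	return false
}

func main() {
	// Default NACL as created by AWS: rule 100 allows all protocols.
	defaultACL := []Entry{{RuleNumber: 100, Protocol: "-1", Allow: true}}
	// Assumed shape of the manual change: rule 100 now allows TCP only.
	tcpOnlyACL := []Entry{{RuleNumber: 100, Protocol: "6", Allow: true}}

	fmt.Println("default, UDP allowed:", Allowed(defaultACL, "17"))
	fmt.Println("tcp-only, UDP allowed:", Allowed(tcpOnlyACL, "17"))
	fmt.Println("tcp-only, TCP allowed:", Allowed(tcpOnlyACL, "6"))
}
```

With the TCP-only entry, UDP traffic matches no allow rule and falls through to the implicit deny, which is consistent with the DNS failure on UDP/53 described above.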

Current behavior:

* CAPA does not manage or reconcile default NACL rules.
* Manual console modifications introduced infrastructure drift.
* CAPA did not revert the changes (appears to be expected behavior today).

---

**What did you expect to happen:**

It is unclear whether this falls within CAPA’s intended scope, so I am seeking clarification on the expected behavior.

Either:

* CAPA should reconcile default NACL rules for dynamically created VPCs to prevent destructive drift,
  OR
* NACL configuration is considered out-of-scope (too low-level / primitive networking), and governance/IAM controls should prevent such manual changes.

The main question is whether this is:

* A CAPA reconciliation gap, or
* An infrastructure governance issue (IAM restrictions / external IaC enforcement).
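
If this turns out to be a reconciliation gap, the drift check might look roughly like the Go sketch below. It is only an illustration under stated assumptions: `Rule`, `DefaultRules`, and `Drift` are hypothetical names, not existing CAPA code, and the desired state is assumed to be the AWS default for an IPv4-only VPC (rule 100 allowing all traffic in both directions). The catch-all `*` deny rule is left out of the model. A real implementation would read the observed entries from the EC2 API instead of taking them as a slice.

```go
package main

import "fmt"

// Rule is a simplified, hypothetical model of one NACL entry.
type Rule struct {
	Number   int
	Egress   bool
	Protocol string // "-1" = all, "6" = TCP, "17" = UDP
	CIDR     string
	Allow    bool
}

// ruleKey identifies an entry by rule number and direction.
type ruleKey struct {
	Number int
	Egress bool
}

// DefaultRules returns the assumed desired state: allow all traffic
// inbound and outbound on rule 100 (IPv4 only).
func DefaultRules() []Rule {
	return []Rule{
		{Number: 100, Egress: false, Protocol: "-1", CIDR: "0.0.0.0/0", Allow: true},
		{Number: 100, Egress: true, Protocol: "-1", CIDR: "0.0.0.0/0", Allow: true},
	}
}

// Drift compares observed entries against desired ones and returns a
// human-readable description of each difference (missing, changed, or extra).
func Drift(desired, observed []Rule) []string {
	got := make(map[ruleKey]Rule, len(observed))
	for _, r := range observed {
		got[ruleKey{r.Number, r.Egress}] = r
	}
	want := make(map[ruleKey]Rule, len(desired))
	var diffs []string
	for _, d := range desired {
		k := ruleKey{d.Number, d.Egress}
		want[k] = d
		o, ok := got[k]
		switch {
		case !ok:
			diffs = append(diffs, fmt.Sprintf("missing rule %d (egress=%t)", d.Number, d.Egress))
		case o != d:
			diffs = append(diffs, fmt.Sprintf("rule %d (egress=%t) changed: want %+v, got %+v", d.Number, d.Egress, d, o))
		}
	}
	for _, o := range observed {
		if _, ok := want[ruleKey{o.Number, o.Egress}]; !ok {
			diffs = append(diffs, fmt.Sprintf("unexpected rule %d (egress=%t)", o.Number, o.Egress))
		}
	}
	return diffs
}

func main() {
	// Observed state after the manual change: inbound rule 100 is TCP-only.
	observed := []Rule{
		{Number: 100, Egress: false, Protocol: "6", CIDR: "0.0.0.0/0", Allow: true},
		{Number: 100, Egress: true, Protocol: "-1", CIDR: "0.0.0.0/0", Allow: true},
	}
	for _, d := range Drift(DefaultRules(), observed) {
		fmt.Println(d)
	}
}
```

Whether a controller should then revert the drift or only surface it (for example as a condition or event) is exactly the design boundary this issue asks about.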

---

**Anything else you would like to add:**

Context:

* The VPC was created dynamically by CAPA.
* The default NACL was automatically created by AWS.
* CAPA currently does not appear to manage NACL resources.
* Manual modification of the default NACL caused cluster-wide networking failure (DNS outage → cascading failures).

Requesting guidance from maintainers on intended design boundaries:

* Should CAPA reconcile networking primitives like NACLs for VPCs it creates?
* Or are NACLs intentionally considered outside CAPA’s reconciliation scope?

Understanding this boundary will help determine whether to:

* Propose a feature enhancement, or
* Treat this strictly as governance/IAM enforcement responsibility.

---

**Environment:**

* Cluster-api-provider-aws version:
* Kubernetes version: (use `kubectl version`):
* OS (e.g. from `/etc/os-release`):

