
CAPA does not reconcile default NACL drift leading to cluster networking failure #5872

@pavansokkenagaraj

Description


/kind bug

What steps did you take and what happened:

  1. Provisioned a CAPA-managed EKS cluster with a dynamically created VPC.

  2. CAPA created the VPC, and AWS automatically created the VPC's default Network ACL (NACL).

  3. After cluster provisioning, a manual change was made in the AWS console to the default NACL:

    • The inbound rules were modified to allow only TCP traffic.
    • This effectively blocked all inbound UDP traffic.
  4. As a result:

    • DNS resolution failed (UDP/53 blocked).
    • Cluster components experienced STS/proxy timeouts.
    • General cluster networking was disrupted.
  5. CAPA did not detect or reconcile the NACL changes.

  6. The cluster's networking remained broken until the NACL was corrected manually.
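The failure in steps 3 and 4 follows from how a NACL evaluates its entries: they are checked in ascending rule-number order, the first match decides, and anything unmatched falls through to the final `*` deny rule. The sketch below models only that ordering logic, assuming a simplified entry type with CIDR matching omitted; `NaclEntry`, `evaluate`, and the rule sets are hypothetical names for illustration, not AWS SDK or CAPA types.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Protocol numbers as used in NACL entries; "-1" means all protocols.
ALL, TCP, UDP = "-1", "6", "17"


@dataclass(frozen=True)
class NaclEntry:
    """Hypothetical, simplified inbound NACL entry (CIDR matching omitted)."""
    rule_number: int
    protocol: str                                 # "-1", "6" (TCP) or "17" (UDP)
    action: str                                   # "allow" or "deny"
    port_range: Optional[Tuple[int, int]] = None  # None means all ports


def _matches(entry: NaclEntry, protocol: str, port: int) -> bool:
    """True if the entry covers this protocol and port."""
    if entry.protocol not in (ALL, protocol):
        return False
    if entry.port_range is None:
        return True
    low, high = entry.port_range
    return low <= port <= high


def evaluate(entries: Sequence[NaclEntry], protocol: str, port: int) -> str:
    """Lowest-numbered matching entry wins; unmatched traffic hits the "*" deny."""
    for entry in sorted(entries, key=lambda e: e.rule_number):
        if _matches(entry, protocol, port):
            return entry.action
    return "deny"  # the implicit catch-all "*" rule every NACL ends with


# Default-style inbound rule: rule 100 allows all traffic.
default_rules = [NaclEntry(100, ALL, "allow")]
# The drifted configuration from step 3: inbound TCP only.
tcp_only_rules = [NaclEntry(100, TCP, "allow", (0, 65535))]

print(evaluate(default_rules, UDP, 53))   # allow
print(evaluate(tcp_only_rules, UDP, 53))  # deny
print(evaluate(tcp_only_rules, TCP, 53))  # allow
```

Note that NACLs are stateless, so replies are evaluated against the rules as independent packets; a TCP-only inbound rule therefore also drops inbound UDP responses.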

Current behavior:

  • CAPA does not manage or reconcile default NACL rules.
  • Manual console modifications introduced infrastructure drift.
  • CAPA did not revert the changes (this appears to be the expected behavior today).

What did you expect to happen:

It is unclear whether this falls within CAPA's intended scope, so I am seeking clarification on the expected behavior.

Either:

  • CAPA should reconcile default NACL rules for dynamically created VPCs to prevent destructive drift, or
  • NACL configuration is considered out of scope (too low-level a networking primitive), and governance/IAM controls should prevent such manual changes.
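If the first option were adopted, drift detection could amount to diffing the default NACL's entries against an expected baseline. A minimal sketch, assuming the baseline is the stock allow-all rule 100 in each direction that AWS gives the default NACL of an IPv4-only VPC; `NaclEntry`, `DESIRED`, and `diff_entries` are hypothetical names, not CAPA or AWS SDK APIs. The `*` entries cannot be modified, and DescribeNetworkAcls reports them as rule number 32767, so they are excluded from the comparison.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

DEFAULT_DENY_RULE = 32767  # rule number DescribeNetworkAcls reports for "*" entries


@dataclass(frozen=True)
class NaclEntry:
    """Hypothetical, simplified NACL entry (not the AWS SDK type)."""
    rule_number: int
    egress: bool
    protocol: str                                 # "-1" = all, "6" = TCP, "17" = UDP
    action: str                                   # "allow" or "deny"
    cidr: str = "0.0.0.0/0"
    port_range: Optional[Tuple[int, int]] = None  # None = all ports


# Assumed baseline for an IPv4-only VPC: allow all traffic in and out.
DESIRED = [
    NaclEntry(100, egress=False, protocol="-1", action="allow"),
    NaclEntry(100, egress=True, protocol="-1", action="allow"),
]


def _key(entry: NaclEntry) -> Tuple[bool, int]:
    return (entry.egress, entry.rule_number)


def diff_entries(desired: Iterable[NaclEntry], actual: Iterable[NaclEntry]
                 ) -> Tuple[List[NaclEntry], List[NaclEntry]]:
    """Return (missing, unexpected) entries, ignoring the immutable "*" rules."""
    want = {e for e in desired if e.rule_number != DEFAULT_DENY_RULE}
    have = {e for e in actual if e.rule_number != DEFAULT_DENY_RULE}
    return sorted(want - have, key=_key), sorted(have - want, key=_key)


# Observed state after the console change: inbound rule 100 now allows TCP only.
observed = [
    NaclEntry(100, egress=False, protocol="6", action="allow", port_range=(0, 65535)),
    NaclEntry(100, egress=True, protocol="-1", action="allow"),
    NaclEntry(DEFAULT_DENY_RULE, egress=False, protocol="-1", action="deny"),
    NaclEntry(DEFAULT_DENY_RULE, egress=True, protocol="-1", action="deny"),
]

missing, unexpected = diff_entries(DESIRED, observed)
for entry in unexpected:
    print("drift: unexpected", entry)
for entry in missing:
    print("drift: missing", entry)
```

A real reconciler would then map each difference to an EC2 call such as `ReplaceNetworkAclEntry`, `CreateNetworkAclEntry`, or `DeleteNetworkAclEntry`; whether CAPA should own that baseline is exactly the scope question raised here.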

The main question is whether this is:

  • A CAPA reconciliation gap, or
  • An infrastructure governance issue (IAM restrictions / external IaC enforcement).
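For the governance option, one common pattern is a deny statement, applied for example as an AWS Organizations service control policy, that blocks NACL rule changes for every principal except an exempted automation role. The sketch below builds such a policy document; the role ARN and `Sid` are placeholders, and exempting the CAPA controller role is an assumption about how the account is set up.

```python
import json

# Placeholder ARN of the role the CAPA controller (or other IaC) runs as.
CAPA_CONTROLLER_ROLE_ARN = "arn:aws:iam::123456789012:role/capa-controller"

# EC2 actions that change NACL rules or subnet associations.
NACL_MUTATING_ACTIONS = [
    "ec2:CreateNetworkAclEntry",
    "ec2:ReplaceNetworkAclEntry",
    "ec2:DeleteNetworkAclEntry",
    "ec2:ReplaceNetworkAclAssociation",
]


def build_nacl_guardrail(allowed_principal_arn: str) -> dict:
    """Deny NACL changes to every principal except the given role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyNaclChangesOutsideAutomation",
                "Effect": "Deny",
                "Action": NACL_MUTATING_ACTIONS,
                "Resource": "*",
                # aws:PrincipalArn resolves to the role ARN for assumed-role sessions.
                "Condition": {
                    "ArnNotLike": {"aws:PrincipalArn": allowed_principal_arn}
                },
            }
        ],
    }


policy = build_nacl_guardrail(CAPA_CONTROLLER_ROLE_ARN)
print(json.dumps(policy, indent=2))
```

This prevents the console edit from step 3 rather than repairing it afterwards, so it complements rather than replaces any reconciliation CAPA might add.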

Anything else you would like to add:

Context:

  • The VPC was created dynamically by CAPA.
  • The default NACL was automatically created by AWS.
  • CAPA currently does not appear to manage NACL resources.
  • Manual modification of the default NACL caused cluster-wide networking failure (DNS outage → cascading failures).

Requesting guidance from maintainers on intended design boundaries:

  • Should CAPA reconcile networking primitives like NACLs for VPCs it creates?
  • Or are NACLs intentionally considered outside CAPA’s reconciliation scope?

Understanding this boundary will help determine whether to:

  • Propose a feature enhancement, or
  • Treat this strictly as governance/IAM enforcement responsibility.

Environment:

  • Cluster-api-provider-aws version:
  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):

Metadata

  • Assignees: none
  • Labels: kind/bug, needs-priority, needs-triage