Better support for BYO Infrastructure #1741

Draft
kon-angelo wants to merge 28 commits into gardener:master from kon-angelo:byo-subnet2

@kon-angelo kon-angelo commented Mar 25, 2026

How to categorize this PR?

/area control-plane
/kind enhancement
/platform aws

What this PR does / why we need it:

This is a first implementation of the proposal in #1715.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Release note:

Adds the capability to deploy a shoot into an existing subnet and provides better support for BYO infrastructure scenarios such as user-managed route tables, security groups, and ingress and egress traffic control.

…proposal

Introduce Bring Your Own Infrastructure (BYOI) support at the API level:

- Add WorkersSubnetID (per zone) and NodesSecurityGroupID (per cluster)
  fields to allow users to reference pre-provisioned worker subnets and
  security groups instead of having Gardener create them.
- Internal/Public CIDRs are now optional (LB subnets are discovered via
  standard AWS tags by CCM/LBC, matching EKS behavior).
- Validation enforces: XOR for workers/workersSubnetID, no mixed
  BYO/managed zones, VPC.ID required for BYO fields, immutability.
- ConfigValidator: conditional IGW check (only when Gardener manages
  public subnets), BYO subnet existence/VPC/AZ validation, BYO SG
  existence/VPC validation.
- Comprehensive proposal document with design rationale.
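
The validation rules listed above can be illustrated with a minimal sketch. The `Zone` type, the `validateZones` function, and the error messages are hypothetical stand-ins for illustration, not the extension's actual API or validation code.

```go
package main

import (
	"errors"
	"fmt"
)

// Zone is a hypothetical, trimmed-down stand-in for the provider's zone type.
type Zone struct {
	Name            string
	Workers         *string // CIDR of a Gardener-managed worker subnet
	WorkersSubnetID *string // ID of a pre-provisioned (BYO) worker subnet
}

// validateZones sketches three of the rules from the commit message:
// per-zone XOR, no mixing of BYO and managed zones, and a VPC ID for BYO.
func validateZones(vpcID *string, zones []Zone) error {
	var byo, managed int
	for _, z := range zones {
		hasCIDR := z.Workers != nil
		hasID := z.WorkersSubnetID != nil
		if hasCIDR == hasID {
			return fmt.Errorf("zone %q: exactly one of workers or workersSubnetID must be set", z.Name)
		}
		if hasID {
			byo++
		} else {
			managed++
		}
	}
	if byo > 0 && managed > 0 {
		return errors.New("BYO and managed zones must not be mixed")
	}
	if byo > 0 && vpcID == nil {
		return errors.New("vpc.id is required when workersSubnetID is set")
	}
	return nil
}

// ptr returns a pointer to s; a small helper for optional fields.
func ptr(s string) *string { return &s }

func main() {
	zones := []Zone{{Name: "eu-west-1a", WorkersSubnetID: ptr("subnet-0123")}}
	fmt.Println(validateZones(ptr("vpc-0abc"), zones)) // accepted
	fmt.Println(validateZones(nil, zones))             // rejected: no VPC ID
}
```

Immutability and the runtime checks against AWS (subnet existence, VPC, and AZ) are left out of the sketch because they need the old object or an AWS client.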

Implement the BYO infrastructure flow logic:

Reconcile:
- ensureExistingVpc: skip IGW lookup when no managed public subnets
- ensureNodesSecurityGroup: when NodesSecurityGroupID is set, store it
  directly in state and skip SG creation/rule management
- buildReconcileGraph: main route table task is conditional on needing
  an IGW (managed public subnets or Gardener-created VPC)
- ensureZones: split into ensureBYOZones (stores WorkersSubnetID in
  state, skips all subnet/NAT/route-table/EIP creation) and
  ensureManagedZones (existing behavior)

Delete:
- deleteNodesSecurityGroup: skip deletion when BYO SG
- deleteZones: for BYO, clear state only (don't delete user subnets)
- All other delete functions already guard on state presence, so
  resources never stored (IGW, main route table, NAT, EIP) are
  naturally skipped

Note: CCM config SubnetID handling for BYO (no public subnets) is
deferred - requires design decision on how to provide the SubnetID
that the CCM needs for external master mode detection.

- configvalidator: add validatePublicSubnetAvailability runtime check.
  When no zone has a public CIDR, queries AWS for existing subnets
  tagged with kubernetes.io/role/elb=1 and the cluster tag. Documents
  the CCM external-master-mode SubnetID dependency.

- valuesprovider: document why FindSubnetForPurpose(PurposePublic) is
  needed (CCM init gate, never used at runtime).

- ensureNodesSecurityGroup: skip per-zone CIDR rules when internal/
  public CIDRs are empty (BYO mode). Base rules (self-referencing,
  NodePort from 0.0.0.0/0, all-egress) are still applied.

- ensureSubnetCidrReservation: guard against missing IPv6 CIDR blocks
  on BYO subnets that may not have IPv6 configured.

…ive)

Document two security group approaches:

Preferred (additive): Gardener always creates a base SG with
known-good rules (self-referencing, NodePort 0.0.0.0/0, all-egress).
Users optionally attach additional SGs alongside via
additionalNodesSecurityGroupIDs. Both are placed on the ENI.
Advantages: separation of concerns, no broken clusters from missing
rules, matches EKS pattern.

Alternative (full replacement): User provides a single SG that
replaces Gardener's entirely. More powerful (can restrict rules)
but riskier (user can break pod-to-pod). Documented as a future
extension if additive proves insufficient.

No code changes - proposal document only.

- Make Zone.Internal, Zone.Public, Zone.Workers *string (proper
  optional semantics). Fix all consumers to use nil checks and
  dereferences. Fix zone comparison to use apiequality.Semantic.DeepEqual
  instead of == (pointer comparison would break after DeepCopy).

- Validation: add aggregate check that at least one zone has either
  Workers or WorkersSubnetID (catches both-false case).

- ensureBYOZones: discover tagged public (kubernetes.io/role/elb=1)
  and internal (kubernetes.io/role/internal-elb=1) subnets in the
  VPC and store them in state. These flow into InfrastructureStatus
  for CCM config and other consumers.

- CCM config: fallback chain public > internal > workers subnet for
  SubnetID init gate. Disable CCM service controller (--controllers=*,-service)
  when no public or internal LB subnets are available.

- IPv6: runtime validation in configvalidator that BYO worker subnets
  have IPv6 CIDR blocks when DualStack is enabled.

Introduce proper cluster tag value semantics:
- TagValueClusterOwned ('owned'): for Gardener-managed resources.
  Aligns with the CCM's hasClusterTagOwned check for safe deletion.
- TagValueClusterShared ('shared'): for BYO resources that Gardener
  auto-tags during reconcile. No controller will delete shared resources.
- TagValueCluster ('1'): kept for backwards compatibility. Gardener's
  own ownership checks treat '1' as equivalent to 'owned'.

Changes:
- commonTags and clusterTags() now use 'owned' for new resources
- ensureBYOZones auto-tags BYO worker subnets with 'shared' (EKS pattern)
- All discovery filters use tag-key existence (not exact value match)
  so they find resources tagged with '1', 'owned', or 'shared'
- delete.go ownership check accepts both '1' and 'owned'
- configvalidator discovery uses tag-key filter

Verified no breakage in:
- aws-custom-route-controller: hasClusterTag checks key only, ignores value
- aws-ipam-controller: does not use cluster tags at all
- cloud-provider-aws CCM: hasClusterTag checks key only for discovery;
  only hasClusterTagOwned checks value (for NLB SG deletion)
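
A small sketch of the tag semantics described in this commit: discovery looks only at the tag key, while the ownership check accepts '1' and 'owned' but not 'shared'. The `kubernetes.io/cluster/` key prefix, the cluster name, and the two function bodies are assumptions for illustration and are not copied from the PR.

```go
package main

import "fmt"

const (
	// tagKeyPrefix is assumed here; the cluster name is appended to form the tag key.
	tagKeyPrefix = "kubernetes.io/cluster/"

	TagValueCluster       = "1"      // legacy value, treated like "owned"
	TagValueClusterOwned  = "owned"  // Gardener-managed resources
	TagValueClusterShared = "shared" // BYO resources, never deleted
)

// hasClusterTag is the discovery check: the key must exist, the value is ignored.
func hasClusterTag(tags map[string]string, cluster string) bool {
	_, ok := tags[tagKeyPrefix+cluster]
	return ok
}

// isOwned is the deletion check: only "1" and "owned" count as Gardener-owned.
func isOwned(tags map[string]string, cluster string) bool {
	v, ok := tags[tagKeyPrefix+cluster]
	return ok && (v == TagValueCluster || v == TagValueClusterOwned)
}

func main() {
	cluster := "shoot--foo--bar" // hypothetical cluster name
	for _, v := range []string{TagValueCluster, TagValueClusterOwned, TagValueClusterShared} {
		tags := map[string]string{tagKeyPrefix + cluster: v}
		fmt.Printf("value=%q discovered=%t owned=%t\n",
			v, hasClusterTag(tags, cluster), isOwned(tags, cluster))
	}
}
```

Separating the two checks is what lets a 'shared' subnet be found during discovery without ever becoming a deletion candidate.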

- Add validateBYOSubnetRouteTables to configvalidator that checks BYO
  worker subnets have explicit route table associations
- Add 'Routing Requirements for BYO' section to the GEP documenting:
  connectivity requirements, connectivity models (NAT/TGW/VPCE),
  aws-custom-route-controller interaction with pod CIDR routes,
  route table tagging requirements when overlay is disabled
- Add routing FAQ entry about route table tagging

…table associations)

VPC gateway endpoints are VPC-level resources that require route table
associations to function. In BYO mode, Gardener does not create or manage
route tables, so endpoints cannot be associated and would be inert.

- Guard ensureGatewayEndpoints with DoIf(!isBYO) in reconcile graph
- Guard deleteGatewayEndpoints with DoIf(!isBYO) in delete graph
- Add validation: gatewayEndpoints forbidden when workersSubnetID is set
- Add test for the new validation
- Update doc: gateway endpoints are skipped in BYO mode, users must
  create and manage their own VPC endpoints independently

- Add new Context 'with BYO infrastructure (workersSubnetID +
  nodesSecurityGroupID)' testing the full reconcile + delete lifecycle:
  - Pre-creates VPC, worker subnet (in specific AZ), public LB subnet
    (tagged for CCM discovery), and nodes security group
  - Passes workersSubnetID + nodesSecurityGroupID to InfrastructureConfig
  - Verifies: correct InfrastructureStatus (VPC, subnet, SG), cluster
    tag on BYO subnet (shared), no NAT gateways/EIPs/VPC endpoints
    created, IAM resources created
  - On delete: verifies BYO resources NOT deleted, cluster tag removed
    from BYO subnet, IAM resources cleaned up

- Add CreateSubnetInZone helper (extends CreateSubnet with AZ param)
@kon-angelo kon-angelo requested a review from a team as a code owner March 25, 2026 01:08

gardener-prow bot commented Mar 25, 2026

@kon-angelo: The label(s) area/todo, kind/todo cannot be applied, because the repository doesn't have them.

Details

In response to this:

How to categorize this PR?

/area TODO
/kind TODO
/platform aws

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Release note:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


gardener-prow bot commented Mar 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hebelsan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 25, 2026
@federated-github-access federated-github-access bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Mar 25, 2026
@kon-angelo kon-angelo changed the title from "Better support for B" to "Better support for BYO Infrastructure" Mar 25, 2026
@kon-angelo kon-angelo marked this pull request as draft March 25, 2026 01:10
@gardener-prow gardener-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. labels Mar 25, 2026
@kon-angelo
Contributor Author

/test


gardener-prow bot commented Mar 30, 2026

@kon-angelo: No presubmit jobs available for gardener/gardener-extension-provider-aws@master

Details

In response to this:

/test



testmachinery bot commented Mar 30, 2026

Testrun: e2e-qkqxr
Workflow: e2e-qkqxr-wf
Phase: Failed

+---------------------+---------------------+-----------+----------+
|        NAME         |        STEP         |   PHASE   | DURATION |
+---------------------+---------------------+-----------+----------+
| infrastructure-test | infrastructure-test | Failed    | 42m8s    |
| backupbucket-test   | backupbucket-test   | Succeeded | 7m2s     |
| bastion-test        | bastion-test        | Succeeded | 7m1s     |
| dnsrecord-test      | dnsrecord-test      | Succeeded | 9m59s    |
+---------------------+---------------------+-----------+----------+


github-actions bot commented Apr 2, 2026

This code change implements a comprehensive "Bring Your Own Infrastructure" (BYOI) feature for the AWS provider in Gardener, allowing users to deploy Kubernetes clusters using pre-existing AWS infrastructure components like VPCs, subnets, and security groups.

New Feature: Flexible Network Configuration (BYOI)

Walkthrough

  • New Feature: Added support for bringing your own AWS infrastructure components including VPCs, worker subnets, security groups, and load balancer subnets, enabling enterprise customers to deploy clusters into pre-provisioned network infrastructure
  • New Feature: Implemented auto-tagging of user-provided subnets with cluster and role tags for AWS Load Balancer Controller discovery while preserving shared infrastructure tags during deletion
  • New Feature: Added validation for BYO subnet IDs and security groups to ensure they exist in the correct VPC and availability zones
  • New Feature: Enhanced Cloud Controller Manager configuration to handle scenarios without public subnets by using internal or worker subnets as fallback and disabling service controller when no load balancer subnets are available
  • Refactor: Updated API types to make subnet CIDRs optional and added new fields for referencing existing subnet IDs and security group IDs
  • Documentation: Added comprehensive documentation explaining BYO configuration patterns, validation rules, security group requirements, and routing considerations
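
The Cloud Controller Manager fallback mentioned in the walkthrough (public, then internal, then worker subnet, with the service controller disabled when no load balancer subnet exists) can be sketched as follows. The `Subnets` type and `ccmSettings` function are hypothetical; only the `--controllers=*,-service` value comes from the commit messages in this PR.

```go
package main

import "fmt"

// Subnets holds the subnet IDs known for a zone; empty means "not available".
// The type is a hypothetical simplification for this sketch.
type Subnets struct {
	Public   string
	Internal string
	Workers  string
}

// ccmSettings picks the SubnetID for the CCM config using the fallback chain
// public > internal > workers, and returns the value for --controllers.
func ccmSettings(s Subnets) (subnetID, controllers string) {
	switch {
	case s.Public != "":
		subnetID = s.Public
	case s.Internal != "":
		subnetID = s.Internal
	default:
		subnetID = s.Workers
	}

	controllers = "*"
	if s.Public == "" && s.Internal == "" {
		// No load balancer subnets at all: disable the service controller.
		controllers = "*,-service"
	}
	return subnetID, controllers
}

func main() {
	id, ctrl := ccmSettings(Subnets{Workers: "subnet-workers"})
	fmt.Printf("SubnetID=%s --controllers=%s\n", id, ctrl)
}
```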

Model: claude-sonnet-4-20250514 | Prompt Tokens: 61144 | Completion Tokens: 301

@gardener-prow gardener-prow bot added cla: no Indicates the PR's author has not signed the cla-assistant.io CLA. cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. cla: no Indicates the PR's author has not signed the cla-assistant.io CLA. labels Apr 7, 2026

gardener-prow bot commented Apr 13, 2026

PR needs rebase.

Details

