Better support for BYO Infrastructure #1741
kon-angelo wants to merge 28 commits into gardener:master
…proposal
Introduce Bring Your Own Infrastructure (BYOI) support at the API level:
- Add WorkersSubnetID (per zone) and NodesSecurityGroupID (per cluster) fields to allow users to reference pre-provisioned worker subnets and security groups instead of having Gardener create them.
- Internal/Public CIDRs are now optional (LB subnets are discovered via standard AWS tags by CCM/LBC, matching EKS behavior).
- Validation enforces: XOR for workers/workersSubnetID, no mixed BYO/managed zones, VPC.ID required for BYO fields, immutability.
- ConfigValidator: conditional IGW check (only when Gardener manages public subnets), BYO subnet existence/VPC/AZ validation, BYO SG existence/VPC validation.
- Comprehensive proposal document with design rationale.
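The validation rules listed in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Zone` struct is a trimmed-down stand-in, and `validateZones` and `ptr` are hypothetical names chosen for the example. Immutability checks are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Zone is a hypothetical, reduced stand-in for the provider's zone config.
type Zone struct {
	Name            string
	Workers         *string // CIDR of a Gardener-managed worker subnet
	WorkersSubnetID *string // ID of a pre-provisioned (BYO) worker subnet
}

// validateZones applies three of the rules from the commit message:
// XOR of workers/workersSubnetID per zone, no mixing of BYO and managed
// zones, and a VPC ID being required as soon as a BYO field is used.
func validateZones(vpcID *string, zones []Zone) []error {
	var errs []error
	byo, managed := 0, 0
	for _, z := range zones {
		hasCIDR, hasID := z.Workers != nil, z.WorkersSubnetID != nil
		switch {
		case hasCIDR == hasID: // both set or both unset
			errs = append(errs, fmt.Errorf("zone %q: exactly one of workers or workersSubnetID must be set", z.Name))
		case hasID:
			byo++
		default:
			managed++
		}
	}
	if byo > 0 && managed > 0 {
		errs = append(errs, errors.New("zones must not mix BYO and managed worker subnets"))
	}
	if byo > 0 && vpcID == nil {
		errs = append(errs, errors.New("vpc.id is required when workersSubnetID is set"))
	}
	return errs
}

func ptr(s string) *string { return &s }

func main() {
	zones := []Zone{
		{Name: "eu-west-1a", WorkersSubnetID: ptr("subnet-aaa")},
		{Name: "eu-west-1b", Workers: ptr("10.250.0.0/19")},
	}
	for _, err := range validateZones(ptr("vpc-123"), zones) {
		fmt.Println(err)
	}
}
```

Returning every violation instead of stopping at the first one mirrors how Kubernetes-style validators usually report errors, so a user sees all problems in one pass.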
Implement the BYO infrastructure flow logic:
Reconcile:
- ensureExistingVpc: skip IGW lookup when no managed public subnets
- ensureNodesSecurityGroup: when NodesSecurityGroupID is set, store it directly in state and skip SG creation/rule management
- buildReconcileGraph: main route table task is conditional on needing an IGW (managed public subnets or Gardener-created VPC)
- ensureZones: split into ensureBYOZones (stores WorkersSubnetID in state, skips all subnet/NAT/route-table/EIP creation) and ensureManagedZones (existing behavior)
Delete:
- deleteNodesSecurityGroup: skip deletion when BYO SG
- deleteZones: for BYO, clear state only (don't delete user subnets)
- All other delete functions already guard on state presence, so resources never stored (IGW, main route table, NAT, EIP) are naturally skipped
Note: CCM config SubnetID handling for BYO (no public subnets) is deferred - requires design decision on how to provide the SubnetID that the CCM needs for external master mode detection.
- configvalidator: add validatePublicSubnetAvailability runtime check. When no zone has a public CIDR, queries AWS for existing subnets tagged with kubernetes.io/role/elb=1 and the cluster tag. Documents the CCM external-master-mode SubnetID dependency.
- valuesprovider: document why FindSubnetForPurpose(PurposePublic) is needed (CCM init gate, never used at runtime).
- ensureNodesSecurityGroup: skip per-zone CIDR rules when internal/public CIDRs are empty (BYO mode). Base rules (self-referencing, NodePort from 0.0.0.0/0, all-egress) are still applied.
- ensureSubnetCidrReservation: guard against missing IPv6 CIDR blocks on BYO subnets that may not have IPv6 configured.
…ive)
Document two security group approaches:
Preferred (additive): Gardener always creates a base SG with known-good rules (self-referencing, NodePort 0.0.0.0/0, all-egress). Users optionally attach additional SGs alongside via additionalNodesSecurityGroupIDs. Both are placed on the ENI. Advantages: separation of concerns, no broken clusters from missing rules, matches EKS pattern.
Alternative (full replacement): User provides a single SG that replaces Gardener's entirely. More powerful (can restrict rules) but riskier (user can break pod-to-pod). Documented as a future extension if additive proves insufficient.
No code changes - proposal document only.
- Make Zone.Internal, Zone.Public, Zone.Workers *string (proper optional semantics). Fix all consumers to use nil checks and dereferences. Fix zone comparison to use apiequality.Semantic.DeepEqual instead of == (pointer comparison would break after DeepCopy).
- Validation: add aggregate check that at least one zone has either Workers or WorkersSubnetID (catches both-false case).
- ensureBYOZones: discover tagged public (kubernetes.io/role/elb=1) and internal (kubernetes.io/role/internal-elb=1) subnets in the VPC and store them in state. These flow into InfrastructureStatus for CCM config and other consumers.
- CCM config: fallback chain public > internal > workers subnet for SubnetID init gate. Disable CCM service controller (--controllers=*,-service) when no public or internal LB subnets are available.
- IPv6: runtime validation in configvalidator that BYO worker subnets have IPv6 CIDR blocks when DualStack is enabled.
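The CCM fallback chain from this commit could look roughly like the sketch below. The `subnets` struct and both function names are hypothetical and only serve the illustration. The `*,-service` value comes from the commit message; the `*` value for the normal case is an assumption.

```go
package main

import "fmt"

// subnets is a hypothetical, simplified view of the subnet IDs recorded in
// the infrastructure state. An empty string means "not available".
type subnets struct {
	Public, Internal, Workers string
}

// ccmSubnetID picks the subnet ID for the CCM init gate using the
// preference order public > internal > workers.
func ccmSubnetID(s subnets) string {
	for _, id := range []string{s.Public, s.Internal, s.Workers} {
		if id != "" {
			return id
		}
	}
	return ""
}

// ccmControllers returns the value for the CCM --controllers flag. The
// service controller is disabled when no LB subnet (public or internal)
// exists. The "*" default is an assumption made for this sketch.
func ccmControllers(s subnets) string {
	if s.Public == "" && s.Internal == "" {
		return "*,-service"
	}
	return "*"
}

func main() {
	byoOnly := subnets{Workers: "subnet-workers"}
	fmt.Println(ccmSubnetID(byoOnly), ccmControllers(byoOnly))

	withInternal := subnets{Internal: "subnet-internal", Workers: "subnet-workers"}
	fmt.Println(ccmSubnetID(withInternal), ccmControllers(withInternal))
}
```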
Introduce proper cluster tag value semantics:
- TagValueClusterOwned ('owned'): for Gardener-managed resources.
Aligns with the CCM's hasClusterTagOwned check for safe deletion.
- TagValueClusterShared ('shared'): for BYO resources that Gardener
auto-tags during reconcile. No controller will delete shared resources.
- TagValueCluster ('1'): kept for backwards compatibility. Gardener's
own ownership checks treat '1' as equivalent to 'owned'.
Changes:
- commonTags and clusterTags() now use 'owned' for new resources
- ensureBYOZones auto-tags BYO worker subnets with 'shared' (EKS pattern)
- All discovery filters use tag-key existence (not exact value match)
so they find resources tagged with '1', 'owned', or 'shared'
- delete.go ownership check accepts both '1' and 'owned'
- configvalidator discovery uses tag-key filter
Verified no breakage in:
- aws-custom-route-controller: hasClusterTag checks key only, ignores value
- aws-ipam-controller: does not use cluster tags at all
- cloud-provider-aws CCM: hasClusterTag checks key only for discovery;
only hasClusterTagOwned checks value (for NLB SG deletion)
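The tag semantics described above amount to two distinct checks: discovery looks only at the tag key, while ownership (and therefore deletion) looks at the value. A minimal sketch, assuming the usual `kubernetes.io/cluster/<name>` key format; the helper names `clusterTagKey`, `hasClusterTag`, and `isOwnedByCluster` are hypothetical and not taken from the PR:

```go
package main

import "fmt"

const (
	TagValueCluster       = "1"      // legacy value, treated like "owned"
	TagValueClusterOwned  = "owned"  // Gardener-managed resources
	TagValueClusterShared = "shared" // BYO resources, never deleted
)

// clusterTagKey builds the cluster tag key for a given cluster name.
func clusterTagKey(cluster string) string {
	return "kubernetes.io/cluster/" + cluster
}

// hasClusterTag is the discovery check: only the key must exist, so
// resources tagged "1", "owned", or "shared" are all found.
func hasClusterTag(tags map[string]string, key string) bool {
	_, ok := tags[key]
	return ok
}

// isOwnedByCluster is the deletion check: only "1" and "owned" count.
func isOwnedByCluster(tags map[string]string, key string) bool {
	v := tags[key] // missing key yields "", which is not an owned value
	return v == TagValueCluster || v == TagValueClusterOwned
}

func main() {
	key := clusterTagKey("shoot--foo--bar")
	byoSubnet := map[string]string{key: TagValueClusterShared}
	fmt.Println(hasClusterTag(byoSubnet, key), isOwnedByCluster(byoSubnet, key))
}
```

Keeping the two checks separate is what lets a BYO subnet participate in discovery without ever becoming a deletion candidate.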
- Add validateBYOSubnetRouteTables to configvalidator that checks BYO worker subnets have explicit route table associations
- Add 'Routing Requirements for BYO' section to the GEP documenting: connectivity requirements, connectivity models (NAT/TGW/VPCE), aws-custom-route-controller interaction with pod CIDR routes, route table tagging requirements when overlay is disabled
- Add routing FAQ entry about route table tagging
…table associations)
VPC gateway endpoints are VPC-level resources that require route table associations to function. In BYO mode, Gardener does not create or manage route tables, so endpoints cannot be associated and would be inert.
- Guard ensureGatewayEndpoints with DoIf(!isBYO) in reconcile graph
- Guard deleteGatewayEndpoints with DoIf(!isBYO) in delete graph
- Add validation: gatewayEndpoints forbidden when workersSubnetID is set
- Add test for the new validation
- Update doc: gateway endpoints are skipped in BYO mode, users must create and manage their own VPC endpoints independently
- Add new Context 'with BYO infrastructure (workersSubnetID + nodesSecurityGroupID)' testing the full reconcile + delete lifecycle:
  - Pre-creates VPC, worker subnet (in specific AZ), public LB subnet (tagged for CCM discovery), and nodes security group
  - Passes workersSubnetID + nodesSecurityGroupID to InfrastructureConfig
  - Verifies: correct InfrastructureStatus (VPC, subnet, SG), cluster tag on BYO subnet (shared), no NAT gateways/EIPs/VPC endpoints created, IAM resources created
  - On delete: verifies BYO resources NOT deleted, cluster tag removed from BYO subnet, IAM resources cleaned up
- Add CreateSubnetInZone helper (extends CreateSubnet with AZ param)
…at, deduplicate open questions
…, extract shared validator
…n, and tag lifecycle
…auto-tagging lifecycle
/test
@kon-angelo: No presubmit jobs available for gardener/gardener-extension-provider-aws@master
Testrun: e2e-qkqxr
+---------------------+---------------------+-----------+----------+
| NAME                | STEP                | PHASE     | DURATION |
+---------------------+---------------------+-----------+----------+
| infrastructure-test | infrastructure-test | Failed    | 42m8s    |
| backupbucket-test   | backupbucket-test   | Succeeded | 7m2s     |
| bastion-test        | bastion-test        | Succeeded | 7m1s     |
| dnsrecord-test      | dnsrecord-test      | Succeeded | 9m59s    |
+---------------------+---------------------+-----------+----------+
This code change implements a comprehensive "Bring Your Own Infrastructure" (BYOI) feature for the AWS provider in Gardener, allowing users to deploy Kubernetes clusters using pre-existing AWS infrastructure components like VPCs, subnets, and security groups.
New Feature: Flexible Network Configuration (BYOI)
PR needs rebase.
How to categorize this PR?
/area control-plane
/kind enhancement
/platform aws
What this PR does / why we need it:
This is a first implementation of the proposal in #1715.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Release note: