feat: Implement support of BYO SG for Service NLB#1326
feat: Implement support of BYO SG for Service NLB#1326mtulio wants to merge 6 commits intokubernetes:masterfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
This issue is currently awaiting triage. If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the The DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| // buildSecurityGroupRuleReferences finds all security groups that have ingress rules | ||
| // referencing the specified security group ID. This is needed to safely delete security | ||
| // groups by first removing all dependent ingress rules. | ||
| // | ||
| // Parameters: | ||
| // - ctx: The context for AWS API calls | ||
| // - sgID: The security group ID to search for in ingress rules | ||
| // | ||
| // Returns: | ||
| // - map[*ec2types.SecurityGroup]bool: Map of security groups to cluster tag ownership status | ||
| // - map[*ec2types.SecurityGroup]IPPermissionSet: Map of security groups to their ingress rules that reference sgID | ||
| // - error: An error if the operation fails |
There was a problem hiding this comment.
Those functions shares the implementation with https://github.com/kubernetes/cloud-provider-aws/pull/1209/files#diff-70a4ebdf516a65404dc3c3b3ba9925fc62d0f85bd75e272a4f8ddb4b7b8ab852R1885
| @@ -0,0 +1,236 @@ | |||
| /* | |||
There was a problem hiding this comment.
helpers implemented in https://github.com/kubernetes/cloud-provider-aws/pull/1209/files#diff-1d5c293aba66db82ccec7e1851eaa56aebb4f9ac3a0b7be682eaf75fe6034f25 (also waiting for review)
|
/test pull-cloud-provider-aws-test |
|
/test pull-cloud-provider-aws-check |
Introduce the documentation to use the feature Service type-LoadBalancer with Security Group by opt-in through the cloud-config. doc/nlb-sg: added instructions to use NLB+SG feature Introduce the documentation to use the feature Service type-LoadBalancer with Security Group by opt-in through the cloud-config. doc(nlb byo-sg): update validation steps
Implement validation for Service annotation
`service.beta.kubernetes.io/aws-load-balancer-security-groups`
on NLBson NLBs.
Key changes:
1. Validation layer (aws_validations.go):
- Removed blanket blocking of BYO SG annotation for NLB
- Added NLB-specific constraints:
* Only one security group allowed (NLB AWS limitation)
* Security group ID must start with "sg-" (AWS format)
* Extra security groups annotation remains blocked for NLB
Implement support of Service updates for BYO SG on NLBs, when annotation is added, detecting drift and preventing leak of managed SG. NOTE: BYO SG annotation can be added to NLB only when it is created with managed SG support. This commit fixes a security group leak vulnerability in NLB (Network Load Balancer) when updating a service from controller-managed security groups to BYO (Bring Your Own) security groups. Problem: When updating an NLB service to use BYO security groups via the service.beta.kubernetes.io/aws-load-balancer-security-groups annotation, the old managed security groups were not being deleted from AWS, causing resource leaks. Solution: 1. Added security group drift detection in ensureLoadBalancerv2() - Compares expected vs actual SGs on every reconciliation - Calls AWS SetSecurityGroups API when drift detected - Triggers cleanup of replaced managed security groups 2. Added helper function buildSecurityGroupRuleReferences() - Finds security groups with ingress rules referencing target SG - Identifies cluster-owned vs external security groups - Returns permissions that need revocation before deletion 3. Added helper function removeOwnedSecurityGroups() - Verifies SG ownership via cluster tags - Revokes dependent ingress rules from other SGs - Deletes owned SGs with exponential backoff retry Testing: - Added comprehensive unit tests (TestEnsureLoadBalancerv2_SecurityGroupUpdate)
This commit enables users to bring their own security groups (BYO SG)
for Network Load Balancers (NLB) through annotations or global config.
Key changes:
Implementation (aws.go):
- Updated ensureNLBSecurityGroup() with priority-based SG selection:
Priority 1: Existing NLB - return current SGs
Priority 2: BYO SG from annotation
Priority 3: BYO SG from global config (ElbSecurityGroup)
Priority 4: Managed mode - create new managed SG
Priority 5: None - return empty list
- Added comprehensive logging for each decision path
3. Tests:
- Updated validation tests with NLB BYO SG success/error cases
- Fixed implementation tests to reflect new behavior
- Removed duplicate validation tests from implementation layer
Supported configurations:
- Annotation: service.beta.kubernetes.io/aws-load-balancer-security-groups
- Global config: ElbSecurityGroup (fallback)
- Managed mode: NLBSecurityGroupMode=Managed (fallback)
All tests passing:
- TestValidateServiceAnnotations: PASS
- TestEnsureNLBSecurityGroup: PASS
59bba33 to
66728d1
Compare
Ensure e2e tests for BYO SG scenario on create and update.
Introduce e2e aws helper to enhance e2e test setup with aws specific operations, specially when simulating BYO scenarios, and validating load balancer AWS resources.
66728d1 to
1e861e4
Compare
|
Commits have been cleaned up. /test all |
|
@mtulio: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
I am observing two issues in this PR so far:
Action item: investigate why permission isn't added to CI since it is mentioned as pre-req.
Proposed Action Itens: remove this test from upstream or implement cloud-config patch for managed SG scenario. |
| 6. Check if security group ingress rules has been updated | ||
| ```sh | ||
| $ aws ec2 describe-security-group-rules --filters Name=group-id,Values=$SG_ID | jq .SecurityGroupRules[].ToPort | ||
| 80 | ||
| 8888 | ||
| .. | ||
| ``` |
There was a problem hiding this comment.
I was thinking that unmanaged (BYO) frontend SG rules were not supposed to be managed (unlike managed frontend SG).
| | service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled | [true\|false] | - | ELB | With cross-zone load balancing, each load balancer node for your Classic Load Balancer distributes requests evenly across the registered instances in all enabled Availability Zones. If cross-zone load balancing is disabled, each load balancer node distributes requests evenly across the registered instances in its Availability Zone only. | | ||
| | service.beta.kubernetes.io/aws-load-balancer-extra-security-groups | Comma-separated list | - | ELB | Specifies additional security groups to be added to ELB. | | ||
| | service.beta.kubernetes.io/aws-load-balancer-security-groups | Comma-separated list | - | ELB | Specifies the security groups to be added to ELB. Differently from the annotation "service.beta.kubernetes.io/aws-load-balancer-extra-security-groups", this replaces all other security groups previously assigned to the ELB. | | ||
| | service.beta.kubernetes.io/aws-load-balancer-security-groups | Comma-separated list | - | ELB,NLB | Specifies the security groups to be added to ELB and NLB. For ELB: differently from the annotation "service.beta.kubernetes.io/aws-load-balancer-extra-security-groups", this replaces all other security groups previously assigned to the ELB. For NLB, it is supported only when load balancer is created with managed security group. | |
There was a problem hiding this comment.
For NLB, it is supported only when load balancer is created with managed security group
I'm not sure I understand this part of the description. Isn't this annotations aims at setting unmanaged frontend security groups? Or you meant when NLBSecurityGroupMode flag is set to Managed?
There was a problem hiding this comment.
You are right, technically is possible to create an NLB with BYO SGs without the managed configuration.
Initially this annotation was though to be used as a fallback mechanism to users who wanted to skip the managed LB created by controller, satisfying the NLB restriction of attaching SG only when it was created with a SG (day-2, managed SG).
SetSecurityGroups: You can't perform this operation on a Network Load Balancer unless you specified a security group for the load balancer when you created it.
I will keep this thread open to collect more thoughts and review the code if this restriction is hard coded in unmanaged mode.
| if len(sgList) == 1 { | ||
| klog.Infof("Using BYO security group %q for NLB service %q (from annotation)", sgList[0], serviceName) | ||
| return sgList, nil | ||
| } |
| // For new NLBs, determine security group based on configuration priority | ||
|
|
||
| // Priority 2: BYO Security Group from annotation | ||
| // Validation has already ensured this is exactly one valid SG ID (starts with "sg-") |
| // Unsupported operation by AWS: NLB udpate security groups: | ||
| // Prevent updating security groups for NLB if the load balancer is created without managed security group. | ||
| if len(loadBalancer.SecurityGroups) == 0 { | ||
| return nil, fmt.Errorf("cannot update security groups for NLB %q as it is created without managed security group", loadBalancerName) | ||
| } |
There was a problem hiding this comment.
Does this condition handle the case of load balancers created before the managed frontedn SG feature was implemented? If so, it means that upgrades to new controller won't enable BYO SG annotation. Can we change this? For instance by enabling NLBSecurityGroupMode after the upgrade?
There was a problem hiding this comment.
NLBs can't be updated if it was not created with a Security Group:
SetSecurityGroups : You can't perform this operation on a Network Load Balancer unless you specified a security group for the load balancer when you created it.
I can't see a supported path in this situation, if you have something in mind, would you mind elaborate it, please?
| } | ||
|
|
||
| if !isOwned { | ||
| klog.V(2).Infof("Skipping non-owned security group %q for load balancer %q", sg, loadBalancerName) |
There was a problem hiding this comment.
| klog.V(2).Infof("Skipping non-owned security group %q for load balancer %q", sg, loadBalancerName) | |
| klog.V(2).Infof("Skipping removal of non-owned security group %q for load balancer %q", sg, loadBalancerName) |
| // NLB only: ensure the BYO annotations are not supported and return an error. | ||
| // FIXME: the BYO SG for NLB implementation is blocked by https://github.com/kubernetes/cloud-provider-aws/pull/1209 | ||
| if _, hasBYOAnnotation := v.annotations[ServiceAnnotationLoadBalancerSecurityGroups]; hasBYOAnnotation { | ||
| // NLB: validate BYO SG constraints (only one SG, must start with "sg-") |
There was a problem hiding this comment.
Any links or more details to explain why only one is allowed?
| // NLB supports only one security group | ||
| if len(sgList) > 1 { | ||
| return fmt.Errorf("NLB supports only one security group, got %d security groups %v in annotation %q", | ||
| len(sgList), sgList, ServiceAnnotationLoadBalancerSecurityGroups) | ||
| } |
There was a problem hiding this comment.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This proposal implements support of Bring Your Own Security Group (BYO SG) annotation to Service type-loadBalancer NLB.
The feature uses the same annotation of BYO SG to Classic Load Balancer, allowing users to create/update a Network Load Balancer with SGs attached to it.
Note: The update will work only when the NLB has been created with managed SG (TODO review this statement).
The BYOSG is an alternative of managed SG for NLBs, when the feature is enabled through cloud-config.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: