Skip to content

feat: Implement support of BYO SG for Service NLB#1326

Draft
mtulio wants to merge 6 commits intokubernetes:masterfrom
mtulio:feat-svc-nlb-byosg
Draft

feat: Implement support of BYO SG for Service NLB#1326
mtulio wants to merge 6 commits intokubernetes:masterfrom
mtulio:feat-svc-nlb-byosg

Conversation

@mtulio
Copy link
Contributor

@mtulio mtulio commented Jan 5, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

This proposal implements support of Bring Your Own Security Group (BYO SG) annotation to Service type-loadBalancer NLB.

The feature uses the same annotation of BYO SG to Classic Load Balancer, allowing users to create/update a Network Load Balancer with SGs attached to it.

Note: The update will work only when the NLB has been created with managed SG (TODO review this statement).

The BYOSG is an alternative of managed SG for NLBs, when the feature is enabled through cloud-config.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Introduce support of Annotation `service.beta.kubernetes.io/aws-load-balancer-security-groups` on the Service for Network Load Balancer, so that users can provide their own security groups.

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Jan 5, 2026
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 5, 2026
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 5, 2026
Comment on lines +1909 to +1920
// buildSecurityGroupRuleReferences finds all security groups that have ingress rules
// referencing the specified security group ID. This is needed to safely delete security
// groups by first removing all dependent ingress rules.
//
// Parameters:
// - ctx: The context for AWS API calls
// - sgID: The security group ID to search for in ingress rules
//
// Returns:
// - map[*ec2types.SecurityGroup]bool: Map of security groups to cluster tag ownership status
// - map[*ec2types.SecurityGroup]IPPermissionSet: Map of security groups to their ingress rules that reference sgID
// - error: An error if the operation fails
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,236 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtulio
Copy link
Contributor Author

mtulio commented Jan 6, 2026

/test pull-cloud-provider-aws-test

@mtulio
Copy link
Contributor Author

mtulio commented Jan 6, 2026

/test pull-cloud-provider-aws-check

mtulio added 4 commits January 6, 2026 11:54
Introduce the documentation to use the feature Service type-LoadBalancer
with Security Group by opt-in through the cloud-config.

doc/nlb-sg: added instructions to use NLB+SG feature

Introduce the documentation to use the feature Service type-LoadBalancer
with Security Group by opt-in through the cloud-config.

doc(nlb byo-sg): update validation steps
Implement validation for Service annotation
`service.beta.kubernetes.io/aws-load-balancer-security-groups`
on NLBson NLBs.

Key changes:

1. Validation layer (aws_validations.go):
   - Removed blanket blocking of BYO SG annotation for NLB
   - Added NLB-specific constraints:
     * Only one security group allowed (NLB AWS limitation)
     * Security group ID must start with "sg-" (AWS format)
     * Extra security groups annotation remains blocked for NLB
Implement support of Service updates for BYO SG on NLBs, when
annotation is added, detecting drift and preventing leak of managed SG.

NOTE: BYO SG annotation can be added to NLB only when it is created with
managed SG support.

This commit fixes a security group leak vulnerability in NLB (Network Load
Balancer) when updating a service from controller-managed security groups
to BYO (Bring Your Own) security groups.

Problem:
When updating an NLB service to use BYO security groups via the
service.beta.kubernetes.io/aws-load-balancer-security-groups annotation,
the old managed security groups were not being deleted from AWS, causing
resource leaks.

Solution:
1. Added security group drift detection in ensureLoadBalancerv2()
   - Compares expected vs actual SGs on every reconciliation
   - Calls AWS SetSecurityGroups API when drift detected
   - Triggers cleanup of replaced managed security groups

2. Added helper function buildSecurityGroupRuleReferences()
   - Finds security groups with ingress rules referencing target SG
   - Identifies cluster-owned vs external security groups
   - Returns permissions that need revocation before deletion

3. Added helper function removeOwnedSecurityGroups()
   - Verifies SG ownership via cluster tags
   - Revokes dependent ingress rules from other SGs
   - Deletes owned SGs with exponential backoff retry

Testing:
- Added comprehensive unit tests (TestEnsureLoadBalancerv2_SecurityGroupUpdate)
This commit enables users to bring their own security groups (BYO SG)
for Network Load Balancers (NLB) through annotations or global config.

Key changes:

Implementation (aws.go):
   - Updated ensureNLBSecurityGroup() with priority-based SG selection:
     Priority 1: Existing NLB - return current SGs
     Priority 2: BYO SG from annotation
     Priority 3: BYO SG from global config (ElbSecurityGroup)
     Priority 4: Managed mode - create new managed SG
     Priority 5: None - return empty list
   - Added comprehensive logging for each decision path

3. Tests:
   - Updated validation tests with NLB BYO SG success/error cases
   - Fixed implementation tests to reflect new behavior
   - Removed duplicate validation tests from implementation layer

Supported configurations:
- Annotation: service.beta.kubernetes.io/aws-load-balancer-security-groups
- Global config: ElbSecurityGroup (fallback)
- Managed mode: NLBSecurityGroupMode=Managed (fallback)

All tests passing:
- TestValidateServiceAnnotations: PASS
- TestEnsureNLBSecurityGroup: PASS
@mtulio mtulio force-pushed the feat-svc-nlb-byosg branch from 59bba33 to 66728d1 Compare January 6, 2026 15:42
mtulio added 2 commits January 6, 2026 12:43
Ensure e2e tests for BYO SG scenario on create and update.
Introduce e2e aws helper to enhance e2e test setup with aws specific
operations, specially when simulating BYO scenarios, and validating
load balancer AWS resources.
@mtulio mtulio force-pushed the feat-svc-nlb-byosg branch from 66728d1 to 1e861e4 Compare January 6, 2026 15:44
@mtulio
Copy link
Contributor Author

mtulio commented Jan 6, 2026

Commits have been cleaned up.

/test all

@k8s-ci-robot
Copy link
Contributor

@mtulio: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cloud-provider-aws-e2e 1e861e4 link true /test pull-cloud-provider-aws-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@mtulio
Copy link
Contributor Author

mtulio commented Jan 6, 2026

I am observing two issues in this PR so far:

[It] [cloud-provider-aws-e2e] loadbalancer NLB with BYO SG - create with BYO from star

  1. e2e identity is missing permissions to scenario: create service with BYO SG on NLB:
I0106 16:39:58.845514 32245 loadbalancer.go:1082]   [Warning] SyncLoadBalancerFailed/lbconfig-test-nlb-byo-c: Error 
syncing load balancer: failed to ensure load balancer: error ensuring NLB security group rules: error while updating rules 
to security group "sg-0e755bbafb6239e3d": creating ingress rules for security group "sg-0e755bbafb6239e3d": error 
authorizing security group ingress: "operation error EC2: AuthorizeSecurityGroupIngress, https response error 
StatusCode: 403, RequestID: 0f08cf56-95bd-4b10-8ccc-cb6361739970, api error UnauthorizedOperation: 
You are not authorized to perform this operation. 
User: arn:aws:sts::209411653980:assumed-role/aws-cloud-controller-manager.kube-system.sa.test-cluster--iugf5b/1767716035453412230
 is not authorized to perform: ec2:AuthorizeSecurityGroupIngress on resource: arn:aws:ec2:us-west-2:209411653980:security-group/sg-0e755bbafb6239e3d 
because no identity-based policy allows the ec2:AuthorizeSecurityGroupIngress action. 

Action item: investigate why permission isn't added to CI since it is mentioned as pre-req.

[It] [cloud-provider-aws-e2e] loadbalancer NLB with BYO SG update - managed to BYO scenario

  1. update e2e for BYO SG NLB can't be implemented in existing e2e scenario the test setup does not support managed SG .

Proposed Action Itens: remove this test from upstream or implement cloud-config patch for managed SG scenario.

Copy link

@alebedev87 alebedev87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First look, mostly questions..

Comment on lines +205 to +211
6. Check if security group ingress rules has been updated
```sh
$ aws ec2 describe-security-group-rules --filters Name=group-id,Values=$SG_ID | jq .SecurityGroupRules[].ToPort
80
8888
..
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that unmanaged (BYO) frontend SG rules were not supposed to be managed (unlike managed frontend SG).

| service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled | [true\|false] | - | ELB | With cross-zone load balancing, each load balancer node for your Classic Load Balancer distributes requests evenly across the registered instances in all enabled Availability Zones. If cross-zone load balancing is disabled, each load balancer node distributes requests evenly across the registered instances in its Availability Zone only. |
| service.beta.kubernetes.io/aws-load-balancer-extra-security-groups | Comma-separated list | - | ELB | Specifies additional security groups to be added to ELB. |
| service.beta.kubernetes.io/aws-load-balancer-security-groups | Comma-separated list | - | ELB | Specifies the security groups to be added to ELB. Differently from the annotation "service.beta.kubernetes.io/aws-load-balancer-extra-security-groups", this replaces all other security groups previously assigned to the ELB. |
| service.beta.kubernetes.io/aws-load-balancer-security-groups | Comma-separated list | - | ELB,NLB | Specifies the security groups to be added to ELB and NLB. For ELB: differently from the annotation "service.beta.kubernetes.io/aws-load-balancer-extra-security-groups", this replaces all other security groups previously assigned to the ELB. For NLB, it is supported only when load balancer is created with managed security group. |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For NLB, it is supported only when load balancer is created with managed security group

I'm not sure I understand this part of the description. Isn't this annotations aims at setting unmanaged frontend security groups? Or you meant when NLBSecurityGroupMode flag is set to Managed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, technically is possible to create an NLB with BYO SGs without the managed configuration.

Initially this annotation was though to be used as a fallback mechanism to users who wanted to skip the managed LB created by controller, satisfying the NLB restriction of attaching SG only when it was created with a SG (day-2, managed SG).

SetSecurityGroups: You can't perform this operation on a Network Load Balancer unless you specified a security group for the load balancer when you created it.

I will keep this thread open to collect more thoughts and review the code if this restriction is hard coded in unmanaged mode.

Comment on lines +2273 to +2276
if len(sgList) == 1 {
klog.Infof("Using BYO security group %q for NLB service %q (from annotation)", sgList[0], serviceName)
return sgList, nil
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this limitation of only 1 SG?

// For new NLBs, determine security group based on configuration priority

// Priority 2: BYO Security Group from annotation
// Validation has already ensured this is exactly one valid SG ID (starts with "sg-")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we accept names too?

Comment on lines +230 to +234
// Unsupported operation by AWS: NLB udpate security groups:
// Prevent updating security groups for NLB if the load balancer is created without managed security group.
if len(loadBalancer.SecurityGroups) == 0 {
return nil, fmt.Errorf("cannot update security groups for NLB %q as it is created without managed security group", loadBalancerName)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this condition handle the case of load balancers created before the managed frontedn SG feature was implemented? If so, it means that upgrades to new controller won't enable BYO SG annotation. Can we change this? For instance by enabling NLBSecurityGroupMode after the upgrade?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NLBs can't be updated if it was not created with a Security Group:

SetSecurityGroups : You can't perform this operation on a Network Load Balancer unless you specified a security group for the load balancer when you created it.

I can't see a supported path in this situation, if you have something in mind, would you mind elaborate it, please?

}

if !isOwned {
klog.V(2).Infof("Skipping non-owned security group %q for load balancer %q", sg, loadBalancerName)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
klog.V(2).Infof("Skipping non-owned security group %q for load balancer %q", sg, loadBalancerName)
klog.V(2).Infof("Skipping removal of non-owned security group %q for load balancer %q", sg, loadBalancerName)

// NLB only: ensure the BYO annotations are not supported and return an error.
// FIXME: the BYO SG for NLB implementation is blocked by https://github.com/kubernetes/cloud-provider-aws/pull/1209
if _, hasBYOAnnotation := v.annotations[ServiceAnnotationLoadBalancerSecurityGroups]; hasBYOAnnotation {
// NLB: validate BYO SG constraints (only one SG, must start with "sg-")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any links or more details to explain why only one is allowed?

Comment on lines +63 to +67
// NLB supports only one security group
if len(sgList) > 1 {
return fmt.Errorf("NLB supports only one security group, got %d security groups %v in annotation %q",
len(sgList), sgList, ServiceAnnotationLoadBalancerSecurityGroups)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants