Skip to content

Enhanced Subnet Discovery#3380

Closed
dshehbaj wants to merge 17 commits intomasterfrom
shehbaj/enhanced-subnet-discovery-rebase
Closed

Enhanced Subnet Discovery#3380
dshehbaj wants to merge 17 commits intomasterfrom
shehbaj/enhanced-subnet-discovery-rebase

Conversation

@dshehbaj
Copy link
Copy Markdown
Contributor

@dshehbaj dshehbaj commented Jul 30, 2025

What type of PR is this?
feature

Which issue does this PR fix?:

#3067
#2904

What does this PR do / Why do we need it?:

This PR adds a requested customer feature to allow customers to

  1. Include or exclude subnets/security groups by tagging them kubernetes.io/role/cni=1 or kubernetes.io/role/cni=0 respectively.
  2. Apply custom security groups for secondary ENIs created using kubernetes.io/role/cni=1 tagged subnet if the alternate security group is also tagged kubernetes.io/role/cni=1.
  3. Reserve subnets for specific cluster based on the kubernetes.io/cluster/<my-example-cluster> tag. See Enhanced subnet discovery should use configurable tags #2904 more.

Testing done on this change:

  1. Added unit tests and integration tests.
  2. Manual testing pending.

Will this PR introduce any new dependencies?:

Will this break upgrades or downgrades? Has updating a running cluster been tested?:

  • Manual testing pending.

Does this change require updates to the CNI daemonset config files to work?:

  • Requires ENABLE_SUBNET_DISCOVERY enabled which is already done by default.

Does this PR introduce any user-facing change?:

  • Users were already tagging their subnets as the part of the original subnet discovery feature. This enhanced feature will also allow users to tag alternate security groups.

  • Make sure that IPv6 documentation and managed IPv4 policy are updated.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Integration tests passing in IPv4 cluster.

image

Currently adapting our test framework to work with IPv6 cluster. Will paste results here.

@dshehbaj dshehbaj requested a review from a team as a code owner July 30, 2025 16:37
@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch from 96310d6 to 0a64e10 Compare July 30, 2025 20:54
@dshehbaj dshehbaj changed the title pkg awsutils: check for primary/secondary subnet exclusion Enhanced Subnet Discovery Jul 30, 2025
Comment thread pkg/ipamd/ipamd.go Outdated
// Mark primary ENI as excluded if primary subnet is excluded
if eni == primaryENI && c.isPrimarySubnetExcluded {
log.Infof("Marking primary ENI %s as excluded from pod IP allocation", eni)
err := c.dataStoreAccess.GetDataStore(eniMetadata.NetworkCard).SetENIExcludedForPodIPs(eni, true)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if eniMetadata.NetworkCard is the same network card as 0.

@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch 6 times, most recently from f1898c7 to 7945b5a Compare August 4, 2025 14:56
Comment on lines +2786 to +2809
name: "IPv6 enabled with subnet discovery and mixed IPv4/IPv6 subnets",
subnets: []ec2types.Subnet{
Copy link
Copy Markdown
Contributor Author

@dshehbaj dshehbaj Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check edge case in code if customer is on IPv4 cluster and the code picks up an IPv6 subnet because that's all they had tagged in the console and vice versa.

@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch 6 times, most recently from 903a226 to 0368ae0 Compare August 4, 2025 16:31
Comment thread pkg/awsutils/awsutils.go Outdated
Comment thread pkg/awsutils/awsutils.go Outdated
Comment thread pkg/awsutils/awsutils.go
Comment on lines +1239 to +1248
// When subnet discovery is disabled, check if primary subnet is excluded
excluded, checkErr := cache.isPrimarySubnetExcluded()
if checkErr != nil {
// If we can't determine exclusion status, log warning and proceed
log.Warnf("Failed to check if primary subnet is excluded: %v. Proceeding with ENI creation attempt.", checkErr)
} else if excluded {
// Primary subnet is explicitly excluded
err = errors.New("primary subnet is tagged with kubernetes.io/role/cni=0 and subnet discovery is disabled - no valid subnets available for ENI creation")
log.Error(err.Error())
return "", err
}
Copy link
Copy Markdown
Contributor Author

@dshehbaj dshehbaj Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checks if Cx has a tagged subnet, but have not enable subnet discovery. Do we want to do this check?

Comment thread pkg/ipamd/ipamd.go Outdated
Comment on lines +465 to +472
if c.enableIPv6 {
assignedIPs = primaryENIInfo.AssignedIPv6Addresses()
ipType = "IPv6"
} else {
assignedIPs = primaryENIInfo.AssignedIPv4Addresses()
ipType = "IPv4"
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if we can use hasPods() function.

Currently hasPods() only checks for IPv4 assigned addresses, will need to update that func to support IPv6.

Comment on lines +1066 to +1119
// Skip ENIs that are excluded from pod IP allocation
if eni.IsExcludedForPodIPs {
ds.log.Debugf("Skip needs IP check for ENI %s as it is excluded from pod IP allocation", eni.ID)
continue
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will skip the primary ENI in GetAllocatableENIs function and prevent it from getting IPs assigned/allocated

@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch from 0368ae0 to ea0763f Compare August 6, 2025 17:47
@dshehbaj dshehbaj requested a review from haouc August 6, 2025 18:15
Comment thread pkg/ipamd/ipamd_test.go Outdated
assert.False(t, wasUsed, "Primary ENI should be excluded and auto-cleanup prefixes")
}

func TestPrimaryENIAutoCleanupFailureHandling(t *testing.T) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add case for IPv6 mode too

@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch 3 times, most recently from 587b9f3 to ea0763f Compare August 8, 2025 18:11
Comment thread pkg/awsutils/awsutils.go
describeSGInput := &ec2.DescribeSecurityGroupsInput{
Filters: []ec2types.Filter{
{
Name: aws.String("vpc-id"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need support global SGs which could be in another VPCs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe we support VPC peering in CNI. I could not find any existing logic that I could leverage.

I think keeping it local VPC only might keep things simpler, and we could do VPC peering as a follow up.

What do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

global SGs is not VPC peering. EC2 support using SG from another VPC.

Comment thread pkg/awsutils/awsutils.go Outdated
},
{
Name: aws.String("tag:" + subnetDiscoveryTagKey),
Values: []string{"1"},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's make them to const, such as selected = 1.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch 3 times, most recently from 9036276 to a188827 Compare August 18, 2025 16:30
@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch 2 times, most recently from f0bc83a to 1d749c9 Compare August 21, 2025 02:11
@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch from 8baff5a to 34be861 Compare August 21, 2025 11:07
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 2, 2025

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions
Copy link
Copy Markdown
Contributor

Pull request closed due to inactivity.

@github-actions github-actions Bot closed this Nov 17, 2025
haouc added a commit that referenced this pull request Feb 18, 2026
* pkg awsutils

* pkg ec2wrapper: add func to describe subnets

* pkg ipamd: handle primary ENI exclusion

* test: add integration test

* pkg: enable subnet discovery for IPv6 mode

* test integration eni-subnet-discovery: extend tests for IPv6 mode

* pkg: use shared context for awsutils and ipamd

* pkg awsutils: fix compilation error

* pkg ipamd: clean up ENI if not previously used

* pkg ipamd: gracefully exclude primary ENI if existing pods

* test integration: add integration test for primary ENI exclusion

* pkg ipamd: fix bug when calculating currentENIs when primary ENI is excluded

* pkg ipamd: implement secondary ENI exclusion logic

* pkg ipamd: add unit test for secondary ENI exclusion

* test integration: add integ test for secondary ENI exclusion testing

* test integration: add test for security group refresh

* rebase to latest mainline chnages

* remove value check for tags

* reviewed

* updated based on comments

* add more unit tests

* primary ENI can't be easily ignored

---------

Co-authored-by: Shehbaj Dhillon <dshehbaj@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale Issue or PR is stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants