Skip to content

Add prerouting connmark rules through nftable instead of iptables.#3588

Open
yash97 wants to merge 9 commits intoaws:masterfrom
yash97:nft_entry
Open

Add prerouting connmark rules through nftable instead of iptables.#3588
yash97 wants to merge 9 commits intoaws:masterfrom
yash97:nft_entry

Conversation

@yash97
Copy link
Copy Markdown
Contributor

@yash97 yash97 commented Feb 6, 2026

Solves #3563

What type of PR is this?

improvement
Which issue does this PR fix?:

#3563
What does this PR do / Why do we need it?:

Summary

This PR introduces a Connmark interface that abstracts connmark rule management, with implementations for both iptables-legacy and nftables backends. The VPC CNI now auto-detects the node's iptables mode and uses the appropriate backend.

Changes

Core Implementation:

  • Connmark interface with Setup() and Cleanup() methods

  • nftConnmark for nftables backend using new nft.Client wrapper

  • iptablesConnmark for iptables-legacy backend

  • GetIptablesMode() to auto-detect iptables backend (legacy vs nf_tables)

  • fib daddr type local return rule to skip connmark processing for local traffic (fixes IPVS compatibility)

nftables Implementation:

  • Create aws-cni table with nat-prerouting and snat-mark chains

  • Idempotent rule reconciliation - adds missing rules, removes stale CIDR rules

  • One-time cleanup of legacy iptables connmark rules when migrating to nftables

Testing:

  • Integration test snat_kube_proxy_test.go to verify SNAT works with all kube-proxy modes (iptables, nftables, ipvs)

  • PodExecInContainer() to test framework for multi-container pod exec

  • Test script run-mixed-os-snat-kube-proxy-test.sh for mixed OS testing

Testing done on this change:

Sample nft list rules, how does it look on host through nft command

nft list ruleset

(irrelevant output ommited)

table ip aws-cni {
        chain snat-mark {
                counter packets 9102174 bytes 836926787 ip daddr 172.31.0.0/20 return
                counter packets 9102174 bytes 836926787 ip daddr 192.168.0.0/16 return
                counter packets 0 bytes 0 ct mark set ct mark ^ 0x00000080
        }

        chain nat-prerouting {
                type nat hook prerouting priority dstnat + 10; policy accept;
                fib daddr type local return
                iifname "eni*" counter packets 0 bytes 0 jump snat-mark
                counter packets 0 bytes 0 meta mark set ct mark & 0x00000080
        }
}

Will this PR introduce any new dependencies?:

Will this break upgrades or downgrades? Has updating a running cluster been tested?:
It will not break pod networking however, it will leave nft rules on host. But they will apply same operations applied by iptables-nft. As nft rules will only be executed if iptables-nft is used by cni and kube-proxy.

Does this change require updates to the CNI daemonset config files to work?:

Does this PR introduce any user-facing change?:


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@yash97 yash97 changed the title Nft entry [WIP] Add prerouting connmark rules through nftable instead of iptables. Feb 6, 2026
@yash97 yash97 marked this pull request as ready for review February 24, 2026 07:13
@yash97 yash97 requested a review from a team as a code owner February 24, 2026 07:13
@yash97 yash97 force-pushed the nft_entry branch 4 times, most recently from 6144664 to 8ca4ea5 Compare February 24, 2026 21:32
@yash97 yash97 closed this Feb 24, 2026
@yash97 yash97 reopened this Feb 24, 2026
@yash97 yash97 force-pushed the nft_entry branch 2 times, most recently from 1ce4a25 to c14e4f2 Compare February 26, 2026 00:21
@yash97 yash97 changed the title [WIP] Add prerouting connmark rules through nftable instead of iptables. Add prerouting connmark rules through nftable instead of iptables. Feb 26, 2026
@jaydeokar jaydeokar requested a review from Copilot February 26, 2026 17:45
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses kube-proxy nftables mode compatibility by moving VPC CNI prerouting connmark logic into a dedicated connmark abstraction that can be implemented via either iptables-legacy or native nftables, selected automatically based on the node’s iptables backend.

Changes:

  • Introduces a Connmark interface and adds nftables- and iptables-based implementations for connmark rule setup/cleanup.
  • Updates host networking flow to program SNAT rules separately and delegate connmark rule management to the selected backend.
  • Adds integration/unit tests and supporting framework changes (including multi-container pod exec and kube-proxy mode SNAT validation).

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
test/integration/eni-subnet-discovery/eni_subnet_discovery_secondary_exclusion_test.go Switches to new PodExecInContainer helper for multi-container exec.
test/integration/cni/snat_kube_proxy_test.go New integration test validating SNAT/connmark behavior across kube-proxy modes.
test/framework/resources/k8s/resources/pod.go Renames/standardizes pod exec API to PodExecInContainer.
test/framework/framework.go Adds a DiscoveryClient used by integration tests.
scripts/run-mixed-os-snat-kube-proxy-test.sh New script intended to provision a mixed-OS EKS cluster and run SNAT/kube-proxy tests.
pkg/nft/nft.go Adds an nftables client wrapper interface for easier testing/mocking.
pkg/nft/mocks/nft_mocks.go Adds gomock for the nft client wrapper.
pkg/networkutils/nftables_connmark.go Implements nftables backend for connmark rules and legacy iptables cleanup-on-migration.
pkg/networkutils/nftables_connmark_test.go Unit tests for nftables connmark reconciliation logic.
pkg/networkutils/iptables_connmark.go Implements iptables backend for connmark rules.
pkg/networkutils/iptables_connmark_test.go Unit tests for iptables connmark behavior and stale rule cleanup.
pkg/networkutils/network.go Refactors host SNAT programming and integrates Connmark into host setup.
pkg/networkutils/network_test.go Updates unit tests to account for new connmark abstraction and SNAT update flow.
pkg/networkutils/mocks/network_mocks.go Updates generated mocks for renamed SNAT update API.
pkg/networkutils/mocks/connmark_mocks.go Adds gomock for Connmark.
pkg/iptableswrapper/iptables.go Adds GetIptablesMode() detection based on iptables --version.
pkg/iptableswrapper/mocks/iptables_maps.go Enhances mock iptables delete behavior to support IsNotExist semantics.
pkg/ipamd/ipamd.go Switches CIDR-change updates to call UpdateHostSNATRules.
go.mod / go.sum Adds nftables dependency and related indirect deps.
.github/workflows/pr-automated-tests.yaml Adjusts Docker build jobs to run per-arch via a matrix.
Comments suppressed due to low confidence (2)

pkg/networkutils/network_test.go:851

  • Test name looks like it has a typo: "TestUUpdateHostSNATRules". Renaming to something consistent (e.g., "TestUpdateHostSNATRules") will improve readability and discoverability in test output.
func TestUUpdateHostSNATRules(t *testing.T) {

pkg/networkutils/mocks/network_mocks.go:18

  • Generated mock header contains a duplicated "Source:" line. Regenerate the mock or remove the duplicate line to avoid confusing metadata in generated files.
// Code generated by MockGen. DO NOT EDIT.
// Source: github.com/aws/amazon-vpc-cni-k8s/pkg/networkutils (interfaces: NetworkAPIs)
// Source: github.com/aws/amazon-vpc-cni-k8s/pkg/networkutils (interfaces: NetworkAPIs)


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

semVer, err := semver.NewVersion(ver.String())
Expect(err).ToNot(HaveOccurred())
if semVer.Minor() <= 32 && mode == "nftables" {
return
return err
}
for _, pod := range pods.Items {
f.K8sResourceManagers.PodManager().DeleteAndWaitTillPodDeleted(&pod)
SetupENINetwork(eniIP string, eniMAC string, networkCard int, eniSubnetCIDR string, maxENIPerNIC int, isTrunkENI bool, routeTableID int, isRuleConfigured bool) error
// UpdateHostIptablesRules updates the nat table iptables rules on the host
UpdateHostIptablesRules(vpcCIDRs []string, primaryMAC string, primaryAddr *net.IP, v6Enabled bool) error
//UpdateHostIptablesRules(vpcCIDRs []string, primaryMAC string, primaryAddr *net.IP, v6Enabled bool) error
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

address this

}
err := n.updateHostIptablesRules(vpcCIDRs, primaryMAC, primaryAddr)
if err != nil {
log.Error(fmt.Sprintf("failed to iptable rules for snat %s", err.Error()))
Copy link
Copy Markdown
Contributor

@Pavani-Panakanti Pavani-Panakanti Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

address this. log.Errorf can be used directly


func (e *IptErrNotExists) Error() string {
// ref https://github.com/coreos/go-iptables/blob/main/iptables/iptables.go#L52
return "does not exists"
hasCtLoad := false
hasBitwise := false
hasMetaStore := false
markBytes := []byte{byte(mark), byte(mark >> 8), byte(mark >> 16), byte(mark >> 24)}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be addressed

return m == IptablesModeNFT
}

var iptablesVersionRegex = regexp.MustCompile(`v([0-9]+)\.([0-9]+)\.([0-9]+)(?:\s+\((\w+))?`)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this will match (nftables - missing ). Any potential issues in future ? Should we include closing brace as well

return existing
}

func (c *nftConnmark) cleanupIptablesConnmarkRules() error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving this to iptables_connmark file ?

}

func (c *nftConnmark) ensureBaseChain(table *nftables.Table) *nftables.Chain {
existing := c.getBaseChain(table)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we not returning error from getBaseChain ? If chain does exist but listchain failed for some reason, we would treat that as not existing by retuning nil and add duplicate chain ?

}

func (c *nftConnmark) ensureConnmarkChain(table *nftables.Table) (*nftables.Chain, error) {
existing := c.getConnmarkChain(table)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. we should capture the error from getconnmarkchain and handle it ?

}

// it has two rules, one to match vethprefix and jump to connmark chain, second is to mark packet with conntrack mark
func (c *nftConnmark) ensureBaseChainRules(table *nftables.Table, baseChain, targetChain *nftables.Chain, vethPrefix string, mark uint32) error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to pass vethprefix and mark to the func here ? these are already accessible via ntfConnmark struct right

}, nil
}

func (c *iptablesConnmark) Setup(exemptCIDRs []string) error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about rollback scenario - cx moved to latest version which uses nftables, had to rollback and now CNI will use iptables but nftables are not cleaned up ?

}

if !hasFiblocalReturnRule {
c.addFibLocalReturnRule(table, baseChain)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if fib rule is only for ipvs mode should we add only incase of ipvs mode ? Probably no harm in adding always but want to evaluate once

})
}

func (c *nftConnmark) addRestoreRule(table *nftables.Table, chain *nftables.Chain, mark uint32) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have restore rule in iptables mangle same as existing code. Is this restore rule here redundant. Do we need it ?

mockNft.EXPECT().GetRules(table, baseChain).Return([]*nftables.Rule{}, nil)
mockNft.EXPECT().GetRules(table, connmarkChain).Return([]*nftables.Rule{}, nil)
mockNft.EXPECT().InsertRule(gomock.Any()).Return(&nftables.Rule{}).Times(3) // fib rule + 2 CIDRs
mockNft.EXPECT().AddRule(gomock.Any()).Return(&nftables.Rule{}).Times(3) // jump, restore, set mark
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we verify the rules as well along with count ?

"github.com/golang/mock/gomock"
)

func TestIptablesConnmarkSetup(t *testing.T) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add test for cleanup func ? probably that would have caught the missing delete chain ?

@Pavani-Panakanti
Copy link
Copy Markdown
Contributor

Pavani-Panakanti commented Mar 18, 2026

@yash97 Can you add details on all test cases validated here or in a doc ?
what about integration tests for these ?
Can you also add godoc comments for all new files and functions

@Tycale
Copy link
Copy Markdown

Tycale commented Apr 8, 2026

Thanks for this PR 🙏🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants