Add prerouting connmark rules through nftable instead of iptables.#3588
Add prerouting connmark rules through nftable instead of iptables.#3588yash97 wants to merge 9 commits intoaws:masterfrom
Conversation
6144664 to
8ca4ea5
Compare
1ce4a25 to
c14e4f2
Compare
There was a problem hiding this comment.
Pull request overview
This PR addresses kube-proxy nftables mode compatibility by moving VPC CNI prerouting connmark logic into a dedicated connmark abstraction that can be implemented via either iptables-legacy or native nftables, selected automatically based on the node’s iptables backend.
Changes:
- Introduces a
Connmarkinterface and adds nftables- and iptables-based implementations for connmark rule setup/cleanup. - Updates host networking flow to program SNAT rules separately and delegate connmark rule management to the selected backend.
- Adds integration/unit tests and supporting framework changes (including multi-container pod exec and kube-proxy mode SNAT validation).
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| test/integration/eni-subnet-discovery/eni_subnet_discovery_secondary_exclusion_test.go | Switches to new PodExecInContainer helper for multi-container exec. |
| test/integration/cni/snat_kube_proxy_test.go | New integration test validating SNAT/connmark behavior across kube-proxy modes. |
| test/framework/resources/k8s/resources/pod.go | Renames/standardizes pod exec API to PodExecInContainer. |
| test/framework/framework.go | Adds a DiscoveryClient used by integration tests. |
| scripts/run-mixed-os-snat-kube-proxy-test.sh | New script intended to provision a mixed-OS EKS cluster and run SNAT/kube-proxy tests. |
| pkg/nft/nft.go | Adds an nftables client wrapper interface for easier testing/mocking. |
| pkg/nft/mocks/nft_mocks.go | Adds gomock for the nft client wrapper. |
| pkg/networkutils/nftables_connmark.go | Implements nftables backend for connmark rules and legacy iptables cleanup-on-migration. |
| pkg/networkutils/nftables_connmark_test.go | Unit tests for nftables connmark reconciliation logic. |
| pkg/networkutils/iptables_connmark.go | Implements iptables backend for connmark rules. |
| pkg/networkutils/iptables_connmark_test.go | Unit tests for iptables connmark behavior and stale rule cleanup. |
| pkg/networkutils/network.go | Refactors host SNAT programming and integrates Connmark into host setup. |
| pkg/networkutils/network_test.go | Updates unit tests to account for new connmark abstraction and SNAT update flow. |
| pkg/networkutils/mocks/network_mocks.go | Updates generated mocks for renamed SNAT update API. |
| pkg/networkutils/mocks/connmark_mocks.go | Adds gomock for Connmark. |
| pkg/iptableswrapper/iptables.go | Adds GetIptablesMode() detection based on iptables --version. |
| pkg/iptableswrapper/mocks/iptables_maps.go | Enhances mock iptables delete behavior to support IsNotExist semantics. |
| pkg/ipamd/ipamd.go | Switches CIDR-change updates to call UpdateHostSNATRules. |
| go.mod / go.sum | Adds nftables dependency and related indirect deps. |
| .github/workflows/pr-automated-tests.yaml | Adjusts Docker build jobs to run per-arch via a matrix. |
Comments suppressed due to low confidence (2)
pkg/networkutils/network_test.go:851
- Test name looks like it has a typo: "TestUUpdateHostSNATRules". Renaming to something consistent (e.g., "TestUpdateHostSNATRules") will improve readability and discoverability in test output.
func TestUUpdateHostSNATRules(t *testing.T) {
pkg/networkutils/mocks/network_mocks.go:18
- Generated mock header contains a duplicated "Source:" line. Regenerate the mock or remove the duplicate line to avoid confusing metadata in generated files.
// Code generated by MockGen. DO NOT EDIT.
// Source: github.com/aws/amazon-vpc-cni-k8s/pkg/networkutils (interfaces: NetworkAPIs)
// Source: github.com/aws/amazon-vpc-cni-k8s/pkg/networkutils (interfaces: NetworkAPIs)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…wrapper and Connmark interface for testability
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| semVer, err := semver.NewVersion(ver.String()) | ||
| Expect(err).ToNot(HaveOccurred()) | ||
| if semVer.Minor() <= 32 && mode == "nftables" { | ||
| return |
| return err | ||
| } | ||
| for _, pod := range pods.Items { | ||
| f.K8sResourceManagers.PodManager().DeleteAndWaitTillPodDeleted(&pod) |
| SetupENINetwork(eniIP string, eniMAC string, networkCard int, eniSubnetCIDR string, maxENIPerNIC int, isTrunkENI bool, routeTableID int, isRuleConfigured bool) error | ||
| // UpdateHostIptablesRules updates the nat table iptables rules on the host | ||
| UpdateHostIptablesRules(vpcCIDRs []string, primaryMAC string, primaryAddr *net.IP, v6Enabled bool) error | ||
| //UpdateHostIptablesRules(vpcCIDRs []string, primaryMAC string, primaryAddr *net.IP, v6Enabled bool) error |
| } | ||
| err := n.updateHostIptablesRules(vpcCIDRs, primaryMAC, primaryAddr) | ||
| if err != nil { | ||
| log.Error(fmt.Sprintf("failed to iptable rules for snat %s", err.Error())) |
There was a problem hiding this comment.
address this. log.Errorf can be used directly
|
|
||
| func (e *IptErrNotExists) Error() string { | ||
| // ref https://github.com/coreos/go-iptables/blob/main/iptables/iptables.go#L52 | ||
| return "does not exists" |
| hasCtLoad := false | ||
| hasBitwise := false | ||
| hasMetaStore := false | ||
| markBytes := []byte{byte(mark), byte(mark >> 8), byte(mark >> 16), byte(mark >> 24)} |
There was a problem hiding this comment.
this should be addressed
| return m == IptablesModeNFT | ||
| } | ||
|
|
||
| var iptablesVersionRegex = regexp.MustCompile(`v([0-9]+)\.([0-9]+)\.([0-9]+)(?:\s+\((\w+))?`) |
There was a problem hiding this comment.
Looks like this will match (nftables - missing ). Any potential issues in future ? Should we include closing brace as well
| return existing | ||
| } | ||
|
|
||
| func (c *nftConnmark) cleanupIptablesConnmarkRules() error { |
There was a problem hiding this comment.
How about moving this to iptables_connmark file ?
| } | ||
|
|
||
| func (c *nftConnmark) ensureBaseChain(table *nftables.Table) *nftables.Chain { | ||
| existing := c.getBaseChain(table) |
There was a problem hiding this comment.
why are we not returning error from getBaseChain ? If chain does exist but listchain failed for some reason, we would treat that as not existing by retuning nil and add duplicate chain ?
| } | ||
|
|
||
| func (c *nftConnmark) ensureConnmarkChain(table *nftables.Table) (*nftables.Chain, error) { | ||
| existing := c.getConnmarkChain(table) |
There was a problem hiding this comment.
same here. we should capture the error from getconnmarkchain and handle it ?
| } | ||
|
|
||
| // it has two rules, one to match vethprefix and jump to connmark chain, second is to mark packet with conntrack mark | ||
| func (c *nftConnmark) ensureBaseChainRules(table *nftables.Table, baseChain, targetChain *nftables.Chain, vethPrefix string, mark uint32) error { |
There was a problem hiding this comment.
why do we need to pass vethprefix and mark to the func here ? these are already accessible via ntfConnmark struct right
| }, nil | ||
| } | ||
|
|
||
| func (c *iptablesConnmark) Setup(exemptCIDRs []string) error { |
There was a problem hiding this comment.
what about rollback scenario - cx moved to latest version which uses nftables, had to rollback and now CNI will use iptables but nftables are not cleaned up ?
| } | ||
|
|
||
| if !hasFiblocalReturnRule { | ||
| c.addFibLocalReturnRule(table, baseChain) |
There was a problem hiding this comment.
if fib rule is only for ipvs mode should we add only incase of ipvs mode ? Probably no harm in adding always but want to evaluate once
| }) | ||
| } | ||
|
|
||
| func (c *nftConnmark) addRestoreRule(table *nftables.Table, chain *nftables.Chain, mark uint32) { |
There was a problem hiding this comment.
We have restore rule in iptables mangle same as existing code. Is this restore rule here redundant. Do we need it ?
| mockNft.EXPECT().GetRules(table, baseChain).Return([]*nftables.Rule{}, nil) | ||
| mockNft.EXPECT().GetRules(table, connmarkChain).Return([]*nftables.Rule{}, nil) | ||
| mockNft.EXPECT().InsertRule(gomock.Any()).Return(&nftables.Rule{}).Times(3) // fib rule + 2 CIDRs | ||
| mockNft.EXPECT().AddRule(gomock.Any()).Return(&nftables.Rule{}).Times(3) // jump, restore, set mark |
There was a problem hiding this comment.
can we verify the rules as well along with count ?
| "github.com/golang/mock/gomock" | ||
| ) | ||
|
|
||
| func TestIptablesConnmarkSetup(t *testing.T) { |
There was a problem hiding this comment.
Can you also add test for cleanup func ? probably that would have caught the missing delete chain ?
|
@yash97 Can you add details on all test cases validated here or in a doc ? |
|
Thanks for this PR 🙏🏻 |
Solves #3563
What type of PR is this?
improvement
Which issue does this PR fix?:
#3563
What does this PR do / Why do we need it?:
Summary
This PR introduces a Connmark interface that abstracts connmark rule management, with implementations for both iptables-legacy and nftables backends. The VPC CNI now auto-detects the node's iptables mode and uses the appropriate backend.
Changes
Core Implementation:
Connmark interface with Setup() and Cleanup() methods
nftConnmark for nftables backend using new nft.Client wrapper
iptablesConnmark for iptables-legacy backend
GetIptablesMode() to auto-detect iptables backend (legacy vs nf_tables)
fib daddr type local return rule to skip connmark processing for local traffic (fixes IPVS compatibility)
nftables Implementation:
Create aws-cni table with nat-prerouting and snat-mark chains
Idempotent rule reconciliation - adds missing rules, removes stale CIDR rules
One-time cleanup of legacy iptables connmark rules when migrating to nftables
Testing:
Integration test snat_kube_proxy_test.go to verify SNAT works with all kube-proxy modes (iptables, nftables, ipvs)
PodExecInContainer() to test framework for multi-container pod exec
Test script run-mixed-os-snat-kube-proxy-test.sh for mixed OS testing
Testing done on this change:
Sample nft list rules, how does it look on host through nft command
Will this PR introduce any new dependencies?:
Will this break upgrades or downgrades? Has updating a running cluster been tested?:
It will not break pod networking however, it will leave nft rules on host. But they will apply same operations applied by iptables-nft. As nft rules will only be executed if iptables-nft is used by cni and kube-proxy.
Does this change require updates to the CNI daemonset config files to work?:
Does this PR introduce any user-facing change?:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.