
enable-endpoint-slices: IPv4 pod addresses registered into IPv6 target group for RequireDualStack services #4775

@Sergey-Kizimov


Bug Description

When --enable-endpoint-slices=true (the default since v3.3.0), the controller attempts to register IPv4 pod addresses into IPv6-only target groups for services with ipFamilyPolicy: RequireDualStack and ipFamilies: [IPv6, IPv4]. AWS rejects each such RegisterTargets call with:

ValidationError: The IP address '10.35.x.x' is not a valid IPv6 address
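The rejection is a plain address-family mismatch. As a hedged illustration (not the controller's code), a client-side guard built on Go's standard net/netip package flags the same address before any API call. The helper name validForTargetGroup, the "ipv4"/"ipv6" strings, and the concrete addresses (10.35.0.12 stands in for the redacted one) are assumptions for this sketch.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validForTargetGroup is a hypothetical pre-flight check: it reports whether
// addr belongs to the IP family of a target group whose address type is
// tgAddressType ("ipv4" or "ipv6", spellings assumed for this sketch).
func validForTargetGroup(addr, tgAddressType string) bool {
	a, err := netip.ParseAddr(addr)
	if err != nil {
		return false // not an IP literal at all
	}
	switch tgAddressType {
	case "ipv4":
		return a.Is4()
	case "ipv6":
		return a.Is6()
	default:
		return false // unknown target group type: reject in this sketch
	}
}

func main() {
	fmt.Println(validForTargetGroup("10.35.0.12", "ipv6"))    // false: IPv4 literal
	fmt.Println(validForTargetGroup("2600:1f18::12", "ipv6")) // true
}
```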

The affected pods' readiness gates (target-health.elbv2.k8s.aws/*) are never satisfied, so those pods remain not ready indefinitely. Existing pods with stale pre-upgrade registrations continue serving traffic, which masks the impact until the next pod reschedule.

Root Cause

computeServiceEndpointsData in pkg/backend/endpoint_resolver.go lists all EndpointSlices for a service without filtering by address type:

r.k8sClient.List(ctx, epSliceList,
    client.InNamespace(svcKey.Namespace),
    client.MatchingLabels{discovery.LabelServiceName: svcKey.Name})
// returns BOTH addressType=IPv4 and addressType=IPv6 slices

For a RequireDualStack [IPv6, IPv4] service, Kubernetes creates EndpointSlices for each address family (at least one with addressType=IPv4 and one with addressType=IPv6). All of them are merged into a single flat list and fed into resolvePodEndpointsWithEndpointsData, which produces two PodEndpoint entries per pod: one with the IPv4 address and one with the IPv6 address.
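The merge can be reproduced in isolation. This is a minimal sketch using simplified, hypothetical stand-ins for the discovery/v1 types (the real Endpoint carries a TargetRef rather than a PodName field, and the pod name and addresses are invented); it only shows that flattening every listed slice, regardless of AddressType, yields one entry per pod per family.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for discovery/v1 EndpointSlice types.
type AddressType string

const (
	AddressTypeIPv4 AddressType = "IPv4"
	AddressTypeIPv6 AddressType = "IPv6"
)

type Endpoint struct {
	PodName   string
	Addresses []string
}

type EndpointSlice struct {
	AddressType AddressType
	Endpoints   []Endpoint
}

type PodEndpoint struct {
	PodName string
	IP      string
}

// flattenAll merges every slice returned by the label-selector List call,
// without looking at AddressType, as the issue describes.
func flattenAll(slices []EndpointSlice) []PodEndpoint {
	var out []PodEndpoint
	for _, s := range slices {
		for _, ep := range s.Endpoints {
			for _, addr := range ep.Addresses {
				out = append(out, PodEndpoint{PodName: ep.PodName, IP: addr})
			}
		}
	}
	return out
}

func main() {
	slices := []EndpointSlice{
		{AddressTypeIPv6, []Endpoint{{"web-0", []string{"2600:1f18::a"}}}},
		{AddressTypeIPv4, []Endpoint{{"web-0", []string{"10.35.0.10"}}}},
	}
	// Prints two lines for the single pod web-0, one per address family.
	for _, pe := range flattenAll(slices) {
		fmt.Println(pe.PodName, pe.IP)
	}
}
```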

When matchPodEndpointWithTargets runs against a TargetGroupBinding (TGB) with ipAddressType: ipv6, the IPv6 endpoints match existing, already-registered targets, while the IPv4 endpoints appear unmatched and are submitted to RegisterTargets. AWS rejects the call because the target group is IPv6-only.
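The matching step behaves like a set difference. The sketch below is hypothetical and keys only on IP for brevity (the real matchPodEndpointWithTargets may also consider port and other fields); it shows why the IPv4 endpoint is the one left over for RegisterTargets once the pod's IPv6 address is already registered.

```go
package main

import "fmt"

// PodEndpoint is a hypothetical, simplified stand-in for the controller's type.
type PodEndpoint struct {
	PodName string
	IP      string
}

// unmatched returns the endpoints whose IP is not already a registered target.
// Sketch of the set difference described above, not the controller's code.
func unmatched(endpoints []PodEndpoint, registeredIPs []string) []PodEndpoint {
	registered := make(map[string]struct{}, len(registeredIPs))
	for _, ip := range registeredIPs {
		registered[ip] = struct{}{}
	}
	var out []PodEndpoint
	for _, ep := range endpoints {
		if _, ok := registered[ep.IP]; !ok {
			out = append(out, ep)
		}
	}
	return out
}

func main() {
	endpoints := []PodEndpoint{
		{"web-0", "2600:1f18::a"}, // from the IPv6 slice
		{"web-0", "10.35.0.10"},   // from the IPv4 slice
	}
	// The IPv6-only target group already holds the pod's IPv6 address.
	toRegister := unmatched(endpoints, []string{"2600:1f18::a"})
	fmt.Println(toRegister) // [{web-0 10.35.0.10}]
}
```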

The bug has existed since EndpointSlice support was added (PR #2169, Sep 2021) but stayed dormant because --enable-endpoint-slices defaulted to false. PR #4353 (merged Sep 2025, shipped in v3.3.0) flipped the default to true, exposing the bug.

With --enable-endpoint-slices=false (the pre-v3.3.0 default), the controller uses the legacy corev1.Endpoints path, which exposes only the primary IP family (IPv6 for ipFamilies: [IPv6, IPv4]), so the bug is never triggered.

Steps to Reproduce

  1. Dual-stack cluster where pods receive both IPv4 and IPv6 addresses
  2. Service with ipFamilyPolicy: RequireDualStack and ipFamilies: [IPv6, IPv4]
  3. Ingress with alb.ingress.kubernetes.io/ip-address-type: dualstack and target-type: ip
  4. Controller running with --enable-endpoint-slices=true (default in v3.3.0+)

Controller continuously logs:

Reconciler error  controller=targetGroupBinding
error="operation error Elastic Load Balancing v2: RegisterTargets,
  api error ValidationError: The IP address '10.x.x.x' is not a valid IPv6 address"

Expected Behavior

For a TGB with ipAddressType: ipv6, only IPv6 pod addresses should be submitted to RegisterTargets. The controller should not attempt to register IPv4 addresses into an IPv6-only target group.
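One possible shape for the fix, offered only as a sketch under assumptions: after listing, keep only EndpointSlices whose AddressType matches the TGB's ipAddressType (addressType is a field on EndpointSlice, not a label, so the existing MatchingLabels selector cannot do this on its own). The types, the helper names filterSlices and addressTypeFor, the slice names, and the choice to reject unrecognised ipAddressType values are all hypothetical, not the controller's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for discovery/v1 types.
type AddressType string

const (
	AddressTypeIPv4 AddressType = "IPv4"
	AddressTypeIPv6 AddressType = "IPv6"
)

type EndpointSlice struct {
	Name        string
	AddressType AddressType
}

// addressTypeFor maps a TGB ipAddressType ("ipv4"/"ipv6") to the
// EndpointSlice AddressType it should consume.
func addressTypeFor(ipAddressType string) (AddressType, bool) {
	switch strings.ToLower(ipAddressType) {
	case "ipv4":
		return AddressTypeIPv4, true
	case "ipv6":
		return AddressTypeIPv6, true
	}
	return "", false
}

// filterSlices keeps only slices matching the target group's family.
// Unrecognised ipAddressType values yield nil in this sketch.
func filterSlices(slices []EndpointSlice, ipAddressType string) []EndpointSlice {
	want, ok := addressTypeFor(ipAddressType)
	if !ok {
		return nil
	}
	var out []EndpointSlice
	for _, s := range slices {
		if s.AddressType == want {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	slices := []EndpointSlice{
		{"web-abc12", AddressTypeIPv6},
		{"web-def34", AddressTypeIPv4},
	}
	for _, s := range filterSlices(slices, "ipv6") {
		fmt.Println(s.Name) // web-abc12
	}
}
```

Filtering at the slice level also drops any addressType=FQDN slices, which an IP target group could not use anyway.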

Workaround

Set --enable-endpoint-slices=false to revert to the legacy Endpoints path. This restores the pre-v3.3.0 behaviour at the cost of not using EndpointSlices.

Environment

  • Controller version: v3.3.0 (also reproducible on v2.13.4 with --enable-endpoint-slices=true)
  • Not affected: SingleStack IPv6 services (only one EndpointSlice, no IPv4 addresses to submit)

Labels: kind/bug