
@shuqz (Collaborator) commented Dec 11, 2025

Issue

#4477

Description

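For ALB, if cross-zone load balancing is disabled on the target group (cross_zone=disabled) and the target's AZ is either not specified or set to all, target registration fails with an API error: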

ValidationError: When cross-zone load balancing is disabled on this target group, you must specify an Availability Zone for IP target '172.16.0.108' since it is outside of the VPC

Registration succeeds only when an AZ is specified. This does not apply to NLB.
Fix implemented:

  • Introduced a cache: needsPodAZCache tracks TGBs that require pod Availability Zones during target registration.
  • Catch the error from registerTarget: if it is a ValidationError, retry the registration with the pod AZ; if that still fails, return the error.

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
    Created a peered VPC and tried to register pods from VPC 1 to a target group in VPC 2. Before the fix, no targets were registered; after the fix, the targets were registered. Tested with the following YAML (load_balancing.cross_zone.enabled=false):
```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: game-2048
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: game-2048
  name: deployment-2048
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: app-2048
  replicas: 5
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-2048
    spec:
      containers:
        - image: public.ecr.aws/l6m2t8p7/docker-2048:latest
          imagePullPolicy: Always
          name: app-2048
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  namespace: game-2048
  name: service-2048
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: NodePort
  selector:
    app.kubernetes.io/name: app-2048
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/target-group-attributes: load_balancing.cross_zone.enabled=false
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-2048
                port:
                  number: 80
---
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: peered-tgb
  namespace: game-2048
spec:
  serviceRef:
    name: service-2048
    port: 80
  targetGroupARN: arn:aws:elasticloadbalancing:us-west-2::targetgroup/test-tg/
  vpcID: vpc- # The peered VPC ID (different from cluster VPC)
```
[Screenshots: 2025-12-12 at 5:41:23 PM and 5:41:53 PM]
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shuqz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 11, 2025
```go
	return svc.Annotations, nil
}

func (m *defaultResourceManager) isALBIngress(svcAnnotations map[string]string) bool {
```
Collaborator

This might not always work. Customers aren't required to add ingress annotations to their Services.

It's more reliable to use the TG protocol: https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/main/apis/elbv2/v1beta1/targetgroupbinding_types.go#L143-L146
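Here is a minimal sketch of protocol-based detection. It assumes the target group protocol is available as a plain string, taken either from the TargetGroupBinding spec linked above or from a DescribeTargetGroups call; the field name in that types file is not reproduced here, and isALBTargetGroup is a hypothetical name. The check relies only on the fact that ALB target groups use HTTP or HTTPS. Because it ignores annotations, it behaves the same for Ingress-managed and Gateway-managed target groups.

```go
package main

import (
	"fmt"
	"strings"
)

// isALBTargetGroup reports whether a target group protocol belongs to an
// Application Load Balancer. ALB target groups use HTTP or HTTPS. Anything
// else (TCP, UDP, TLS, TCP_UDP, GENEVE, or an empty/unknown value) is
// treated as non-ALB.
func isALBTargetGroup(protocol string) bool {
	switch strings.ToUpper(protocol) {
	case "HTTP", "HTTPS":
		return true
	default:
		return false
	}
}

func main() {
	for _, p := range []string{"HTTP", "https", "TCP", "TLS", ""} {
		fmt.Printf("%q -> %v\n", p, isALBTargetGroup(p))
	}
}
```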

Collaborator

I realized that this only works for the Ingress API. We will want a solution that works for ALB Gateways too.

Collaborator Author

I thought about the TG protocol too, but did not realize customers are not required to add the annotation.
Yeah, I forgot about the Gateway API integrations; I will include them.

Collaborator

```go
	return false
}

func (m *defaultResourceManager) isCrossZoneDisabled(svcAnnotations map[string]string, isALB bool) bool {
```
Collaborator

For Ingress objects, this annotation could be set on either the Ingress or the Service, so this check could be incorrect.

Collaborator

See my other comment; this will never work for ALB Gateways.

Collaborator Author

The other, more straightforward way is to use the describe API to get the cross-zone value.
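Here is a sketch of how that value could be interpreted. It assumes the attributes have already been fetched (for example with the ELBv2 DescribeTargetGroupAttributes API) and copied into a local Key/Value type, so the example has no SDK dependency; the attribute type and crossZoneDisabled are hypothetical names. On ALB target groups, load_balancing.cross_zone.enabled accepts true, false, or use_load_balancer_configuration (the default). Because an ALB always has cross-zone enabled at the load balancer level, only an explicit false disables it.

```go
package main

import "fmt"

// crossZoneKey is the target group attribute that controls cross-zone
// load balancing.
const crossZoneKey = "load_balancing.cross_zone.enabled"

// attribute is a local stand-in for the Key/Value pairs returned by
// DescribeTargetGroupAttributes.
type attribute struct {
	Key, Value string
}

// crossZoneDisabled reports whether cross-zone load balancing is explicitly
// turned off. "true", "use_load_balancer_configuration", and a missing
// attribute (the default) all count as enabled for an ALB target group.
func crossZoneDisabled(attrs []attribute) bool {
	for _, a := range attrs {
		if a.Key == crossZoneKey {
			return a.Value == "false"
		}
	}
	return false
}

func main() {
	fmt.Println(crossZoneDisabled([]attribute{{Key: crossZoneKey, Value: "false"}}))
	fmt.Println(crossZoneDisabled([]attribute{{Key: crossZoneKey, Value: "use_load_balancer_configuration"}}))
	fmt.Println(crossZoneDisabled(nil))
}
```

Reading the attribute from the target group itself avoids the question of whether it was set on an Ingress, a Service, or a Gateway. The cost is an extra API call per target group, and that result could itself be cached.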

```go
	if !ok {
		az, ok = node.Labels["failure-domain.beta.kubernetes.io/zone"]
	}
	if !ok {
```
Collaborator

This will block reconciles on non-EKS clusters. I suspect that's not what we want.

Collaborator Author

I can remove this fallback part.
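Here is a sketch of a label-only lookup. It assumes the flagged fallback is dropped and that a node without a zone label should not fail the reconcile. In that case the function reports ok=false, and the caller can register the target without an AZ. The two label keys are the standard Kubernetes zone label and the deprecated beta form already used in the diff. zoneFromNodeLabels is a hypothetical name.

```go
package main

import "fmt"

// Node labels that carry the zone: the GA key and its deprecated beta form.
const (
	zoneLabel     = "topology.kubernetes.io/zone"
	zoneLabelBeta = "failure-domain.beta.kubernetes.io/zone"
)

// zoneFromNodeLabels returns the node's zone from its labels only, preferring
// the GA key. When neither label is set (which may happen on clusters where no
// cloud provider populates them), it returns ok=false instead of consulting
// any other source, so the caller can register without an AZ rather than
// block the reconcile.
func zoneFromNodeLabels(labels map[string]string) (az string, ok bool) {
	for _, key := range []string{zoneLabel, zoneLabelBeta} {
		if az = labels[key]; az != "" {
			return az, true
		}
	}
	return "", false
}

func main() {
	fmt.Println(zoneFromNodeLabels(map[string]string{zoneLabel: "us-west-2a"}))
	fmt.Println(zoneFromNodeLabels(map[string]string{zoneLabelBeta: "us-west-2b"}))
	fmt.Println(zoneFromNodeLabels(map[string]string{}))
}
```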

```go
	return false
}

func (m *defaultResourceManager) getPodAvailabilityZone(ctx context.Context, pod k8s.PodInfo) (string, error) {
```
Collaborator

What do you think about using a node-az cache? That way we don't have to recalculate the node AZ each reconcile loop iteration.
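Here is a minimal sketch of such a cache. It assumes a hypothetical lookup function (for example, the label lookup from the diff) and eviction driven by node delete events, which are not shown. A node's zone does not normally change while the node exists, so entries can live until the node is removed. Failed lookups are not cached, so they are retried on the next reconcile. nodeAZCache and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeAZCache memoizes node name -> Availability Zone across reconciles.
type nodeAZCache struct {
	mu  sync.RWMutex
	azs map[string]string
}

func newNodeAZCache() *nodeAZCache {
	return &nodeAZCache{azs: map[string]string{}}
}

// get returns the cached AZ for node, calling lookup on a miss. Only
// successful lookups are stored. Concurrent misses for the same node may each
// call lookup; for a deterministic lookup they store the same value.
func (c *nodeAZCache) get(node string, lookup func(string) (string, bool)) (string, bool) {
	c.mu.RLock()
	az, ok := c.azs[node]
	c.mu.RUnlock()
	if ok {
		return az, true
	}
	if az, ok = lookup(node); !ok {
		return "", false
	}
	c.mu.Lock()
	c.azs[node] = az
	c.mu.Unlock()
	return az, true
}

// evict removes a node, e.g. from a node informer's delete handler.
func (c *nodeAZCache) evict(node string) {
	c.mu.Lock()
	delete(c.azs, node)
	c.mu.Unlock()
}

func main() {
	cache := newNodeAZCache()
	lookups := 0
	lookup := func(node string) (string, bool) {
		lookups++
		return "us-west-2a", true // stand-in for reading node labels
	}
	for i := 0; i < 3; i++ {
		az, _ := cache.get("ip-10-0-1-23", lookup)
		fmt.Println(az, lookups)
	}
	cache.evict("ip-10-0-1-23")
	az, _ := cache.get("ip-10-0-1-23", lookup)
	fmt.Println(az, lookups)
}
```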

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 13, 2025
```go
	var apiErr smithy.APIError
	if errors.As(err, &apiErr) {
		isMatch := apiErr.ErrorCode() == "ValidationError" &&
			strings.Contains(apiErr.ErrorMessage(), "you must specify an Availability Zone")
```
Collaborator Author

I am not sure whether they will change the error message, but checking for ValidationError alone might not be enough.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

3 participants