Fix SwitchLB connectivity issues#6531
Conversation
andriishestakov
commented
Mar 26, 2026
- Bug fixes
There was a problem hiding this comment.
Pull request overview
This PR aims to fix SwitchLB VIP/FIP connectivity issues by changing how SwitchLB VIPs answer ARP and by introducing additional OVN resources (LBs + hairpin SNAT option) so FIPs targeting SwitchLB VIPs can reach backends reliably and stay in sync as endpoints change.
Changes:
- Add OVN NB helpers to manage
LogicalSwitchPort.options["arp_proxy"]as an IP list and to setLoadBalancer.options["hairpin_snat_ip"]. - Update SwitchLB VIP handling to omit VIP IPs from the VIP LSP’s OVN
addressesand instead ARP-proxy the VIP IPs on the LRP-facing switch port. - Extend OvnFip controller logic to create per-FIP router/switch LBs for SwitchLB VIP targets and re-enqueue affected FIPs on SwitchLBRule / EndpointSlice changes.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/util/const.go | Adds a new annotation constant to persist FIP→internal-subnet info. |
| pkg/ovs/ovn-nb-logical_switch_port.go | Adds helpers to add/remove IPs from LSP arp_proxy option. |
| pkg/ovs/ovn-nb-load_balancer.go | Adds helper to set/unset LB hairpin_snat_ip option. |
| pkg/ovs/interface.go | Extends OVS interfaces to expose the new NB client helpers. |
| pkg/controller/vip.go | Changes SwitchLB VIP LSP creation and ARP behavior (move ARP responsibility to LRP switch port). |
| pkg/controller/switch_lb_rule.go | Re-enqueues FIPs when SwitchLBRule changes to keep FIP LBs in sync. |
| pkg/controller/ovn_fip.go | Implements SwitchLB VIP–targeted FIP LB programming + cleanup and adds FIP re-enqueue helpers. |
| pkg/controller/endpoint_slice.go | Re-enqueues FIPs on backend changes for SwitchLBRule services. |
| mocks/pkg/ovs/interface.go | Updates mocks to match new OVS interfaces. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pkg/controller/ovn_fip.go
Outdated
| // Switch LB: handles same-subnet pod → EIP:port (hairpin) with SNAT to gateway | ||
| switchLBName := fmt.Sprintf("fip-%s-%s-sw", fipName, proto) | ||
| if err = c.OVNNbClient.CreateLoadBalancer(switchLBName, proto); err != nil { | ||
| return fmt.Errorf("create switch LB %s: %w", switchLBName, err) | ||
| } | ||
| // Remove any stale VIPs (from a previous EIP IP) before adding the current one. | ||
| if existingLB, lbErr := c.OVNNbClient.GetLoadBalancer(switchLBName, true); lbErr == nil && existingLB != nil { | ||
| for staleVip := range existingLB.Vips { | ||
| if staleVip != eipVip { | ||
| _ = c.OVNNbClient.LoadBalancerDeleteVip(switchLBName, staleVip, true) | ||
| } | ||
| } | ||
| } | ||
| if err = c.OVNNbClient.LoadBalancerAddVip(switchLBName, eipVip, backends...); err != nil { | ||
| return fmt.Errorf("add vip to switch LB %s: %w", switchLBName, err) | ||
| } | ||
| if err = c.OVNNbClient.SetLoadBalancerHairpinSnatIP(switchLBName, v4Eip); err != nil { | ||
| return fmt.Errorf("set hairpin_snat_ip on switch LB %s: %w", switchLBName, err) | ||
| } |
There was a problem hiding this comment.
addFipVipLBRules takes subnetGateway but never uses it, and the inline comment says hairpin SNAT is "to gateway" while the code sets hairpin_snat_ip to v4Eip. This mismatch makes the intended behavior unclear and increases the risk of future regressions. Either remove the unused parameter and update the comment, or use subnetGateway (e.g., the subnet's v4 gateway IP) if that is the intended SNAT source.
| // AddLogicalSwitchPortArpProxyIP adds an IP to the arp_proxy option of a logical switch port. | ||
| // This is used to make the LRP switch port respond to ARP for the given IP with the LRP's MAC. | ||
| func (c *OVNNbClient) AddLogicalSwitchPortArpProxyIP(lspName, ip string) error { | ||
| lsp, err := c.GetLogicalSwitchPort(lspName, false) | ||
| if err != nil { | ||
| klog.Error(err) | ||
| return fmt.Errorf("get logical switch port %s: %w", lspName, err) | ||
| } | ||
| if lsp == nil { | ||
| err = fmt.Errorf("logical switch port %s not found", lspName) | ||
| klog.Error(err) | ||
| return err | ||
| } | ||
| if lsp.Options == nil { | ||
| lsp.Options = make(map[string]string) | ||
| } | ||
| existing := strings.Fields(lsp.Options["arp_proxy"]) | ||
| for _, existingIP := range existing { | ||
| if existingIP == ip { | ||
| return nil | ||
| } | ||
| } | ||
| existing = append(existing, ip) | ||
| lsp.Options["arp_proxy"] = strings.Join(existing, " ") | ||
| op, err := c.UpdateLogicalSwitchPortOp(lsp, &lsp.Options) | ||
| if err != nil { | ||
| klog.Error(err) | ||
| return err | ||
| } | ||
| if err := c.Transact("lsp-update", op); err != nil { | ||
| klog.Error(err) | ||
| return fmt.Errorf("failed to add arp_proxy ip %s to logical switch port %s: %w", ip, lspName, err) | ||
| } | ||
| return nil |
There was a problem hiding this comment.
AddLogicalSwitchPortArpProxyIP/RemoveLogicalSwitchPortArpProxyIP introduce new behavior around the arp_proxy option, but there are existing unit tests for SetLogicalSwitchPortArpProxy. Please add/extend ovn-nb-logical_switch_port_test.go coverage for adding/removing IPs (including idempotency and interaction with existing arp_proxy values) to prevent regressions.
| // SetLoadBalancerHairpinSnatIP sets options:hairpin_snat_ip on a load balancer. | ||
| // OVN SNATs the source IP to `ip` when a client on the same subnet as a backend | ||
| // hits the LB VIP, ensuring replies traverse the router so conntrack can reverse | ||
| // the DNAT. Passing ip="" removes the option. | ||
| func (c *OVNNbClient) SetLoadBalancerHairpinSnatIP(lbName, ip string) error { | ||
| lb, err := c.GetLoadBalancer(lbName, false) | ||
| if err != nil { | ||
| klog.Error(err) | ||
| return fmt.Errorf("get lb %s for hairpin_snat_ip: %w", lbName, err) | ||
| } | ||
| if lb.Options["hairpin_snat_ip"] == ip { | ||
| return nil | ||
| } | ||
| options := make(map[string]string, len(lb.Options)+1) | ||
| maps.Copy(options, lb.Options) | ||
| if ip == "" { | ||
| delete(options, "hairpin_snat_ip") | ||
| } else { | ||
| options["hairpin_snat_ip"] = ip | ||
| } | ||
| lb.Options = options | ||
| if err = c.UpdateLoadBalancer(lb, &lb.Options); err != nil { | ||
| klog.Error(err) | ||
| return fmt.Errorf("set hairpin_snat_ip on lb %s: %w", lbName, err) | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
SetLoadBalancerHairpinSnatIP is new option-mutating behavior, and pkg/ovs already has unit tests for similar helpers like SetLoadBalancerAffinityTimeout. Please add ovn-nb-load_balancer_test.go coverage for setting, updating, and removing hairpin_snat_ip (including preserving other lb.Options keys).
pkg/controller/ovn_fip.go
Outdated
| if vip.Spec.Type != util.SwitchLBRuleVip { | ||
| continue | ||
| } | ||
| if vip.Status.V4ip != vipIP && vip.Spec.V4ip != vipIP { |
There was a problem hiding this comment.
enqueueFipsForVipIP only matches the provided vipIP against vip.Status.V4ip and vip.Spec.V4ip. If SwitchLBRule.Spec.Vip (and thus the passed vipIP) is IPv6, this function will never enqueue any FIPs. Include v6 matching (vip.Status.V6ip / vip.Spec.V6ip) so IPv6 SwitchLBRule VIPs trigger re-sync correctly.
| if vip.Status.V4ip != vipIP && vip.Spec.V4ip != vipIP { | |
| if vip.Status.V4ip != vipIP && vip.Spec.V4ip != vipIP && | |
| vip.Status.V6ip != vipIP && vip.Spec.V6ip != vipIP { |
| if err = c.patchOvnFipSubnetAnnotation(key, subnetName); err != nil { | ||
| klog.Errorf("failed to patch subnet annotation for fip %s, %v", key, err) | ||
| return err | ||
| } |
There was a problem hiding this comment.
In the isSwitchLBVip path, only the IPv4 case (v4IP/v4Eip) calls addFipVipLBRules, and the regular NAT programming is entirely skipped. This means an IPv6-only (or v6) OvnFip targeting a switch_lb_vip VIP will not create any OVN connectivity rules. Either implement the equivalent v6 handling (LBs/NAT) or fall back to the existing NAT-based behavior for v6 so IPv6 FIPs keep working.
| } | |
| } | |
| // For IPv6 (and mixed v4/v6) in switch_lb_vip mode, fall back to the | |
| // standard NAT-based behavior so IPv6 FIPs keep working. | |
| var stateless bool | |
| if cachedFip.Spec.Type != "" { | |
| stateless = (cachedFip.Spec.Type == kubeovnv1.GWDistributedType) | |
| } else { | |
| stateless = (c.ExternalGatewayType == kubeovnv1.GWDistributedType) | |
| } | |
| options := map[string]string{"stateless": strconv.FormatBool(stateless)} | |
| // support v6:v6 | |
| if v6IP != "" && v6Eip != "" { | |
| if err = c.OVNNbClient.AddNat(vpcName, ovnnb.NATTypeDNATAndSNAT, v6Eip, v6IP, mac, cachedFip.Spec.IPName, options); err != nil { | |
| klog.Errorf("failed to create v6:v6 fip, %v", err) | |
| return err | |
| } | |
| } | |
| // support v4:v6 | |
| if v4IP != "" && v6IP == "" && v4Eip == "" && v6Eip != "" { | |
| if err = c.OVNNbClient.AddNat(vpcName, ovnnb.NATTypeDNATAndSNAT, v6Eip, v4IP, mac, cachedFip.Spec.IPName, options); err != nil { | |
| klog.Errorf("failed to create v4:v6 fip, %v", err) | |
| return err | |
| } | |
| } | |
| // support v6:v4 | |
| if v6IP != "" && v4IP == "" && v6Eip == "" && v4Eip != "" { | |
| if err = c.OVNNbClient.AddNat(vpcName, ovnnb.NATTypeDNATAndSNAT, v4Eip, v6IP, mac, cachedFip.Spec.IPName, options); err != nil { | |
| klog.Errorf("failed to create v6:v4 fip, %v", err) | |
| return err | |
| } | |
| } |
pkg/controller/switch_lb_rule.go
Outdated
| } | ||
|
|
||
| // Re-enqueue any OvnFips targeting this VIP so their LB backends stay in sync. | ||
| c.enqueueFipsForVipIP(slr.Name) |
There was a problem hiding this comment.
enqueueFipsForVipIP expects a VIP IP address, but this call passes slr.Name (the SwitchLBRule resource name). This will prevent OvnFips targeting the SwitchLBRule VIP from being re-synced when the rule updates. Pass slr.Spec.Vip (or the VIP IP string you validated earlier) instead.
| c.enqueueFipsForVipIP(slr.Name) | |
| c.enqueueFipsForVipIP(slr.Spec.Vip) |
pkg/controller/vip.go
Outdated
| // switch port (subnet-vpc), OVN generates a priority=30 ARP flow responding with the | ||
| // LRP's MAC. Pods then send dl_dst=LRP_MAC, and after DNAT the L2 lookup correctly | ||
| // routes packets through the LRP to the backend. | ||
| mac = util.GenerateMac() |
There was a problem hiding this comment.
For SwitchLBRuleVip, mac is allocated by IPAM (acquireIPAddress/acquireStaticIPAddress) but then overwritten with util.GenerateMac(). This desynchronizes VIP.Status.Mac from the MAC stored in IPAM (see init.go rehydrating IPAM from VIP CR), which can lead to MAC reuse/collisions after restart. Use the MAC returned by IPAM, or if you must generate a MAC here, pass it into the IPAM allocation so both sources stay consistent.
| mac = util.GenerateMac() |
| if cachedFip.Spec.IPType == util.Vip && cachedFip.Spec.IPName != "" { | ||
| if vip, err := c.virtualIpsLister.Get(cachedFip.Spec.IPName); err == nil { | ||
| isSwitchLBVip = (vip.Spec.Type == util.SwitchLBRuleVip) | ||
| } | ||
| } | ||
| if !isSwitchLBVip && cachedFip.Annotations[util.FipInternalSubnetAnnotation] != "" { | ||
| isSwitchLBVip = true | ||
| } | ||
|
|
||
| if isSwitchLBVip { | ||
| subnetName := cachedFip.Annotations[util.FipInternalSubnetAnnotation] |
There was a problem hiding this comment.
cleanOvnFipOvnResources determines isSwitchLBVip by looking up the VIP, but then always uses the stored annotation to derive subnetName. If the VIP exists but the annotation hasn't been patched yet (or patch failed), subnetName will be empty and delFipVipLBRules will skip detaching switch LBs from the subnet, which can cause LB deletion/cleanup to fail or leak. When the VIP lookup succeeds, use vip.Spec.Subnet as a fallback when the annotation is missing.
| if cachedFip.Spec.IPType == util.Vip && cachedFip.Spec.IPName != "" { | |
| if vip, err := c.virtualIpsLister.Get(cachedFip.Spec.IPName); err == nil { | |
| isSwitchLBVip = (vip.Spec.Type == util.SwitchLBRuleVip) | |
| } | |
| } | |
| if !isSwitchLBVip && cachedFip.Annotations[util.FipInternalSubnetAnnotation] != "" { | |
| isSwitchLBVip = true | |
| } | |
| if isSwitchLBVip { | |
| subnetName := cachedFip.Annotations[util.FipInternalSubnetAnnotation] | |
| subnetName := "" | |
| if cachedFip.Spec.IPType == util.Vip && cachedFip.Spec.IPName != "" { | |
| if vip, err := c.virtualIpsLister.Get(cachedFip.Spec.IPName); err == nil { | |
| if vip.Spec.Type == util.SwitchLBRuleVip { | |
| isSwitchLBVip = true | |
| // Use the VIP's subnet as a fallback if the annotation is missing. | |
| subnetName = vip.Spec.Subnet | |
| } | |
| } | |
| } | |
| if !isSwitchLBVip && cachedFip.Annotations[util.FipInternalSubnetAnnotation] != "" { | |
| isSwitchLBVip = true | |
| subnetName = cachedFip.Annotations[util.FipInternalSubnetAnnotation] | |
| } | |
| if isSwitchLBVip { |
|
@andriishestakov please sign off the commit and add the pr description. |
OVN's ls_in_pre_lb (priority 110) bypasses switch load-balancers for traffic arriving from a logical router port, so a regular dnat_and_snat NAT rule that rewrites the external EIP to a switch_lb_vip VIP results in the VIP's switch LB never firing for external/node-originated traffic. Fix by creating dedicated per-FIP load-balancers instead of NAT rules: Router LB (fip-<name>-<proto>): attached to the VPC router, handles external and node→EIP:port traffic by directly load-balancing to pod backends. This fires at LR priority 120, before any NAT rule (100). Switch LB (fip-<name>-<proto>-sw): attached to the VIP's subnet switch, handles same-subnet pod→EIP:port (hairpin) traffic with options:hairpin_snat_ip set to the EIP so reply traffic traverses the router and conntrack unwinds the DNAT correctly. An ARP-proxy dnat_and_snat NAT rule (no logical_port) is kept on the VPC router so the external LRP responds to ARP for the EIP. Additional changes: - cleanOvnFipOvnResources(): shared helper for deletion in both the update (finalizer removal) and delete handlers; detects switch_lb_vip FIPs via VIP type or the stored FipInternalSubnetAnnotation fallback so cleanup works even after the VIP CR is gone. - enqueueFipsForVipIP(): re-queues OvnFips whose VIP IP matches (v4 or v6) when SwitchLBRule endpoints change; callers now correctly pass slr.Spec.Vip (the IP) instead of slr.Name. - vip.go: switch_lb_vip LSP is created without IP in OVN addresses and uses the IPAM-allocated MAC (not a freshly generated one) to keep IPAM and OVN in sync. VIP IP is instead added to the LRP switch port's arp_proxy option so OVN generates priority-30 ARP flows using the LRP MAC, ensuring post-DNAT L2 lookup routes packets through the LRP. - AddLogicalSwitchPortArpProxyIP / RemoveLogicalSwitchPortArpProxyIP: new OVN NB helpers for managing the arp_proxy IP list on an LSP. - SetLoadBalancerHairpinSnatIP: new OVN NB helper to set/clear options:hairpin_snat_ip on a load balancer while preserving other keys. Signed-off-by: Andrii Shestakov <andrii.shestakov@gcore.com>
ce803bd to
b46e896
Compare
|
I'd love to review this PR but it doesn't say what it does or what it is trying to fix. |
|
@andriishestakov Hi, Thanks for the PR! but i have some opinions! How about extending a new router LB and switch LB CRD for the LB things? Using OVN FIP is not a clean way. |