Expected Behavior
When a Pod appears on a node that previously did not advertise Local endpoint traffic, the eBPF dataplane should finish updating its maps before the BGP dataplane announces new Local endpoint routes.
No traffic should be dropped during this transition.
Current Behavior
With externalTrafficPolicy: Local, a race condition occurs between the BGP and eBPF dataplanes:
- Both dataplanes consume the same EndpointSlice events.
- When a Pod lands on a node that previously had no Local endpoints, the BGP dataplane sometimes updates faster than the eBPF dataplane.
- If BGP wins the race, the node starts receiving traffic before eBPF maps are programmed:
2025-12-09 16:34:18.690 [DEBUG][417] confd/routes.go 526: Advertising local service svc="default/nginx-lb"
2025-12-09 16:34:18.690 [DEBUG][417] confd/routes.go 232: Checking routes for service advertise=true svc="default/nginx-lb"
2025-12-09 16:34:18.690 [DEBUG][417] confd/routes.go 316: Setting routes for key key="default/nginx-lb" routes=[]string{"fd00::247b/128", "10.0.0.81/32", "203.0.113.81/32", "2001:db8:20::1/128"}
2025-12-09 16:34:18.695 [DEBUG][417] confd/resource.go 328: Running reloadcmd: sv hup bird || true
2025-12-09 16:34:18.695 [DEBUG][417] confd/resource.go 236: Comparing candidate config to /etc/calico/confd/config/bird6_aggr.cfg
2025-12-09 16:34:18.695 [DEBUG][417] confd/util.go 66: /etc/calico/confd/config/bird6_aggr.cfg has md5sum ad7978d4eabf016bcb3bb96386cbebbd should be 8649bcc2d1dc8ef150fa7afb14a0e79e
2025-12-09 16:34:18.695 [DEBUG][417] confd/resource.go 246: Target config /etc/calico/confd/config/bird6_aggr.cfg out of sync
2025-12-09 16:34:18.695 [DEBUG][417] confd/resource.go 255: Overwriting target config /etc/calico/confd/config/bird6_aggr.cfg
2025-12-09 16:34:18.695 [DEBUG][417] confd/resource.go 328: Running reloadcmd: sv hup bird6 || true
bird: Reconfiguration requested by SIGHUP
bird: Reconfiguring
bird: device1: Reconfigured
bird: direct1: Reconfigured
bird: Global_fdc9_9723_9bc__1_port-179: Reconfigured
bird: Reconfigured
bird: device1: Reconfigured
bird: direct1: Reconfigured
bird: Global_169_254_0_179_port_179: Reconfigured
bird: Reloading protocol Global_169_254_0_179_port_179
bird: Global_169_254_0_179_port_179: State changed to feed
bird: Reconfigured
bird: Global_169_254_0_179_port_179: State changed to up
bird: Reconfiguration requested by SIGHUP
bird: Reconfiguring
bird: device1: Reconfigured
bird: direct1: Reconfigured
bird: Global_169_254_0_179_port_179: Reconfigured
bird: Reconfigured
2025-12-09 16:34:18.720 [INFO][14693] felix/syncer.go 579: Applying new state, 7 service
2025-12-09 16:34:18.720 [DEBUG][14693] felix/syncer.go 580: Applying new state, {map[default/kubernetes:https:fd00::1:443/TCP default/nginx-lb:default:fd00::247b:80/TCP kube-system/calico-typha:calico-typha:fd00::449d:5473/TCP kube-system/kube-dns:dns:fd00::a:53/UDP kube-system/kube-dns:dns-tcp:fd00::a:53/TCP kube-system/kube-dns:metrics:fd00::a:9153/TCP kube-system/metrics-server:https:fd00::b3f9:443/TCP] map[default/kubernetes:https:[[fd00::100]:6443] default/nginx-lb:default:[[fd00::8616]:80 [fd00::4c2]:80 [fd00::4c5]:80] kube-system/calico-typha:calico-typha:[[fd00::100]:5473] kube-system/kube-dns:dns:[[fd00::4bd]:53] kube-system/kube-dns:dns-tcp:[[fd00::4bd]:53] kube-system/kube-dns:metrics:[[fd00::4bd]:9153] kube-system/metrics-server:https:[[fd00::4ba]:10250]] }
- This results in traffic drops until the eBPF dataplane becomes ready:
Code: 200; Error: ; Total: 0.007852
Code: 200; Error: ; Total: 0.008327
Code: 200; Error: ; Total: 0.007627
Code: 200; Error: ; Total: 0.009729
Code: 200; Error: ; Total: 0.007383
Code: 200; Error: ; Total: 0.008806
Code: 200; Error: ; Total: 0.007610
Code: 200; Error: ; Total: 0.007809
Code: 200; Error: ; Total: 0.009194
Code: 200; Error: ; Total: 0.018675
Code: 200; Error: ; Total: 0.009060
Code: 200; Error: ; Total: 0.018818
Code: 200; Error: ; Total: 0.008734
Code: 200; Error: ; Total: 0.010610
Code: 200; Error: ; Total: 0.007807
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004651
Code: 200; Error: ; Total: 0.008174
Code: 200; Error: ; Total: 0.008258
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004102
Code: 200; Error: ; Total: 0.007708
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 3 ms: Couldn't connect to server; Total: 0.003840
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004699
Code: 200; Error: ; Total: 0.008038
Code: 200; Error: ; Total: 0.008784
Code: 200; Error: ; Total: 0.009779
Code: 200; Error: ; Total: 0.008756
Code: 200; Error: ; Total: 0.008726
Code: 200; Error: ; Total: 0.008342
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004112
Code: 200; Error: ; Total: 0.009328
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004055
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 3 ms: Couldn't connect to server; Total: 0.003740
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 3 ms: Couldn't connect to server; Total: 0.003813
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004160
Code: 200; Error: ; Total: 0.010584
Code: 200; Error: ; Total: 0.008330
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004703
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004516
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004160
Code: 200; Error: ; Total: 0.008560
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004640
Code: 200; Error: ; Total: 0.009740
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004602
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004061
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004036
Code: 000; Error: Failed to connect to 2001:db8:20::1 port 80 after 4 ms: Couldn't connect to server; Total: 0.004306
Code: 200; Error: ; Total: 0.014668
Code: 200; Error: ; Total: 0.009248
Code: 200; Error: ; Total: 0.010148
Code: 200; Error: ; Total: 0.008342
Code: 200; Error: ; Total: 0.009301
Code: 200; Error: ; Total: 0.012573
Code: 200; Error: ; Total: 0.008137
Code: 200; Error: ; Total: 0.009122
Code: 200; Error: ; Total: 0.007020
Code: 200; Error: ; Total: 0.007819
Code: 200; Error: ; Total: 0.007426
Code: 200; Error: ; Total: 0.006166
Additionally, there is no endpoint readiness check before announcing or withdrawing Local endpoint routes.
Such a check is also needed when a Pod is removed from a node and no other Ready endpoints remain, to ensure the Local endpoint route is withdrawn without undue delay.
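The kind of check meant here can be expressed directly against the EndpointSlice data that both dataplanes already watch. The Go sketch below is illustrative only (hasReadyLocalEndpoint is a hypothetical helper, not existing Calico code); it uses the standard k8s.io/api/discovery/v1 types to decide whether a node still has at least one Ready local endpoint for a service.

package main

import (
	"fmt"

	discoveryv1 "k8s.io/api/discovery/v1"
)

// hasReadyLocalEndpoint reports whether any endpoint in the given
// EndpointSlices is scheduled on nodeName and marked Ready. A route
// advertiser for externalTrafficPolicy: Local services could consult a
// check like this before announcing or withdrawing the per-node route.
func hasReadyLocalEndpoint(slices []discoveryv1.EndpointSlice, nodeName string) bool {
	for _, s := range slices {
		for _, ep := range s.Endpoints {
			if ep.NodeName == nil || *ep.NodeName != nodeName {
				continue
			}
			// Conditions.Ready is a *bool; the EndpointSlice API says a nil
			// value should be interpreted as ready, so only an explicit
			// false disqualifies the endpoint.
			if ep.Conditions.Ready == nil || *ep.Conditions.Ready {
				return true
			}
		}
	}
	return false
}

func main() {
	ready := true
	node := "node-2"
	slice := discoveryv1.EndpointSlice{
		Endpoints: []discoveryv1.Endpoint{
			{NodeName: &node, Conditions: discoveryv1.EndpointConditions{Ready: &ready}},
		},
	}
	fmt.Println(hasReadyLocalEndpoint([]discoveryv1.EndpointSlice{slice}, "node-2")) // true
	fmt.Println(hasReadyLocalEndpoint([]discoveryv1.EndpointSlice{slice}, "node-1")) // false
}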
Possible Solution
- Use externalTrafficPolicy: Cluster instead of Local. Routing becomes mostly static and avoids the race, but it introduces an extra hop. A mitigating option would be to implement the Kubernetes trafficDistribution value PreferSameNode at the eBPF level, allowing Calico to keep stable cluster-wide routing while still preferring same-node traffic. This reduces the dependence on strict Local endpoint routing semantics.
- Dirty hack: insert an artificial sleep before BGP advertises Local routes. This gives the eBPF dataplane time to catch up, but it is unreliable and non-deterministic.
- Add an internal method for the BGP dataplane to check eBPF readiness. Provide a simple internal function in Calico that indicates whether the eBPF dataplane on a node has finished programming its maps for the relevant EndpointSlice. Since route advertisement is decided per node anyway, the BGP dataplane could call this method to delay the Local route announcement until eBPF is ready, cleanly avoiding the race, and to perform a readiness check before announcing or withdrawing Local endpoint routes (a sketch follows this list).
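A minimal sketch of that third option, assuming a hypothetical EBPFReadiness interface exposed by the eBPF dataplane (neither the interface nor advertiseLocalRouteWhenReady exists in Calico today): the BGP side polls, bounded by a timeout, until the dataplane reports the service's maps as programmed, and only then runs the advertise step that would rewrite the BIRD config.

package main

import (
	"context"
	"fmt"
	"time"
)

// EBPFReadiness is a hypothetical interface the eBPF dataplane could expose.
// ServiceProgrammed returns true once the NAT/endpoint maps for the given
// service reflect the latest EndpointSlice state on this node.
type EBPFReadiness interface {
	ServiceProgrammed(namespace, name string) bool
}

// advertiseLocalRouteWhenReady waits (bounded by ctx) for the eBPF dataplane
// to finish programming its maps, then invokes the advertise callback that
// would update the BGP configuration and announce the Local route.
func advertiseLocalRouteWhenReady(ctx context.Context, r EBPFReadiness, ns, name string, advertise func()) error {
	tick := time.NewTicker(50 * time.Millisecond)
	defer tick.Stop()
	for {
		if r.ServiceProgrammed(ns, name) {
			advertise()
			return nil
		}
		select {
		case <-ctx.Done():
			// Give up waiting so a stuck dataplane cannot block routing
			// updates forever; the caller decides how to handle this.
			return fmt.Errorf("eBPF dataplane not ready for %s/%s: %w", ns, name, ctx.Err())
		case <-tick.C:
		}
	}
}

// fakeReadiness simulates the dataplane becoming ready after a delay.
type fakeReadiness struct{ readyAt time.Time }

func (f fakeReadiness) ServiceProgrammed(string, string) bool { return time.Now().After(f.readyAt) }

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	r := fakeReadiness{readyAt: time.Now().Add(200 * time.Millisecond)}
	err := advertiseLocalRouteWhenReady(ctx, r, "default", "nginx-lb", func() {
		fmt.Println("advertising Local route for default/nginx-lb")
	})
	fmt.Println("err:", err)
}

In Calico the route advertisement is handled by confd/BIRD while the eBPF maps are programmed by Felix, so a real readiness signal would need to cross that process boundary; the sketch only illustrates the intended ordering.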
Steps to Reproduce (for bugs)
- Create a cluster with two nodes.
- Create a LoadBalancer Service with externalTrafficPolicy: Local.
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
  namespace: default
  annotations:
    "projectcalico.org/loadBalancerIPs": '["2001:db8:20::1","203.0.113.81"]'
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ipFamilies: [IPv6, IPv4]
  selector:
    app: nginx
  ports:
    - port: 80
- Deploy an application as a Deployment with 1 replica.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:stable
- Start a curl loop sending requests to the Service.
while true; do curl -s -o /dev/null -w "Code: %{http_code}; Error: %{errormsg}; Total: %{time_total}\n" http://[2001:db8:20::1]; sleep 0.01; done
- Scale the Deployment to 2 replicas so that the second Pod lands on the second node.
- Observe that traffic begins arriving on the new node before its eBPF dataplane is ready.
- Traffic drops occur during this interval until eBPF finishes updating.
Your Environment
- Calico version: v3.31.2
- Calico dataplane (bpf, nftables, iptables, windows etc.): bpf
- Orchestrator version (e.g. kubernetes, openshift, etc.): k8s v1.33.6
- Operating System and version: Ubuntu 24.04.4
- Link to your project (optional):