What happened:
When the aws-node pod is recreated, it loads and pins a new eBPF program with new maps, but the traffic control (tc) filters attached to the network interfaces sometimes still reference the old program.
How to reproduce it (as minimally and precisely as possible):
UPDATE: these steps actually aren't enough to reproduce the issue in all cases. We have some nodes where rerolling the aws-node pod does not cause it to pin a new eBPF program; however, the fact that this does happen on some nodes does still point to a bug in this repo.
- Create an EKS cluster
- Install the vpc-cni addon with enableNetworkPolicy: true
- Create Pod A that's just a hello world server.
- Create an Ingress Network Policy for Pod A that only allows connections from pods with the label test: test.
- Create Deployment A with the Pod label test: test.
- Verify that you can curl pod-a-ip:pod-a-port.
a. Use debug pods (or ssh/exec to the node/Pod A) to view the eBPF program ID associated with the host side of Pod A's veth pair. Keep a note of this ID.
b. Also check that this id is the same one that's referenced by the pinned program at /sys/fs/bpf/globals/aws/programs/<replicaset-name>-<namespace>_handle_ingress
- Delete the aws-node Pod that runs on the same node as Pod A so it gets recreated.
- Delete the pod in Deployment A so it gets recreated with a new IP.
- Verify that curl pod-a-ip:pod-a-port times out.
a. Use debug pods to see that the pinned eBPF program at /sys/fs/bpf/globals/aws/programs/<pod-name>-<namespace>_handle_ingress has a different (new) ID, while the old program ID is still the one attached to the network interface.
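
The ID comparison in the sub-steps above can be scripted roughly as follows. This is a minimal sketch, not something the agent ships: it assumes tc (iproute2) and bpftool are available on the node or in a privileged debug pod, and the function names (parse_tc_prog_id, parse_pinned_prog_id, compare_ids, check_pod_probe) are made up for illustration. Which tc hook holds the program is not stated above, so the hook is passed explicitly; check both ingress and egress if unsure.

```shell
#!/usr/bin/env bash
# Sketch only: run as root on the node or in a privileged debug pod.
# Assumes tc (iproute2) and bpftool are installed. All function names
# below are hypothetical helpers, not part of any AWS tooling.

# Read `tc filter show` output on stdin and print the first BPF program ID.
parse_tc_prog_id() {
  grep -oE '(^|[[:space:]])id [0-9]+' | head -n1 | awk '{print $2}'
}

# Read plain-text `bpftool prog show pinned` output on stdin and print the
# program ID (the first line starts with "<id>: <type> ...").
parse_pinned_prog_id() {
  sed -nE '1s/^([0-9]+):.*/\1/p'
}

# compare_ids <attached-id> <pinned-id>
# Prints "match", "stale" (IDs differ), or "unknown" (an ID is missing).
compare_ids() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "unknown"
    return 2
  fi
  if [ "$1" = "$2" ]; then
    echo "match"
  else
    echo "stale"
    return 1
  fi
}

# check_pod_probe <host-veth> <pin-path> <ingress|egress>
# Compares the program attached to the given tc hook with the pinned program.
check_pod_probe() {
  local dev="$1" pin="$2" hook="$3"
  local attached pinned
  attached="$(tc filter show dev "$dev" "$hook" | parse_tc_prog_id)"
  pinned="$(bpftool prog show pinned "$pin" | parse_pinned_prog_id)"
  echo "dev=$dev hook=$hook attached=$attached pinned=$pinned"
  compare_ids "$attached" "$pinned"
}
```

After sourcing the file, something like `check_pod_probe <host-veth> /sys/fs/bpf/globals/aws/programs/<pod-name>-<namespace>_handle_ingress egress` would be run once before deleting the aws-node pod and once after. A "stale" result after the restart corresponds to the state described in this report.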
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): v1.34.3-eks-ac2d5a0
- CNI Version: v1.21.1-eksbuild.3
- Network Policy Agent Version: v1.3.1-eksbuild.1
- OS (e.g. cat /etc/os-release): Amazon Linux 2023
- Kernel (e.g. uname -a): Linux 6.12.40-64.114.amzn2023.x86_64