
When aws-node pod is recreated it will load and pin a new eBPF program with new maps, but the tc filters will sometimes still reference the old program #514

Description

@bob-bins

What happened:
When the aws-node pod is recreated, it loads and pins a new eBPF program with new maps, but the traffic control (tc) filters attached to the network interfaces sometimes still reference the old program.

How to reproduce it (as minimally and precisely as possible):

UPDATE: these steps actually aren't enough to reproduce the issue in all cases. We have some nodes where rerolling the aws-node pod does not cause it to pin a new eBPF program; however, the fact that this does happen on some nodes does still point to a bug in this repo.

  1. Create an EKS cluster
  2. Install vpc-cni addon with enableNetworkPolicy: true
  3. Create Pod A that's just a hello world server.
  4. Create an Ingress Network Policy for Pod A that requires that other pods that want to connect to it have the label test: test.
  5. Create Deployment A with the Pod label test: test.
  6. Verify that you can curl pod-a-ip:pod-a-port.
    a. Use debug pods (or ssh/exec to the node/Pod A) to view the eBPF program ID attached to the host side of Pod A's veth pair. Note this ID for later comparison.
    b. Also check that this id is the same one that's referenced by the pinned program at /sys/fs/bpf/globals/aws/programs/<replicaset-name>-<namespace>_handle_ingress
  7. Delete the aws-node Pod that runs on the same node as Pod A so it gets recreated.
  8. Delete the pod in Deployment A so it gets recreated with a new IP
  9. Verify that curl pod-a-ip:pod-a-port times out.
    a. Use debug pods to see that the pinned eBPF program at /sys/fs/bpf/globals/aws/programs/<pod-name>-<namespace>_handle_ingress has a different (new) ID, while the tc filter on the network interface still references the old program ID.
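
The ID comparison in steps 6 and 9 can be scripted. The following is a minimal sketch, not part of the agent or this repo: it assumes `tc` and `bpftool` are available on the node (or in a privileged debug pod), that `tc filter show` prints the program as `id <N>` on the filter line, and that `bpftool -j prog show pinned <path>` returns a JSON object with an `id` field. The function names are hypothetical. Because it is not confirmed here which tc direction the agent attaches `handle_ingress` to, the sketch checks both.

```python
import json
import re
import subprocess

# Matches the program id that `tc filter show` prints for a bpf filter,
# e.g. "... direct-action not_in_hw id 101 tag 0123456789abcdef jited".
_TC_ID_RE = re.compile(r"\bid (\d+)\b")


def tc_filter_prog_ids(tc_output: str) -> set[int]:
    """Return every eBPF program id found in `tc filter show` text output."""
    return {int(m) for m in _TC_ID_RE.findall(tc_output)}


def pinned_prog_id(bpftool_json: str) -> int:
    """Return the id from `bpftool -j prog show pinned <path>` JSON output."""
    return int(json.loads(bpftool_json)["id"])


def is_stale(tc_output: str, bpftool_json: str) -> bool:
    """True when no tc filter references the currently pinned program."""
    return pinned_prog_id(bpftool_json) not in tc_filter_prog_ids(tc_output)


def _run(cmd: list[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(
        cmd, check=True, capture_output=True, text=True
    ).stdout


def check(dev: str, pin_path: str) -> bool:
    """Compare the ids live on a node (requires root).

    Both tc directions are queried because the attach direction is an
    assumption in this sketch, not something the issue states.
    """
    tc_out = ""
    for direction in ("ingress", "egress"):
        tc_out += _run(["tc", "filter", "show", "dev", dev, direction])
    pin_out = _run(["bpftool", "-j", "prog", "show", "pinned", pin_path])
    return is_stale(tc_out, pin_out)
```

On an affected node, calling `check()` with the host-side veth name of Pod A and the pinned path from step 9a would be expected to return `True` after the aws-node pod is recreated, and `False` on a healthy node.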

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.34.3-eks-ac2d5a0
  • CNI Version: v1.21.1-eksbuild.3
  • Network Policy Agent Version: v1.3.1-eksbuild.1
  • OS (e.g: cat /etc/os-release): Amazon Linux 2023
  • Kernel (e.g. uname -a): Linux 6.12.40-64.114.amzn2023.x86_64
