
bug: Egress rule with namespaceSelector: {} + podSelector.matchLabels silently drops traffic #572

Description

@brightsparc

What happened:

A NetworkPolicy Egress rule combining namespaceSelector: {} (all namespaces) with podSelector.matchLabels (label-based opt-in) is silently not enforced by aws-network-policy-agent. Traffic to pods that should be allowed by the rule gets dropped (tcp connect deadline elapsed). The same policy is enforced correctly by Azure NPM on AKS and Cilium on GKE Dataplane V2, with no modification.

Symptom from the source pod (Restate, calling a worker in another namespace via cluster DNS):

client error (Connect) caused by: tcp connect error caused by: deadline has elapsed

Connecting from an unlabeled debug pod in the same source namespace to the same destination works, so the network path itself is not at fault. The problem is the agent's evaluation of this specific policy shape.

Attach logs:

What you expected to happen:

Per the K8s NetworkPolicy spec, namespaceSelector: {} matches all namespaces. When combined with podSelector in the same to/from list item, the two selectors AND together: "match any pod with these labels in any namespace." The destination pod has the required label, so the egress should be allowed.
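For contrast, when the two selectors are split into separate list items they become independent peers that are OR'd rather than ANDed. The sketch below uses the label from this report and shows both forms as two YAML documents. It is an illustration of the spec semantics, not a manifest taken from the operator.

```yaml
# Form 1: one peer, namespaceSelector AND podSelector.
# Allows egress to pods labeled allow.restate.dev/restate: "true" in any namespace.
egress:
- to:
  - namespaceSelector: {}
    podSelector:
      matchLabels:
        allow.restate.dev/restate: "true"
---
# Form 2: two peers, namespaceSelector OR podSelector.
# The empty namespaceSelector on its own already allows egress to every pod
# in every namespace, so the label no longer restricts anything.
egress:
- to:
  - namespaceSelector: {}
  - podSelector:
      matchLabels:
        allow.restate.dev/restate: "true"
```

The failing policy uses Form 1, which is the label-based opt-in the operator intends.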

How to reproduce it (as minimally and precisely as possible):

Two namespaces: restate (source) and intro-dp (destination).

Source pod labels (in restate ns):

labels:
  app.kubernetes.io/instance: restate
  app.kubernetes.io/name: restate

Destination pod labels (in intro-dp ns):

labels:
  allow.restate.dev/restate: "true"

Apply this NetworkPolicy in the restate namespace, alongside a deny-all Egress policy that also selects the source pod (the upstream restatedev Restate Operator deploys one by default):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-restate-egress
  namespace: restate
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: restate
      app.kubernetes.io/name: restate
  policyTypes: [Egress]
  egress:
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          allow.restate.dev/restate: "true"
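The deny-all Egress policy mentioned above is operator-managed, and its exact manifest is not reproduced in this report. To reproduce without the operator, a generic stand-in might look like the following. The name default-deny-egress is hypothetical, and the operator's real policy may differ.

```yaml
# Hypothetical stand-in for the operator-managed deny-all Egress policy.
# Selecting the source pod with policyTypes [Egress] and no egress rules
# denies all egress from it unless another policy allows it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress   # hypothetical name
  namespace: restate
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: restate
      app.kubernetes.io/name: restate
  policyTypes: [Egress]
```

This stand-in also blocks DNS. When testing by Service name, allow DNS egress separately. Otherwise, test against the destination pod IP directly.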

From the source pod, attempt to connect to the destination pod's Service or pod IP on any port. The connection times out.

Replacing the rule with this simpler form makes enforcement work correctly:

egress:
- to:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: intro-dp
  ports:
  - port: 8003
    protocol: TCP

The failing rule differs from the working rule in three ways:

  1. namespaceSelector: {} (empty, i.e. all namespaces) vs namespaceSelector.matchLabels (a named namespace).
  2. podSelector.matchLabels ANDed with the namespace selector vs no podSelector at all.
  3. No ports (any port allowed) vs an explicit ports list.

Any one of these, or a combination, may be involved in the agent's mis-evaluation.
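To narrow down which difference triggers the mis-evaluation, each of the following variants changes one dimension of the failing rule. These variants have not been tested. Port 8003 and the intro-dp namespace come from the working rule above.

```yaml
# Variant A (untested): keep the failing rule's selectors, add an explicit port.
# If this enforces correctly, the missing ports list is the likely trigger.
egress:
- to:
  - namespaceSelector: {}
    podSelector:
      matchLabels:
        allow.restate.dev/restate: "true"
  ports:
  - port: 8003
    protocol: TCP
---
# Variant B (untested): name the namespace, keep the podSelector, no ports.
# If this enforces correctly, the empty namespaceSelector is the likely trigger.
egress:
- to:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: intro-dp
    podSelector:
      matchLabels:
        allow.restate.dev/restate: "true"
```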

Anything else we need to know?:

  • Reproduced on both aws-network-policy-agent v1.3.1-eksbuild.1 and v1.3.4-eksbuild.1 (latest at time of writing); bumping the addon and rolling the DaemonSet does not resolve the issue.
  • The failing rule is created and enforced by the upstream restatedev Restate Operator. Identical configuration enforces correctly on AKS (Azure NPM) and GKE (Dataplane V2 / Cilium).
  • Workaround in place: a companion NetworkPolicy with the simpler shape (named namespace + explicit port). K8s NetworkPolicy rules are OR'd, so this adds an allow path without modifying the operator-managed policy.
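The following sketch shows such a companion policy, built from the working rule and the source pod's labels. The name allow-restate-egress-intro-dp is hypothetical, and the policy actually in use may differ.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-restate-egress-intro-dp   # hypothetical name
  namespace: restate
spec:
  # Same source pods as the operator-managed allow-restate-egress policy.
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: restate
      app.kubernetes.io/name: restate
  policyTypes: [Egress]
  egress:
  # Named namespace + explicit port: the shape that enforces correctly.
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: intro-dp
    ports:
    - port: 8003
      protocol: TCP
```

This policy is additive, so the operator-managed policy stays untouched. It should be removed once the agent enforces the original rule.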

Environment:

  • Kubernetes version (use kubectl version): v1.35.4-eks-4136f65
  • CNI Version: amazon-k8s-cni v1.21.1 (EKS addon vpc-cni v1.21.1-eksbuild.8)
  • Network Policy Agent Version: v1.3.4-eksbuild.1
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
