
Pod-pod communication broken when CiliumEnvoyConfig is set up #1661

@sudeephb

Description


We are facing an issue with pod-to-pod communication between pods on the same node when a CiliumEnvoyConfig is set up for the target service and Cilium egressGateway is enabled.

For example:
We have a deployment + service named nginx with a single nginx pod running, and a CEC attached to it (CEC manifest below). From a client pod, running curl nginx times out with "upstream request timeout" if the client pod is on the same node as the nginx pod. If the client pod is on a different node, the request succeeds.
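
For reference, here is a minimal sketch of the kind of Deployment and Service described above; the service name, namespace, single replica, and port 80 match our setup, while the image and labels are illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: ns-xxxxxxx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: ns-xxxxxxx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80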
Looking at the Hubble flows, we see the following when the request is successful (i.e. when the client pod is on a different node):

Nov 20 09:53:08.029: ns-xxxxxxx/client-77599f967b-n2nbn:56488 (ID:18170) -> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) policy-verdict:L3-Only INGRESS ALLOWED (TCP Flags: SYN)
Nov 20 09:53:08.029: ns-xxxxxxx/client-77599f967b-n2nbn:56488 (ID:18170) -> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) to-endpoint FORWARDED (TCP Flags: SYN)

The source pod, ns-xxxxxxx/client-77599f967b-n2nbn:56488, is correctly shown in the logs.

When the request is unsuccessful (i.e. when the client pod is on the same node), the Hubble flows look like this:

Nov 20 09:52:11.616: 10.154.0.19:46532 (host) -> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) policy-verdict:all EGRESS ALLOWED (TCP Flags: SYN)
Nov 20 09:52:11.616: 10.154.0.19:46532 (ID:22760) -> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) policy-verdict:L3-Only INGRESS ALLOWED (TCP Flags: SYN)
Nov 20 09:52:11.616: 10.154.0.19:46532 (ID:22760) -> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) to-endpoint FORWARDED (TCP Flags: SYN)
Nov 20 09:52:11.616: 10.154.0.19:46532 (world) <> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) policy-verdict:none INGRESS DENIED (TCP Flags: SYN)
Nov 20 09:52:11.616: 10.154.0.19:46532 (world) <> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) Policy denied DROPPED (TCP Flags: SYN)
Nov 20 09:52:12.630: 10.154.0.19:46532 (world) <> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) policy-verdict:none INGRESS DENIED (TCP Flags: SYN)
Nov 20 09:52:12.630: 10.154.0.19:46532 (world) <> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) Policy denied DROPPED (TCP Flags: SYN)

Here, the source IP appears to have been SNATed to a different IP (10.154.0.19), which is the internal IP address of the node on which these pods are running.
Accessing the nginx pod directly via its pod IP succeeds, even from pods on the same node.
Cilium is deployed via Helm, chart version 1.17.6.
Removing the CEC makes the requests from all pods succeed.
Additionally, we have the following fields set in our Helm values:

bpf:
  masquerade: true
egressGateway:
  enabled: true

If we set both of these to false, the issue goes away, but that is not an option for us since we need the egress gateway enabled.
This is a GKE cluster.
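
For clarity, the workaround (which, as noted, we cannot keep) is just flipping those two values in the Helm values:

bpf:
  masquerade: false
egressGateway:
  enabled: false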

The CiliumEnvoyConfig manifest looks like this:

apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: nginx-cec
  namespace: ns-xxxxxxx
  resourceVersion: "1763632222440175012"
  uid: c2aa4a58-dd56-41c7-892a-f06d88b2cd3a
spec:
  backendServices:
  - name: nginx
    namespace: ns-xxxxxxx
    number:
    - "80"
  resources:
  - '@type': type.googleapis.com/envoy.config.listener.v3.Listener
    filter_chains:
    - filter_chain_match:
        destination_port: 80
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              '@type': type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          rds:
            route_config_name: ns-xxxxxxx-nginx-route-80
          stat_prefix: ns-xxxxxxx-nginx-envoy-listener
    name: ns-xxxxxxx-nginx-envoy-listener
  - '@type': type.googleapis.com/envoy.config.route.v3.RouteConfiguration
    name: ns-xxxxxxx-nginx-route-80
    virtual_hosts:
    - domains:
      - '*'
      name: ns-xxxxxxx-nginx-route-80
      routes:
      - match:
          prefix: /
        route:
          cluster: ns-xxxxxxx/nginx:80
  - '@type': type.googleapis.com/envoy.config.cluster.v3.Cluster
    connect_timeout: 15s
    lb_policy: LEAST_REQUEST
    name: ns-xxxxxxx/nginx:80
    type: EDS
  services:
  - name: nginx
    namespace: ns-xxxxxxx
