Description
We are facing an issue with pod-to-pod communication between pods on the same node when a CiliumEnvoyConfig is set up for the target service and Cilium egress gateway is enabled.
For example,
We have an nginx Deployment and Service with a single nginx pod running, and a CEC attached to it (the CEC manifest is included below). From a client pod, running curl nginx times out with "upstream request timeout" if the client pod is on the same node as the nginx pod. If the client pod is on a different node, the request succeeds.
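For reference, here is a minimal sketch of the kind of Deployment and Service involved. This is illustrative only: the names, namespace, and port match the flows below, but the app: nginx labels and the stock nginx image are assumptions; our actual manifests may differ.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: ns-xxxxxxx
spec:
  replicas: 1          # single nginx pod, as described above
  selector:
    matchLabels:
      app: nginx       # assumed label; our real selector may differ
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx   # assumed stock image
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: ns-xxxxxxx
spec:
  selector:
    app: nginx
  ports:
  - port: 80           # the port the CEC below intercepts
    targetPort: 80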
Looking at the Hubble flows, we see the following when the request is successful (i.e., when the client pod is on a different node):
Nov 20 09:53:08.029: ns-xxxxxxx/client-77599f967b-n2nbn:56488 (ID:18170) -> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) policy-verdict:L3-Only INGRESS ALLOWED (TCP Flags: SYN)
Nov 20 09:53:08.029: ns-xxxxxxx/client-77599f967b-n2nbn:56488 (ID:18170) -> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) to-endpoint FORWARDED (TCP Flags: SYN)
The source pod, ns-xxxxxxx/client-77599f967b-n2nbn:56488, is correctly shown in the flows.
When the request is unsuccessful (i.e., when the client pod is on the same node), the Hubble flows look like this:
Nov 20 09:52:11.616: 10.154.0.19:46532 (host) -> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) policy-verdict:all EGRESS ALLOWED (TCP Flags: SYN)
Nov 20 09:52:11.616: 10.154.0.19:46532 (ID:22760) -> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) policy-verdict:L3-Only INGRESS ALLOWED (TCP Flags: SYN)
Nov 20 09:52:11.616: 10.154.0.19:46532 (ID:22760) -> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) to-endpoint FORWARDED (TCP Flags: SYN)
Nov 20 09:52:11.616: 10.154.0.19:46532 (world) <> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) policy-verdict:none INGRESS DENIED (TCP Flags: SYN)
Nov 20 09:52:11.616: 10.154.0.19:46532 (world) <> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) Policy denied DROPPED (TCP Flags: SYN)
Nov 20 09:52:12.630: 10.154.0.19:46532 (world) <> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) policy-verdict:none INGRESS DENIED (TCP Flags: SYN)
Nov 20 09:52:12.630: 10.154.0.19:46532 (world) <> ns-xxxxxxx/nginx-5dd756489d-rwzwp:80 (ID:8683) Policy denied DROPPED (TCP Flags: SYN)
Here, the source IP appears to have been SNATed to a different IP (10.154.0.19), which is the internal IP address of the node these pods are running on.
Accessing the nginx pod directly via its pod IP is successful, even from pods on the same node.
Cilium is deployed via Helm, chart version 1.17.6.
Removing the CEC makes requests from all pods succeed.
Additionally, we have the following fields set in our Helm values:
bpf:
  masquerade: true
egressGateway:
  enabled: true
If we set both of these values to false, the issue is resolved, but we can't do that since we need the egress gateway enabled.
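In other words, this is the combination under which the problem does not occur (shown only to narrow down the trigger; it is not viable for us):

bpf:
  masquerade: false    # disabling BPF masquerading avoids the bad SNAT
egressGateway:
  enabled: false       # but we need egress gateway enabled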
This is a GKE cluster.
The CiliumEnvoyConfig manifest looks like this:
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: nginx-cec
  namespace: ns-xxxxxxx
  resourceVersion: "1763632222440175012"
  uid: c2aa4a58-dd56-41c7-892a-f06d88b2cd3a
spec:
  backendServices:
  - name: nginx
    namespace: ns-xxxxxxx
    number:
    - "80"
  resources:
  - '@type': type.googleapis.com/envoy.config.listener.v3.Listener
    filter_chains:
    - filter_chain_match:
        destination_port: 80
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              '@type': type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          rds:
            route_config_name: ns-xxxxxxx-nginx-route-80
          stat_prefix: ns-xxxxxxx-nginx-envoy-listener
    name: ns-xxxxxxx-nginx-envoy-listener
  - '@type': type.googleapis.com/envoy.config.route.v3.RouteConfiguration
    name: ns-xxxxxxx-nginx-route-80
    virtual_hosts:
    - domains:
      - '*'
      name: ns-xxxxxxx-nginx-route-80
      routes:
      - match:
          prefix: /
        route:
          cluster: ns-xxxxxxx/nginx:80
  - '@type': type.googleapis.com/envoy.config.cluster.v3.Cluster
    connect_timeout: 15s
    lb_policy: LEAST_REQUEST
    name: ns-xxxxxxx/nginx:80
    type: EDS
  services:
  - name: nginx
    namespace: ns-xxxxxxx