Bug: SDS never delivers client certificate secret when it is exclusively referenced by a SecurityPolicy extAuth Backend #8616
Description
When an EG Backend resource's spec.tls.clientCertificateRef secret is only referenced from a SecurityPolicy extAuth (or gRPC auth) backendRef, Envoy subscribes to the secret via SDS but the EG controller never pushes the secret payload. The secret is permanently stuck in dynamic_warming_secrets with version_info: "uninitialized", causing every outbound TLS handshake to the auth backend to fail with cx_connect_fail.
Expected: The client certificate secret is included in the xDS snapshot and Envoy can complete the mTLS handshake to the ext auth backend.
Actual: sds.<namespace>/<secret-name>.init_fetch_timeout: 1, update_success: 0 — the secret never leaves WARMING state and all connections to the ext auth backend fail.
The bug is not a race condition. The WARMING state persists indefinitely (verified over 80+ seconds).
Root cause (code-level): The xDS translator has two separate code paths for adding client-cert secrets to the snapshot:
- processHTTPListenerXdsTranslation (translator.go): walks route.Destination.Settings[*].TLS.ClientCertificates and calls buildXdsTLSCertSecret + tCtx.AddXdsResource for every route backend. ✅ Works.
- (*extAuth).patchResources (extauth.go): calls createExtServiceXDSCluster, which internally calls addXdsCluster. addXdsCluster pushes the CA cert (buildXdsUpstreamTLSCASecret) but never calls buildXdsTLSCertSecret for the client cert. ❌ Missing.
Envoy receives an SDS subscription config referencing the secret name (so the secret appears in dynamic_warming_secrets) but the controller never sends the actual key/cert payload.
Repro Steps
Prerequisites
- A Kubernetes cluster with Envoy Gateway v1.7.1 installed
- Two backend services (myapp and authz) that require mutual TLS (i.e., they verify the client certificate presented by the gateway)
- Three secrets in the same namespace:
  - myapp-tls: leaf TLS cert/key for the gateway to present to myapp
  - authz-tls: leaf TLS cert/key for the gateway to present to authz; this is the secret that will get stuck
  - ca-bundle: CA certificate used to verify both backend server certs
Minimal Reproducing Config
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: default
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      port: 80
      protocol: HTTP
---
# Backend for the main application — its clientCertificateRef (myapp-tls) will
# be referenced by an HTTPRoute, so EG pushes it correctly. Acts as a control.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: myapp-backend
  namespace: default
spec:
  endpoints:
    - fqdn:
        hostname: myapp.default.svc.cluster.local
        port: 8443
  tls:
    caCertificateRefs:
      - name: ca-bundle
        group: ''
        kind: Secret
    clientCertificateRef:
      name: myapp-tls # ← this secret IS pushed to SDS (routed via HTTPRoute)
      group: ''
      kind: Secret
    sni: myapp.default.svc.cluster.local
---
# Backend for the ext auth service — its clientCertificateRef (authz-tls) is
# ONLY referenced from a SecurityPolicy extAuth backendRef. This secret will
# be STUCK in SDS WARMING state.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: authz-backend
  namespace: default
spec:
  endpoints:
    - fqdn:
        hostname: authz.default.svc.cluster.local
        port: 8443
  tls:
    caCertificateRefs:
      - name: ca-bundle
        group: ''
        kind: Secret
    clientCertificateRef:
      name: authz-tls # ← this secret is NEVER pushed to SDS (only extAuth ref)
      group: ''
      kind: Secret
    sni: authz.default.svc.cluster.local
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: myapp
  namespace: default
spec:
  parentRefs:
    - name: eg
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: "/"
      backendRefs:
        - name: myapp-backend
          group: gateway.envoyproxy.io
          kind: Backend
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: myapp-auth
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: myapp
  extAuth:
    http:
      backendRefs:
        - name: authz-backend
          namespace: default
          group: gateway.envoyproxy.io
          kind: Backend
Verification commands
# 1. Apply the config
kubectl apply -f repro.yaml
# 2. Wait for the Gateway to become Ready, then get the Envoy proxy pod name
ENVOY_POD=$(kubectl get pods -n envoy-gateway-system \
  -l 'app.kubernetes.io/component=proxy' \
  -o jsonpath='{.items[0].metadata.name}')
# 3. Port-forward the Envoy admin interface
kubectl port-forward -n envoy-gateway-system pod/$ENVOY_POD 19000:19000 &
# 4. Check SDS stats — authz-tls will show:
# init_fetch_timeout: 1
# update_success: 0
# while myapp-tls will show update_success: 1
curl -s http://localhost:19000/stats | grep "^sds\."
# 5. Check dynamic_warming_secrets in the config dump
curl -s http://localhost:19000/config_dump | \
  python3 -c "
import sys, json
d = json.load(sys.stdin)
for section in d.get('configs', []):
    if section.get('@type','').endswith('SecretsConfigDump'):
        warming = [(s['name'], s.get('version_info','')) for s in section.get('dynamic_warming_secrets',[])]
        print('WARMING:', warming)
        active = [s['name'] for s in section.get('dynamic_active_secrets',[])]
        print('ACTIVE:', active)
"
# Expected output:
# WARMING: [('default/authz-tls', 'uninitialized')]
# ACTIVE: ['default/myapp-tls', ...]
# 6. To confirm it is not a race condition, wait 60 seconds and re-run step 4.
#    update_success for authz-tls will remain 0.
Confirming the broken connection
# The ext_authz cluster will show zero successful handshakes and many connect failures
curl -s http://localhost:19000/stats | grep -E "extauth.*ssl\.handshake|extauth.*cx_connect_fail"
# Expected:
# cluster.securitypolicy/default/myapp-auth/extauth/0.ssl.handshake: 0
# cluster.securitypolicy/default/myapp-auth/extauth/0::cx_connect_fail: <N>
Workaround
Point authz-backend.spec.tls.clientCertificateRef to any secret that is also used as a clientCertificateRef on a Backend referenced by an HTTPRoute. This causes the HTTPRoute code path to push the secret into the snapshot, and the ext auth cluster shares it via the SDS name lookup.
# Workaround: reuse myapp-tls (already pushed by the HTTPRoute path)
clientCertificateRef:
  name: myapp-tls # works even though it's a different identity cert
  group: ''
  kind: Secret
Environment
| Envoy Gateway version | v1.7.1 (docker.io/envoyproxy/gateway:v1.7.1) |
| Kubernetes version | v1.31 (LKE) |
| GatewayClass controller | gateway.envoyproxy.io/gatewayclass-controller |
| Affected resources | Backend + SecurityPolicy (extAuth HTTP and gRPC) |
| First affected version | Unknown — reproduced on v1.7.1; not tested on earlier versions |
Logs
EG controller — the secret is reconciled without error but the snapshot push never includes the client cert:
{"level":"info","msg":"processing Secret authz-tls","namespace":"default","name":"authz-tls"}
{"level":"info","msg":"processing Secret authz-tls","namespace":"default","name":"authz-tls"}
# (repeats on every reconcile loop — no errors, but no snapshot push)
Envoy stats (after 60+ seconds of uptime):
sds.default/authz-tls.init_fetch_timeout: 1
sds.default/authz-tls.update_attempt: 1
sds.default/authz-tls.update_success: 0 ← permanently stuck
sds.default/myapp-tls.init_fetch_timeout: 0
sds.default/myapp-tls.update_success: 1 ← correctly delivered
Envoy config dump:
WARMING: [('default/authz-tls', 'uninitialized')]
ACTIVE: ['default/myapp-tls', 'default/ca-bundle', ...]
Code Pointer
The fix should add client-cert snapshot injection in the extAuth cluster creation path, mirroring what processHTTPListenerXdsTranslation already does for route backends:
internal/xds/translator/translator.go — the working pattern (already there for HTTPRoute):
// add http route client certs
for _, route := range httpListener.Routes {
	if route.Destination != nil {
		for _, st := range route.Destination.Settings {
			if st.TLS != nil {
				for _, cert := range st.TLS.ClientCertificates {
					secret := buildXdsTLSCertSecret(&cert)
					tCtx.AddXdsResource(resourcev3.SecretType, secret)
				}
			}
		}
	}
}
internal/xds/translator/extauth.go (patchResources) or internal/xds/translator/utils.go (addXdsCluster) needs the equivalent loop for TLS.ClientCertificates on each DestinationSetting.