Description
envoy/source/common/upstream/cds_api_impl.cc
Lines 52 to 101 in 2966597
This is a performance issue, not a bug per se.
When doing CDS updates with many clusters, Envoy will often get "stuck" evaluating the CDS update. This manifests as EDS failing, and in more extreme cases, the envoy ceases to receive any XDS updates. When this happens, Envoy needs to be restarted to get it updating again.
In our case we're seeing issues with the current implementation of void CdsApiImpl::onConfigUpdate
with a number of clusters in the 3000-7000 range. If envoy could speedily evaluate a CDS update with 10000 clusters this would represent a HUGE improvement in Envoy's behavior for us. Right now, only around 2500 clusters in a CDS update seems to evaluate in reasonable amount of time.
Because the function void CdsApiImpl::onConfigUpdate
pauses EDS while doing CDS evaluation, envoy's config will drift. With many clusters in CDS, this can mean envoy is hundreds of seconds behind what is current, and results in 503's.
Some context:
- Envoy in my test environment is being run without constraints on 8 core VMs that are running at a max of 30% CPU utilization and max 40% memory (of 32 GB):
resources:
limits:
memory: "32212254720"
requests:
cpu: 100m
memory: 256M
- We're using Project Contour as our Ingress Controller in K8s.
Contour currently doesn't use the incremental APIs of Envoy, so when K8s services change in the cluster it sends ALL of the current config again to Envoy, which means a small change like adding or removing a K8s SVC which maps to an envoy cluster results in Envoy having to re-evaluate ALL clusters.
With enough K8s Services behind an Ingress (7000+) envoy can spontaneously cease to receive any new updates indefinitely, and will fail to do EDS because it gets stuck in the CDS evaluation.
Given a high enough number of clusters this would be a Very Hard Problem, but given that we're in the low thousands, I'm hoping there are some things that could be done to improve performance without resorting to exotic methods.
Please let me know if there's any information I can provide that could help!