
Ingressgateway caches pod IP although the pod is deleted; access with this IP returns 503 for several hours. #39069


Description

@johnzheng1975

Title:
Ingressgateway caches the pod IP although the pod is deleted; requests sent to this IP return 503 for several hours.
After the pod is deleted, its IP is removed (confirmed in the cilium log and the istiod log).
The IP is no longer in Istio EDS, but it is still present in http://localhost:15000/clusters.
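For reference, one way to compare the two views (the EDS endpoints the gateway received versus Envoy's own cluster membership) is sketched below; this assumes a reasonably recent istioctl and uses one of the gateway pods listed further down.

# EDS view pushed to the gateway, filtered to the affected cluster
istioctl proxy-config endpoints istio-ingressgateway-apigee-7d77dd685-dzjw4 -n is --cluster "outbound|80||roles.relationship.svc.cluster.local"

# Envoy's cluster membership, filtered to the suspect IPs
kubectl exec -n is istio-ingressgateway-apigee-7d77dd685-dzjw4 -- curl -s http://localhost:15000/clusters | grep -E '10\.203\.103\.174|10\.203\.128\.80'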

Description:

The old IPs still show up in the gateway's /clusters output (10.203.103.174 and 10.203.128.80 are old IPs that belong to already-terminated pods):

k exec -ti -n is istio-ingressgateway-apigee-7d77dd685-dzjw4 -- curl -X POST http://localhost:15000/clusters | grep roles > clusters_istio-ingressgateway-apigee-7d77dd685-dzjw4.txt

k exec -ti -n is istio-ingressgateway-apigee-7d77dd685-mzfn6 -- curl -X POST http://localhost:15000/clusters | grep roles > clusters_istio-ingressgateway-apigee-7d77dd685-mzfn6.txt

k exec -ti -n is istio-ingressgateway-apigee-7d77dd685-plp59 -- curl -X POST http://localhost:15000/clusters | grep roles > clusters_istio-ingressgateway-apigee-7d77dd685-plp59.txt


outbound|80||roles.relationship.svc.cluster.local::observability_name::outbound|80||roles.relationship.svc.cluster.local;
outbound|80||roles.relationship.svc.cluster.local::default_priority::max_connections::4294967295
outbound|80||roles.relationship.svc.cluster.local::default_priority::max_pending_requests::4294967295
outbound|80||roles.relationship.svc.cluster.local::default_priority::max_requests::4294967295
outbound|80||roles.relationship.svc.cluster.local::default_priority::max_retries::4294967295
outbound|80||roles.relationship.svc.cluster.local::high_priority::max_connections::1024
outbound|80||roles.relationship.svc.cluster.local::high_priority::max_pending_requests::1024
outbound|80||roles.relationship.svc.cluster.local::high_priority::max_requests::1024
outbound|80||roles.relationship.svc.cluster.local::high_priority::max_retries::3
outbound|80||roles.relationship.svc.cluster.local::added_via_api::true
outbound|80||roles.relationship.svc.cluster.local::eds_service_name::outbound|80||roles.relationship.svc.cluster.local
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::cx_active::2
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::cx_connect_fail::66
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::cx_total::68
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::rq_active::0
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::rq_error::92
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::rq_success::0
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::rq_timeout::0
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::rq_total::0
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::hostname::
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::health_flags::healthy
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::weight::1
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::region::us-west-2
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::zone::us-west-2a
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::sub_zone::
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::canary::false
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::priority::0
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::success_rate::-1
outbound|80||roles.relationship.svc.cluster.local::10.203.103.174:8080::local_origin_success_rate::-1
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::cx_active::0
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::cx_connect_fail::64
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::cx_total::64
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::rq_active::0
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::rq_error::87
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::rq_success::0
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::rq_timeout::0
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::rq_total::0
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::hostname::
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::health_flags::healthy
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::weight::1
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::region::us-west-2
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::zone::us-west-2b
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::sub_zone::
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::canary::false
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::priority::0
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::success_rate::-1
outbound|80||roles.relationship.svc.cluster.local::10.203.128.80:8080::local_origin_success_rate::-1
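Presumably the failure counters for the stale endpoints keep climbing while this is going on; re-running a filtered grep a few minutes apart would confirm it (a sketch, using the same pod as above):

kubectl exec -n is istio-ingressgateway-apigee-7d77dd685-dzjw4 -- curl -s http://localhost:15000/clusters | grep -E '10\.203\.(103\.174|128\.80):8080::(cx_connect_fail|rq_error|rq_total)'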

Repro steps:

When hundreds of pods are deleted and created at the same time, this can be reproduced easily; see the sketch below.
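A rough sketch of how to generate this kind of churn (the namespace, deployment name, label, and replica count below are placeholders, not the real workload):

kubectl -n relationship scale deployment roles --replicas=200
# restart the deployment several times so hundreds of pod IPs are torn down and recreated at once
for i in $(seq 1 10); do
  kubectl -n relationship rollout restart deployment roles
  kubectl -n relationship rollout status deployment roles --timeout=10m
done
# then compare the live pod IPs with what the ingressgateway still holds in /clusters
kubectl -n relationship get pods -l app=roles -o wide
kubectl -n is exec istio-ingressgateway-apigee-7d77dd685-dzjw4 -- curl -s http://localhost:15000/clusters | grep 'roles.relationship.svc.cluster.local::10.'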

Labels: area/eds, question, stale
