Restricting VPA recommender scans to specific namespaces #7697

Closed
@ncmuthu

Which component are you using?:
/area vertical-pod-autoscaler

What version of the component are you using?:
1.0.0

What k8s version are you using (kubectl version)?:

$ kubectl version
Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.32.0

What environment is this in?:
AWS EKS and local Kind cluster

What did you expect to happen?:
I am using the flag --vpa-object-namespace=vpa to limit VPA to the vpa namespace. VPA resources are detected only in that namespace, as expected, and I expected the recommender to scan verticalpodautoscalercheckpoints only in that namespace as well. Instead, it scans the verticalpodautoscalercheckpoints of all namespaces every 10 minutes. We have around 3000 namespaces, so scanning all of them every 10 minutes adds load to the kube-apiserver.

What happened instead?:
The VPA recommender scans the verticalpodautoscalercheckpoints of all namespaces every 10 minutes instead of scanning only the specified namespace. We would like to avoid scanning all namespaces.
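To make the expected behavior concrete, here is a minimal Go sketch of the scoping I have in mind. It is not the recommender's actual code: `namespacesToScan` is a hypothetical helper, and treating an empty flag value as "all namespaces" is an assumption on my part. The flag name is the one the recommender prints in the logs below (--vpa-object-namespace).

```go
package main

import "fmt"

// namespacesToScan returns the namespaces whose checkpoints a garbage
// collection pass should list. Hypothetical helper for illustration only.
// Assumption: an empty vpaObjectNamespace means "no restriction".
func namespacesToScan(all []string, vpaObjectNamespace string) []string {
	if vpaObjectNamespace == "" {
		// No restriction configured: keep today's behavior.
		return all
	}
	for _, ns := range all {
		if ns == vpaObjectNamespace {
			// Restriction configured: list checkpoints only here.
			return []string{ns}
		}
	}
	// The configured namespace does not exist: nothing to scan.
	return nil
}

func main() {
	all := []string{"default", "ns101", "ns1010", "vpa"}
	fmt.Println(namespacesToScan(all, "vpa"))   // [vpa]
	fmt.Println(len(namespacesToScan(all, ""))) // 4
}
```

With a restriction in place, the checkpoint GC would issue one LIST request instead of one per namespace.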

How to reproduce it (as minimally and precisely as possible):

  • Install the VPA with default parameters and add --vpa-object-namespace=vpa
  • Create 3000+ empty namespaces in the cluster
  • After 10 minutes (the default --checkpoints-gc-interval), logs similar to those below appear.

Anything else we need to know?:
Logs:

I0115 14:54:19.448086       1 flags.go:57] FLAG: --add-dir-header="false"
I0115 14:54:19.448217       1 flags.go:57] FLAG: --address=":8942"
I0115 14:54:19.448218       1 flags.go:57] FLAG: --alsologtostderr="false"
I0115 14:54:19.448219       1 flags.go:57] FLAG: --checkpoints-gc-interval="10m0s"
I0115 14:54:19.448220       1 flags.go:57] FLAG: --checkpoints-timeout="1m0s"
I0115 14:54:19.448221       1 flags.go:57] FLAG: --container-name-label="name"
I0115 14:54:19.448222       1 flags.go:57] FLAG: --container-namespace-label="namespace"
I0115 14:54:19.448224       1 flags.go:57] FLAG: --container-pod-name-label="pod_name"
I0115 14:54:19.448225       1 flags.go:57] FLAG: --cpu-histogram-decay-half-life="24h0m0s"
I0115 14:54:19.448226       1 flags.go:57] FLAG: --cpu-integer-post-processor-enabled="false"
I0115 14:54:19.448227       1 flags.go:57] FLAG: --external-metrics-cpu-metric=""
I0115 14:54:19.448228       1 flags.go:57] FLAG: --external-metrics-memory-metric=""
I0115 14:54:19.448229       1 flags.go:57] FLAG: --history-length="8d"
I0115 14:54:19.448230       1 flags.go:57] FLAG: --history-resolution="1h"
I0115 14:54:19.448231       1 flags.go:57] FLAG: --kube-api-burst="20"
I0115 14:54:19.448232       1 flags.go:57] FLAG: --kube-api-qps="5"
I0115 14:54:19.448238       1 flags.go:57] FLAG: --kubeconfig=""
I0115 14:54:19.448240       1 flags.go:57] FLAG: --log-backtrace-at=":0"
I0115 14:54:19.448242       1 flags.go:57] FLAG: --log-dir=""
I0115 14:54:19.448243       1 flags.go:57] FLAG: --log-file=""
I0115 14:54:19.448244       1 flags.go:57] FLAG: --log-file-max-size="1800"
I0115 14:54:19.448246       1 flags.go:57] FLAG: --logtostderr="true"
I0115 14:54:19.448251       1 flags.go:57] FLAG: --memory-aggregation-interval="24h0m0s"
I0115 14:54:19.448252       1 flags.go:57] FLAG: --memory-aggregation-interval-count="8"
I0115 14:54:19.448253       1 flags.go:57] FLAG: --memory-histogram-decay-half-life="24h0m0s"
I0115 14:54:19.448254       1 flags.go:57] FLAG: --memory-saver="false"
I0115 14:54:19.448256       1 flags.go:57] FLAG: --metric-for-pod-labels="up{job=\"kubernetes-pods\"}"
I0115 14:54:19.448257       1 flags.go:57] FLAG: --min-checkpoints="10"
I0115 14:54:19.448258       1 flags.go:57] FLAG: --one-output="false"
I0115 14:54:19.448260       1 flags.go:57] FLAG: --oom-bump-up-ratio="1.2"
I0115 14:54:19.448264       1 flags.go:57] FLAG: --oom-min-bump-up-bytes="1.048576e+08"
I0115 14:54:19.448265       1 flags.go:57] FLAG: --password=""
I0115 14:54:19.448267       1 flags.go:57] FLAG: --pod-label-prefix="pod_label_"
I0115 14:54:19.448275       1 flags.go:57] FLAG: --pod-name-label="kubernetes_pod_name"
I0115 14:54:19.448276       1 flags.go:57] FLAG: --pod-namespace-label="kubernetes_namespace"
I0115 14:54:19.448277       1 flags.go:57] FLAG: --pod-recommendation-min-cpu-millicores="15"
I0115 14:54:19.448279       1 flags.go:57] FLAG: --pod-recommendation-min-memory-mb="100"
I0115 14:54:19.448280       1 flags.go:57] FLAG: --prometheus-address=""
I0115 14:54:19.448281       1 flags.go:57] FLAG: --prometheus-cadvisor-job-name="kubernetes-cadvisor"
I0115 14:54:19.448282       1 flags.go:57] FLAG: --prometheus-query-timeout="5m"
I0115 14:54:19.448285       1 flags.go:57] FLAG: --recommendation-margin-fraction="0.15"
I0115 14:54:19.448292       1 flags.go:57] FLAG: --recommender-interval="1m0s"
I0115 14:54:19.448293       1 flags.go:57] FLAG: --recommender-name="default"
I0115 14:54:19.448294       1 flags.go:57] FLAG: --skip-headers="false"
I0115 14:54:19.448295       1 flags.go:57] FLAG: --skip-log-headers="false"
I0115 14:54:19.448296       1 flags.go:57] FLAG: --stderrthreshold="2"
I0115 14:54:19.448297       1 flags.go:57] FLAG: --storage=""
I0115 14:54:19.448298       1 flags.go:57] FLAG: --target-cpu-percentile="0.9"
I0115 14:54:19.448299       1 flags.go:57] FLAG: --use-external-metrics="false"
I0115 14:54:19.448300       1 flags.go:57] FLAG: --username=""
I0115 14:54:19.448302       1 flags.go:57] FLAG: --v="4"
I0115 14:54:19.448303       1 flags.go:57] FLAG: --vmodule=""
I0115 14:54:19.448304       1 flags.go:57] FLAG: --vpa-object-namespace="vpa"
I0115 14:54:19.448309       1 main.go:110] Vertical Pod Autoscaler 1.0.0 Recommender: default
I0115 14:54:19.448702       1 reflector.go:221] Starting reflector *v1.DaemonSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.448715       1 reflector.go:257] Listing and watching *v1.DaemonSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.549978       1 shared_informer.go:303] caches populated
I0115 14:54:19.550002       1 controller_fetcher.go:141] Initial sync of DaemonSet completed
I0115 14:54:19.550102       1 reflector.go:221] Starting reflector *v1.Deployment (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.550111       1 reflector.go:257] Listing and watching *v1.Deployment from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.651238       1 shared_informer.go:303] caches populated
I0115 14:54:19.651281       1 controller_fetcher.go:141] Initial sync of Deployment completed
I0115 14:54:19.651474       1 reflector.go:221] Starting reflector *v1.ReplicaSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.651489       1 reflector.go:257] Listing and watching *v1.ReplicaSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.753745       1 shared_informer.go:303] caches populated
I0115 14:54:19.753790       1 controller_fetcher.go:141] Initial sync of ReplicaSet completed
I0115 14:54:19.754054       1 reflector.go:221] Starting reflector *v1.StatefulSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.754082       1 reflector.go:257] Listing and watching *v1.StatefulSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.856309       1 shared_informer.go:303] caches populated
I0115 14:54:19.856382       1 controller_fetcher.go:141] Initial sync of StatefulSet completed
I0115 14:54:19.856767       1 reflector.go:221] Starting reflector *v1.ReplicationController (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.856799       1 reflector.go:257] Listing and watching *v1.ReplicationController from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.957446       1 shared_informer.go:303] caches populated
I0115 14:54:19.957476       1 controller_fetcher.go:141] Initial sync of ReplicationController completed
I0115 14:54:19.957626       1 reflector.go:221] Starting reflector *v1.Job (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.957638       1 reflector.go:257] Listing and watching *v1.Job from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:20.059084       1 shared_informer.go:303] caches populated
I0115 14:54:20.059119       1 controller_fetcher.go:141] Initial sync of Job completed
I0115 14:54:20.059381       1 reflector.go:221] Starting reflector *v1.CronJob (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:20.059402       1 reflector.go:257] Listing and watching *v1.CronJob from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:20.160830       1 shared_informer.go:303] caches populated
I0115 14:54:20.160883       1 controller_fetcher.go:141] Initial sync of CronJob completed
I0115 14:54:20.161137       1 main.go:148] Using Metrics Server.
I0115 14:54:20.161276       1 reflector.go:221] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/cluster_feeder.go:171
I0115 14:54:20.161295       1 reflector.go:257] Listing and watching *v1.Pod from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/cluster_feeder.go:171
I0115 14:54:20.161542       1 reflector.go:221] Starting reflector *v1.VerticalPodAutoscaler (1h0m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:88
I0115 14:54:20.161556       1 reflector.go:257] Listing and watching *v1.VerticalPodAutoscaler from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:88
I0115 14:54:20.262025       1 shared_informer.go:303] caches populated
I0115 14:54:20.262112       1 api.go:92] Initial VPA synced successfully
I0115 14:54:20.262410       1 shared_informer.go:303] caches populated
I0115 14:54:20.262456       1 fetcher.go:99] Initial sync of DaemonSet completed
I0115 14:54:20.262494       1 shared_informer.go:303] caches populated
I0115 14:54:20.262501       1 fetcher.go:99] Initial sync of Deployment completed
I0115 14:54:20.262509       1 shared_informer.go:303] caches populated
I0115 14:54:20.262516       1 fetcher.go:99] Initial sync of ReplicaSet completed
I0115 14:54:20.262532       1 shared_informer.go:303] caches populated
I0115 14:54:20.262558       1 fetcher.go:99] Initial sync of StatefulSet completed
I0115 14:54:20.262571       1 shared_informer.go:303] caches populated
I0115 14:54:20.262576       1 fetcher.go:99] Initial sync of ReplicationController completed
I0115 14:54:20.262583       1 shared_informer.go:303] caches populated
I0115 14:54:20.262588       1 fetcher.go:99] Initial sync of Job completed
I0115 14:54:20.262627       1 shared_informer.go:303] caches populated
I0115 14:54:20.262633       1 fetcher.go:99] Initial sync of CronJob completed
W0115 14:54:20.344780       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
W0115 14:54:20.344812       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
W0115 14:54:20.344841       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
W0115 14:54:20.344845       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
W0115 14:54:20.344856       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
W0115 14:54:20.344853       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
W0115 14:54:20.344857       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
I0115 14:54:20.345077       1 recommender.go:210] New Recommender created &{clusterState:0x400001d0e0 clusterStateFeeder:0x4000161a40 checkpointWriter:0x4000310588 checkpointsGCInterval:600000000000 controllerFetcher:0x400068a2d0 lastCheckpointGC:{wall:13968495778911565158 ext:946342960 loc:0x23c8b00} vpaClient:0x4000417490 podResourceRecommender:0x40005121b0 useCheckpoints:true lastAggregateContainerStateGC:{wall:13968495778911564949 ext:946342794 loc:0x23c8b00} recommendationPostProcessor:[0x23fea40]}
I0115 14:54:20.345181       1 cluster_feeder.go:245] Initializing VPA from checkpoints
I0115 14:54:20.345214       1 cluster_feeder.go:317] Start selecting the vpaCRDs.
I0115 14:54:20.345229       1 cluster_feeder.go:352] Fetched 1 VPAs.
I0115 14:54:20.345311       1 cluster_feeder.go:362] Using selector app=nginx for VPA vpa/nginx-vpa
I0115 14:54:20.345362       1 cluster_feeder.go:254] Fetching checkpoints from namespace vpa
I0115 14:54:20.351032       1 cluster_feeder.go:261] Loading VPA vpa/nginx-vpa checkpoint for nginx

I0115 15:04:20.363110       1 recommender.go:155] Recommender Run
I0115 15:04:20.363224       1 cluster_feeder.go:317] Start selecting the vpaCRDs.
I0115 15:04:20.363240       1 cluster_feeder.go:352] Fetched 1 VPAs.
I0115 15:04:20.363347       1 cluster_feeder.go:362] Using selector app=nginx for VPA vpa/nginx-vpa
I0115 15:04:20.375177       1 metrics_client.go:74] 14 podMetrics retrieved for all namespaces
I0115 15:04:20.375355       1 cluster_feeder.go:440] ClusterSpec fed with #36 ContainerUsageSamples for #18 containers. Dropped #0 samples.
I0115 15:04:20.375387       1 recommender.go:165] ClusterState is tracking 14 PodStates and 1 VPAs
I0115 15:04:20.384038       1 checkpoint_writer.go:114] Saved VPA vpa/nginx-vpa checkpoint for nginx
I0115 15:04:20.384091       1 cluster_feeder.go:272] Starting garbage collection of checkpoints
I0115 15:04:20.384110       1 cluster_feeder.go:317] Start selecting the vpaCRDs.
I0115 15:04:20.384116       1 cluster_feeder.go:352] Fetched 1 VPAs.
I0115 15:04:20.384185       1 cluster_feeder.go:362] Using selector app=nginx for VPA vpa/nginx-vpa
I0115 15:04:22.362238       1 request.go:622] Waited for 192.79075ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns101/verticalpodautoscalercheckpoints
I0115 15:04:22.563352       1 request.go:622] Waited for 198.225876ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1010/verticalpodautoscalercheckpoints
I0115 15:04:22.761011       1 request.go:622] Waited for 192.809625ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1011/verticalpodautoscalercheckpoints
I0115 15:04:22.962065       1 request.go:622] Waited for 196.395709ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1012/verticalpodautoscalercheckpoints
I0115 15:04:23.161045       1 request.go:622] Waited for 193.92025ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1013/verticalpodautoscalercheckpoints
I0115 15:04:23.364266       1 request.go:622] Waited for 199.671167ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1014/verticalpodautoscalercheckpoints
I0115 15:04:23.563172       1 request.go:622] Waited for 193.308292ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1015/verticalpodautoscalercheckpoints
I0115 15:04:23.762318       1 request.go:622] Waited for 194.669916ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1016/verticalpodautoscalercheckpoints
I0115 15:04:23.961083       1 request.go:622] Waited for 192.070292ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1017/verticalpodautoscalercheckpoints
I0115 15:04:24.161377       1 request.go:622] Waited for 195.512209ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1018/verticalpodautoscalercheckpoints
I0115 15:04:24.360894       1 request.go:622] Waited for 195.286375ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1019/verticalpodautoscalercheckpoints
I0115 15:04:24.562020       1 request.go:622] Waited for 195.598125ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns102/verticalpodautoscalercheckpoints
I0115 15:04:24.763033       1 request.go:622] Waited for 195.887625ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1020/verticalpodautoscalercheckpoints


We can avoid the client-side throttling by increasing --kube-api-qps, but we would like to avoid scanning namespaces where we are not going to create VPA resources.

Labels

  • area/vertical-pod-autoscaler
  • kind/bug: Categorizes issue or PR as related to a bug.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.