Skip to content

VPA: Aggregates not pruned when container names change, stale aggregates result in erroneously split minimum resources #6744

Open
@jkyros

Description

Which component are you using?:
vertical-pod-autoscaler

What version of the component are you using?:

Component version: 1.0.0

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2

What environment is this in?:
GCP, kind, probably All

What did you expect to happen?:

When changing the name of a container in a deployment managed by a VPA, I expected the VPA recommender to react as though the container had been replaced, rather than a new container having been added, e.g.:

  recommendation:
    containerRecommendations:
    - containerName: sleeper-a
      lowerBound:
        cpu: 25m
        memory: 262144k
      target:
        cpu: 25m
        memory: 262144k
      uncappedTarget:
        cpu: 25m
        memory: 262144k
      upperBound:
        cpu: 25m
        memory: 262144k

and then after the rename (from sleeper-a to sleeper-b) I'd expect something like:

  recommendation:
    containerRecommendations:
    - containerName: sleeper-b
      lowerBound:
        cpu: 25m
        memory: 262144k
      target:
        cpu: 25m
        memory: 262144k
      uncappedTarget:
        cpu: 25m
        memory: 262144k
      upperBound:
        cpu: 25m
        memory: 262144k

What happened instead?:

The VPA reacts as though a container has been added to the deployment, splits the resources accordingly, keeps everything (aggregates, checkpoints, recommendations) for the old container, and never cleans them up:

recommendation:
    containerRecommendations:
    - containerName: sleeper-a
      lowerBound:
        cpu: 12m
        memory: 131072k
      target:
        cpu: 12m
        memory: 131072k
      uncappedTarget:
        cpu: 12m
        memory: 131072k
      upperBound:
        cpu: 4811m
        memory: "5029681818"
    - containerName: sleeper-b
      lowerBound:
        cpu: 12m
        memory: 131072k
      target:
        cpu: 12m
        memory: 131072k
      uncappedTarget:
        cpu: 12m
        memory: 131072k
      upperBound:
        cpu: 17291m
        memory: "18076954545"

This is especially problematic for containers whose resources drop low enough as a result of this split that they start failing health checks.

How to reproduce it (as minimally and precisely as possible):

This is just a deployment that sleeps and does nothing so it gets the default minimum resources.

Deployment + VPA:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleeper
  namespace: default
spec:
  selector:
    matchLabels:
      app: sleeper
  replicas: 2
  template:
    metadata:
      labels:
        app: sleeper
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534 # nobody
      containers:
        - name: sleeper-a
          image: registry.k8s.io/ubuntu-slim:0.1
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
          command: ["/bin/sh"]
          args:
            - "-c"
            - "sleep infinity"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  annotations:
  name: vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sleeper
  updatePolicy:
    updateMode: Recreate

After applying, wait for it to get a recommendation:

[jkyros@jkyros-thinkpadp1gen5 vpa-checkpoints]$ kubectl get vpa -n default vpa 
...
status:
  conditions:
  - lastTransitionTime: "2024-04-18T23:30:50Z"
    status: "True"
    type: RecommendationProvided
  recommendation:
    containerRecommendations:
    - containerName: sleeper-a
      lowerBound:
        cpu: 25m
        memory: 262144k
      target:
        cpu: 25m
        memory: 262144k
      uncappedTarget:
        cpu: 25m
        memory: 262144k
      upperBound:
        cpu: 59411m
        memory: 62111500k

Then change the name of the container (here I changed sleeper-a to sleeper-b):

      containers:
        - name: sleeper-b
          image: registry.k8s.io/ubuntu-slim:0.1
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
          command: ["/bin/sh"]
          args:
            - "-c"
            - "sleep infinity"

Watch it roll out:

NAMESPACE            NAME                                                             READY   STATUS      RESTARTS   AGE
default              sleeper-7b5cb584cb-66z2z                                         1/1     Running       0          15s
default              sleeper-7b5cb584cb-w5b2b                                         1/1     Terminating   0          33s
default              sleeper-7b5cb584cb-xxn8h                                         1/1     Running       0          32s
default              sleeper-84949f4f97-68xx6                                         0/1     Terminating   0          3m15s

It eventually finishes:

NAMESPACE            NAME                                                             READY   STATUS      RESTARTS   AGE
default              sleeper-7b5cb584cb-66z2z                                         1/1     Running     0          53s
default              sleeper-7b5cb584cb-xxn8h                                         1/1     Running     0          70s

The VPA still thinks we have two containers:

  recommendation:
    containerRecommendations:
    - containerName: sleeper-a
      lowerBound:
        cpu: 12m
        memory: 131072k
      target:
        cpu: 12m
        memory: 131072k
      uncappedTarget:
        cpu: 12m
        memory: 131072k
      upperBound:
        cpu: 4811m
        memory: "5029681818"
    - containerName: sleeper-b
      lowerBound:
        cpu: 12m
        memory: 131072k
      target:
        cpu: 12m
        memory: 131072k
      uncappedTarget:
        cpu: 12m
        memory: 131072k
      upperBound:
        cpu: 17291m
        memory: "18076954545"

And the default lowerBound CPU of cpu: 25mhas been spread across the containers, with each of them getting cpu: 12m, even though only one of them (sleeper-b) actually exists anymore.

Anything else we need to know?:

  • If you change the container name again, you get a 3rd one, a 4th one, etc and the resources get smaller.
  • The old container's stuff ( recommendations, aggregates, checkpoints) doesn't seem to ever get cleaned up, and the recommender keeps maintaining the checkpoints (if you delete one, it comes back)
  • I suspect this is due to a confluence of issues around our handling of containers:
    • Aggregates don't get pruned when containers are removed from a VPA's pods if the VPA and targetRef are otherwise intact
    • Len() over the aggregates during a rollout in which a container name has changed or removed will be wrong (there will be legitimately 2 containers, but each pod will only have one, the resources should not be split)
    • Checkpoints don't get pruned when containers are removed from a VPA's pods if the VPA and targetRef are otherwise intact, and they can get loaded back in if the recommender restarts
  • Workaround (to prevent "too small resources after split", at least) could be a minimum resource policy, e.g.:
    apiVersion: "autoscaling.k8s.io/v1"
    kind: VerticalPodAutoscaler
    metadata:
      name: vpa
    spec:
      targetRef:
        apiVersion: "apps/v1"
        kind: Deployment
        name: sleeper
      resourcePolicy:
        containerPolicies:
          - containerName: '*'
            minAllowed:
              cpu: 25
              memory: 50Mi
            controlledResources: ["cpu", "memory"]
    

Metadata

Assignees

Labels

area/vertical-pod-autoscalerkind/bugCategorizes issue or PR as related to a bug.triage/acceptedIndicates an issue or PR is ready to be actively worked on.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions