VPA: Aggregates not pruned when container names change, stale aggregates result in erroneously split minimum resources #6744
Description
Which component are you using?:
vertical-pod-autoscaler
What version of the component are you using?:
Component version: 1.0.0
What k8s version are you using (kubectl version)?:
$ kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
What environment is this in?:
GCP, kind, probably All
What did you expect to happen?:
When changing the name of a container in a deployment managed by a VPA, I expected the VPA recommender to react as though the container had been replaced, rather than a new container having been added, e.g.:
recommendation:
  containerRecommendations:
  - containerName: sleeper-a
    lowerBound:
      cpu: 25m
      memory: 262144k
    target:
      cpu: 25m
      memory: 262144k
    uncappedTarget:
      cpu: 25m
      memory: 262144k
    upperBound:
      cpu: 25m
      memory: 262144k
and then after the rename (from sleeper-a to sleeper-b) I'd expect something like:
recommendation:
  containerRecommendations:
  - containerName: sleeper-b
    lowerBound:
      cpu: 25m
      memory: 262144k
    target:
      cpu: 25m
      memory: 262144k
    uncappedTarget:
      cpu: 25m
      memory: 262144k
    upperBound:
      cpu: 25m
      memory: 262144k
What happened instead?:
The VPA reacts as though a container has been added to the deployment, splits the resources accordingly, keeps everything (aggregates, checkpoints, recommendations) for the old container, and never cleans them up:
recommendation:
  containerRecommendations:
  - containerName: sleeper-a
    lowerBound:
      cpu: 12m
      memory: 131072k
    target:
      cpu: 12m
      memory: 131072k
    uncappedTarget:
      cpu: 12m
      memory: 131072k
    upperBound:
      cpu: 4811m
      memory: "5029681818"
  - containerName: sleeper-b
    lowerBound:
      cpu: 12m
      memory: 131072k
    target:
      cpu: 12m
      memory: 131072k
    uncappedTarget:
      cpu: 12m
      memory: 131072k
    upperBound:
      cpu: 17291m
      memory: "18076954545"
This is especially problematic for containers whose resources drop low enough as a result of this split that they start failing health checks.
How to reproduce it (as minimally and precisely as possible):
This is just a deployment that sleeps and does nothing so it gets the default minimum resources.
Deployment + VPA:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleeper
  namespace: default
spec:
  selector:
    matchLabels:
      app: sleeper
  replicas: 2
  template:
    metadata:
      labels:
        app: sleeper
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534 # nobody
      containers:
      - name: sleeper-a
        image: registry.k8s.io/ubuntu-slim:0.1
        resources:
          requests:
            cpu: 100m
            memory: 50Mi
        command: ["/bin/sh"]
        args:
        - "-c"
        - "sleep infinity"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  annotations:
  name: vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sleeper
  updatePolicy:
    updateMode: Recreate
After applying, wait for it to get a recommendation:
[jkyros@jkyros-thinkpadp1gen5 vpa-checkpoints]$ kubectl get vpa -n default vpa
...
status:
  conditions:
  - lastTransitionTime: "2024-04-18T23:30:50Z"
    status: "True"
    type: RecommendationProvided
  recommendation:
    containerRecommendations:
    - containerName: sleeper-a
      lowerBound:
        cpu: 25m
        memory: 262144k
      target:
        cpu: 25m
        memory: 262144k
      uncappedTarget:
        cpu: 25m
        memory: 262144k
      upperBound:
        cpu: 59411m
        memory: 62111500k
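For reference, these 25m / 262144k lower bounds for an idle container appear to just be the recommender's pod-level minimums; as far as I can tell the defaults are --pod-recommendation-min-cpu-millicores=25 and --pod-recommendation-min-memory-mb=250, and 250MB expressed in the recommendation's "k" units is:
$ echo "250 * 1024 * 1024 / 1000" | bc
262144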
Then change the name of the container (here I changed sleeper-a to sleeper-b):
containers:
- name: sleeper-b
  image: registry.k8s.io/ubuntu-slim:0.1
  resources:
    requests:
      cpu: 100m
      memory: 50Mi
  command: ["/bin/sh"]
  args:
  - "-c"
  - "sleep infinity"
Watch it roll out:
NAMESPACE   NAME                       READY   STATUS        RESTARTS   AGE
default     sleeper-7b5cb584cb-66z2z   1/1     Running       0          15s
default     sleeper-7b5cb584cb-w5b2b   1/1     Terminating   0          33s
default     sleeper-7b5cb584cb-xxn8h   1/1     Running       0          32s
default     sleeper-84949f4f97-68xx6   0/1     Terminating   0          3m15s
It eventually finishes:
NAMESPACE   NAME                       READY   STATUS    RESTARTS   AGE
default     sleeper-7b5cb584cb-66z2z   1/1     Running   0          53s
default     sleeper-7b5cb584cb-xxn8h   1/1     Running   0          70s
The VPA still thinks we have two containers:
recommendation:
  containerRecommendations:
  - containerName: sleeper-a
    lowerBound:
      cpu: 12m
      memory: 131072k
    target:
      cpu: 12m
      memory: 131072k
    uncappedTarget:
      cpu: 12m
      memory: 131072k
    upperBound:
      cpu: 4811m
      memory: "5029681818"
  - containerName: sleeper-b
    lowerBound:
      cpu: 12m
      memory: 131072k
    target:
      cpu: 12m
      memory: 131072k
    uncappedTarget:
      cpu: 12m
      memory: 131072k
    upperBound:
      cpu: 17291m
      memory: "18076954545"
And the default lowerBound CPU of 25m has been spread across the containers, with each of them getting 12m, even though only one of them (sleeper-b) actually exists anymore.
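Doing the same arithmetic with the pod minimums split evenly across two tracked aggregates (that's my assumption about the mechanism, not a quote of the recommender code) lines up with the values above:
$ echo "25 / 2" | bc
12
$ echo "250 * 1024 * 1024 / 2 / 1000" | bc
131072
which is what makes me suspect the stale sleeper-a aggregate is still being counted when the minimums get divided up.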
Anything else we need to know?:
- If you change the container name again, you get a 3rd one, a 4th one, etc., and the resources get smaller each time.
- The old container's stuff (recommendations, aggregates, checkpoints) doesn't ever seem to get cleaned up, and the recommender keeps maintaining the checkpoints (if you delete one, it comes back; see the kubectl commands after this list).
- I suspect this is due to a confluence of issues around our handling of containers:
  - Aggregates don't get pruned when containers are removed from a VPA's pods if the VPA and targetRef are otherwise intact.
  - Len() over the aggregates during a rollout in which a container name has been changed or removed will be wrong (there will legitimately be 2 tracked containers, but each pod only has one, so the resources should not be split).
  - Checkpoints don't get pruned when containers are removed from a VPA's pods if the VPA and targetRef are otherwise intact, and they can get loaded back in if the recommender restarts.
- A workaround (to prevent the "too small resources after the split" problem, at least) could be a minimum resource policy, e.g.:
  apiVersion: "autoscaling.k8s.io/v1"
  kind: VerticalPodAutoscaler
  metadata:
    name: vpa
  spec:
    targetRef:
      apiVersion: "apps/v1"
      kind: Deployment
      name: sleeper
    resourcePolicy:
      containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 25m
          memory: 50Mi
        controlledResources: ["cpu", "memory"]