feat(vpa): Improve handling of in-place resource updates against incompatible clusters #8288

@vitanovs

Which component are you using?:

/area vertical-pod-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

Running VPA 1.4.0+ with in-place resource updates enabled against a Kubernetes cluster that has InPlacePodVerticalScaling disabled (see Kubernetes feature gates) gives no indication that the resulting update failure is caused by an incompatible cluster configuration.

The following snippets show vpa-updater attempting an in-place update of a resource-consumer instance. Because the resize sub-resource is missing from the API server,

➜ vertical-pod-autoscaler git:(vertical-pod-autoscaler/v1.4.1) ✗ kubectl get --raw='/api/v1' | jq -Mr '.resources.[].name' | grep -i 'pod'
pods
pods/attach
pods/binding
pods/ephemeralcontainers
pods/eviction
pods/exec
pods/log
pods/portforward
pods/proxy
pods/status
podtemplates

the resource update fails:

I0702 07:33:47.945742       1 update_priority_calculator.go:145] "Pod accepted for update" pod="kube-system/resource-consumer-7fd9594844-dfzft" updatePriority=3.166666666666667 processedRecommendations="resource-consumer: target: 83887k 25m; uncappedTarget: 83887k 25m;"
I0702 07:33:47.946005       1 recommendation_provider.go:121] "Updating requirements for pod" pod="resource-consumer-7fd9594844-dfzft"
I0702 07:33:47.946099       1 pods_inplace_restriction.go:128] "Calculated patches for pod" pod="kube-system/resource-consumer-7fd9594844-dfzft" patches=[{"op":"add","path":"/spec/containers/0/resources/requests/cpu","value":"25m"},{"op":"add","path":"/spec/containers/0/resources/requests/memory","value":"80Mi"}]
I0702 07:33:47.946115       1 pods_inplace_restriction.go:128] "Calculated patches for pod" pod="kube-system/resource-consumer-7fd9594844-dfzft" patches=[{"op":"add","path":"/metadata/annotations/vpaInPlaceUpdated","value":"true"}]
I0702 07:33:47.947660       1 updater.go:286] "In-place update failed" error="the server could not find the requested resource" pod="kube-system/resource-consumer-7fd9594844-dfzft"

This leaves only a "the server could not find the requested resource" message, which can mislead users who are not familiar with the required configuration (for example, having the InPlacePodVerticalScaling gate enabled for both the apiserver and the kubelet).

The snippets below can be used to reproduce the setup used for testing:

  • Kind cluster running Kubernetes 1.32:
    • Use kind create cluster --config=/path/to/snippet.yaml
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    name: k8s-132-in-place-disabled
    featureGates:
      "InPlacePodVerticalScaling": false
    nodes:
    - role: control-plane
      image: kindest/node:v1.32.0
    - role: worker
      image: kindest/node:v1.32.0
  • Target the newly created cluster
    • Get config: kind get kubeconfig --name=k8s-132-in-place-disabled > ./kubeconfig-in-place-disabled.yaml
    • Export config: export KUBECONFIG=./kubeconfig-in-place-disabled.yaml
  • Verify that the pods/resize sub-resource is not present
    • Use kubectl get --raw='/api/v1' | jq -Mr '.resources.[].name' | grep -i 'pod' to list pod* (sub)resources.
  • Enable the InPlaceOrRecreate feature gate for:
    • vpa-admission-controller ( vertical-pod-autoscaler/deploy/admission-controller-deployment.yaml ):
      • Add --feature-gates=InPlaceOrRecreate=true to the container args
    • vpa-updater ( vertical-pod-autoscaler/deploy/updater-deployment.yaml ):
      • Add --feature-gates=InPlaceOrRecreate=true to the container args
  • Deploy vpa with ./vertical-pod-autoscaler/hack/vpa-up.sh
  • Use the snippet below to create the resource-consumer deployment and the corresponding vpa resource
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: resource-consumer
  labels:
    app: resource-consumer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: resource-consumer
  template:
    metadata:
      labels:
        app: resource-consumer
    spec:
      containers:
      - name: resource-consumer
        image: gcr.io/k8s-staging-e2e-test-images/resource-consumer:1.9
        resources:
          requests:
            cpu: 10m
            memory: 30Mi
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: resource-consumer
  labels:
    app: resource-consumer
spec:
  selector:
    app: resource-consumer
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: resource-consumer
  labels:
    app: resource-consumer
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
  resourcePolicy:
    containerPolicies:
      - containerName: resource-consumer
        minAllowed:
          cpu: 10m
          memory: 25Mi
        maxAllowed:
          cpu: 300m
          memory: 300Mi
  • Update maxAllowed/minAllowed on the vpa resource ( named resource-consumer ) to trigger a resource update.
  • Monitor the vpa-updater logs

Describe the solution you'd like.:

Performing a patch request targeting the resize sub-resource:

res, err := ip.client.CoreV1().Pods(podToUpdate.Namespace).Patch(context.TODO(), podToUpdate.Name, k8stypes.JSONPatchType, patch, metav1.PatchOptions{}, "resize")
if err != nil {
	return err
}

returns an error that does not provide much context. Within the vpa-updater, the function caller does indicate that the error is in-place update related, but it's of little help when debugging:

err := inPlaceLimiter.InPlaceUpdate(pod, vpa, u.eventRecorder)
if err != nil {
	klog.V(0).InfoS("In-place update failed", "error", err, "pod", klog.KObj(pod))
	metrics_updater.RecordFailedInPlaceUpdate(vpaSize, "InPlaceUpdateError")
	continue
}

We need a mechanism that validates whether the Kubernetes cluster ( i.e. kube-apiserver & kubelet ) is compatible with in-place updates before performing the patch request, and a log message that clearly indicates such cases. One possibility is to query the apiserver for such metadata and perform the validation.
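
A minimal sketch of such a check, assuming the presence of pods/resize in the /api/v1 discovery document is a good enough signal for the apiserver side (it says nothing about the kubelet). The helper name hasSubresource and the trimmed JSON are illustrative only. The code parses the same document that kubectl get --raw='/api/v1' returns; inside the updater, the client-go discovery client (for example ServerResourcesForGroupVersion("v1")) would be the more natural source of this list.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiResourceList mirrors the subset of the /api/v1 discovery document
// that the reproduction steps inspect with jq (the resource names).
type apiResourceList struct {
	Resources []struct {
		Name string `json:"name"`
	} `json:"resources"`
}

// hasSubresource (hypothetical helper) reports whether the discovery
// document lists the named (sub)resource, e.g. "pods/resize".
func hasSubresource(discoveryJSON []byte, name string) (bool, error) {
	var list apiResourceList
	if err := json.Unmarshal(discoveryJSON, &list); err != nil {
		return false, fmt.Errorf("parsing discovery document: %w", err)
	}
	for _, r := range list.Resources {
		if r.Name == name {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Trimmed stand-in for the output of: kubectl get --raw='/api/v1'
	doc := []byte(`{"resources":[{"name":"pods"},{"name":"pods/eviction"},{"name":"pods/status"}]}`)

	ok, err := hasSubresource(doc, "pods/resize")
	if err != nil {
		fmt.Println("discovery check failed:", err)
		return
	}
	if !ok {
		fmt.Println("pods/resize is not served; in-place updates are not supported by this apiserver")
	}
}
```

Running the check once at startup, or caching its result, would let the updater log a single clear incompatibility message instead of failing on every pod.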

Additional context.:

An additional detail from testing: running a 1.32 cluster with the in-place updates feature gate enabled

featureGates:
  "InPlacePodVerticalScaling": true

allows the usage of VPA 1.4.0+ with InPlaceOrRecreate updateMode.

Labels: area/vertical-pod-autoscaler, kind/bug, triage/accepted
