Longhorn eviction on upgrade #5133

@academiaresf

Summary

I'm trying to upgrade from 1.32/stable to 1.33/stable.

After reading the MicroK8s upgrade documentation, I ran the drain command against our Longhorn node. The Longhorn node is dedicated and centralizes the data volumes; it was labeled with: kubectl label node longhorn-node node.longhorn.io/create-default-disk=true

The drain process reports an error when evicting Longhorn's instance-manager pod:

error when evicting pods/"instance-manager-8763bac9c3643faad834ede4019508bd" -n "longhorn-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
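
For anyone triaging this, here is a minimal sketch of how the blocking budget could be located from the error line above. The helper names (parse_pod, parse_ns, show_pdbs) are hypothetical and only for illustration. The sketch assumes the error format stays as quoted above and that the cluster is reached through microk8s kubectl. It has not been run against this cluster.

```shell
#!/bin/sh
# Sketch: extract the pod name and namespace from a drain eviction error.
# The error text is copied from the report above.
err="error when evicting pods/\"instance-manager-8763bac9c3643faad834ede4019508bd\" -n \"longhorn-system\" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget."

# Hypothetical helper: print the quoted pod name that follows pods/.
parse_pod() {
  printf '%s\n' "$1" | sed -n 's/.*pods\/"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: print the quoted namespace that follows -n.
parse_ns() {
  printf '%s\n' "$1" | sed -n 's/.* -n "\([^"]*\)".*/\1/p'
}

# Hypothetical helper: list the PodDisruptionBudgets in a namespace.
# Assumes microk8s is installed; it is not called automatically.
show_pdbs() {
  microk8s kubectl get pdb -n "$1" -o wide
}

pod=$(parse_pod "$err")
ns=$(parse_ns "$err")
echo "pod=$pod namespace=$ns"

# To query the cluster, run: show_pdbs "$ns"
```

The PDB list should show which budget covers the instance-manager pod and how many disruptions it currently allows.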

What Should Happen Instead?

The drain should complete without errors so that the upgrade can proceed.

Reproduction Steps

# helm values
persistence:
  defaultClassReplicaCount: 1

defaultSettings:
  createDefaultDiskLabeledNodes: true

longhornUI:
  tolerations:
    - key: "node-role.kubernetes.io/longhorn"
      operator: "Exists"
      effect: "NoSchedule"

longhornManager:
  tolerations:
    - key: "node-role.kubernetes.io/longhorn"
      operator: "Exists"
      effect: "NoSchedule"
    - key: "node-role.kubernetes.io/apps"
      operator: "Exists"
      effect: "NoSchedule"

longhornDriver:
  tolerations:
    - key: "node-role.kubernetes.io/longhorn"
      operator: "Exists"
      effect: "NoSchedule"
    - key: "node-role.kubernetes.io/apps"
      operator: "Exists"
      effect: "NoSchedule"

# on microk8s worker node

apt-get update
apt-get install open-iscsi bash curl grep cryptsetup dmsetup -y

curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.8.0/scripts/environment_check.sh | bash

helm repo add longhorn https://charts.longhorn.io
helm repo update

helm upgrade --install longhorn longhorn/longhorn \
--namespace longhorn-system --create-namespace --version 1.8.0 \
--set defaultSettings.defaultDataPath="/longhorn" \
--set csi.kubeletRootDir="/var/snap/microk8s/common/var/lib/kubelet" -f longhorn-values.yaml
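
The steps above end at the Longhorn install and do not quote the exact drain command from the Summary. As a sketch only, assuming the node name longhorn-node from the label command and a typical set of kubectl drain flags (both are assumptions, not the command actually used), the drain step might look like this. The drain_cmd helper is hypothetical, and by default it only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: build the drain command for a given node.
# The flags are assumed, not taken from the report.
drain_cmd() {
  echo "microk8s kubectl drain $1 --ignore-daemonsets --delete-emptydir-data"
}

# Node name taken from the label command in the Summary.
cmd=$(drain_cmd "longhorn-node")

# Print the command by default; set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
  $cmd
else
  echo "$cmd"
fi
```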

Introspection Report

The tarball contains many folders; the console output of the inspection is below.

Inspecting system
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-kubelite is running
  Service snap.microk8s.daemon-k8s-dqlite is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy openSSL information to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy asnycio usage and limits to the final report tarball
  Copy inotify max_user_instances and max_user_watches to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting dqlite
  Inspect dqlite
cp: cannot stat '/var/snap/microk8s/7665/var/kubernetes/backend/localnode.yaml': No such file or directory

Building the report tarball
  Report tarball is at /var/snap/microk8s/7665/inspection-report-20250705_231447.tar.gz

Can you suggest a fix?

Some documentation would help, because Longhorn is a widely used technology.

Are you interested in contributing with a fix?

I can write the documentation if I have the solution.
