fix: add note for drain stuck when upgrade from v1.4.0 to v1.4.1 (#707)

WebberHuang1118 · jillian-maroket · web-flow · commit db9a109e0ef3 · 2025-01-24T16:04:27.000+11:00
Co-authored-by: Jillian <67180770+jillian-maroket@users.noreply.github.com>
diff --git a/docs/upgrade/v1-1-2-to-v1-2-0.md b/docs/upgrade/v1-1-2-to-v1-2-0.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 7
+sidebar_position: 8
 sidebar_label: Upgrade from v1.1.2 to v1.2.0 (not recommended)
 title: "Upgrade from v1.1.2 to v1.2.0 (not recommended)"
 ---
diff --git a/docs/upgrade/v1-2-0-to-v1-2-1.md b/docs/upgrade/v1-2-0-to-v1-2-1.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 6
+sidebar_position: 7
 sidebar_label: Upgrade from v1.1.2/v1.1.3/v1.2.0 to v1.2.1
 title: "Upgrade from v1.1.2/v1.1.3/v1.2.0 to v1.2.1"
 ---
diff --git a/docs/upgrade/v1-2-1-to-v1-2-2.md b/docs/upgrade/v1-2-1-to-v1-2-2.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 6
 sidebar_label: Upgrade from v1.2.1 to v1.2.2
 title: "Upgrade from v1.2.1 to v1.2.2"
 ---
diff --git a/docs/upgrade/v1-2-2-to-v1-3-1.md b/docs/upgrade/v1-2-2-to-v1-3-1.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 4
+sidebar_position: 5
 sidebar_label: Upgrade from v1.2.2/v1.3.0 to v1.3.1
 title: "Upgrade from v1.2.2/v1.3.0 to v1.3.1"
 ---
diff --git a/docs/upgrade/v1-3-1-to-v1-3-2.md b/docs/upgrade/v1-3-1-to-v1-3-2.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 3
+sidebar_position: 4
 sidebar_label: Upgrade from v1.3.1 to v1.3.2
 title: "Upgrade from v1.3.1 to v1.3.2"
 ---
diff --git a/docs/upgrade/v1-3-2-to-v1-4-0.md b/docs/upgrade/v1-3-2-to-v1-4-0.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 2
+sidebar_position: 3
 sidebar_label: Upgrade from v1.3.2 to v1.4.0
 title: "Upgrade from v1.3.2 to v1.4.0"
 ---
diff --git a/docs/upgrade/v1-4-0-to-v1-4-1.md b/docs/upgrade/v1-4-0-to-v1-4-1.md
@@ -0,0 +1,83 @@
+---
+sidebar_position: 2
+sidebar_label: Upgrade from v1.4.0 to v1.4.1
+title: "Upgrade from v1.4.0 to v1.4.1"
+---
+
+<head>
+  <link rel="canonical" href="https://docs.harvesterhci.io/v1.4/upgrade/v1-4-0-to-v1-4-1"/>
+</head>
+
+## General information
+
+An **Upgrade** button appears on the **Dashboard** screen whenever a new Harvester version that you can upgrade to becomes available. For more information, see [Start an upgrade](./automatic.md#start-an-upgrade).
+
+For air-gapped environments, see [Prepare an air-gapped upgrade](./automatic.md#prepare-an-air-gapped-upgrade).
+
+
+## Known issues
+
+---
+
+### 1. Upgrade is stuck in the "Pre-drained" state
+
+The upgrade process may become stuck in the "Pre-drained" state. Kubernetes is supposed to drain the workload on the node, but some factors may cause the process to stall.
+
+![](/img/v1.2/upgrade/known_issues/3730-stuck.png)
+
+A possible cause is the presence of processes related to orphaned engines in the Longhorn Instance Manager. To determine if this applies to your situation and to work around it, perform the following steps:
+
+1. Check the name of the `instance-manager` pod on the stuck node.
+
+    Example:
+
+    The stuck node is `harvester-node-1`, and the name of the Instance Manager pod is `instance-manager-d80e13f520e7b952f4b7593fc1883e2a`.
+
+    ```
+    $ kubectl get pods -n longhorn-system --field-selector spec.nodeName=harvester-node-1 | grep instance-manager
+    instance-manager-d80e13f520e7b952f4b7593fc1883e2a          1/1     Running   0              3d8h
+    ```
+
+1. Check the Longhorn Manager logs for informational messages.
+
+    Example:
+
+    ```
+    $ kubectl -n longhorn-system logs daemonsets/longhorn-manager
+    ...
+    time="2025-01-14T00:00:01Z" level=info msg="Node instance-manager-d80e13f520e7b952f4b7593fc1883e2a is marked unschedulable but removing harvester-node-1 PDB is blocked: some volumes are still attached InstanceEngines count 1 pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0" func="controller.(*InstanceManagerController).syncInstanceManagerPDB" file="instance_manager_controller.go:823" controller=longhorn-instance-manager node=harvester-node-1
+    ```
+
+    The `instance-manager` pod cannot be drained because of the engine `pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0`.
+
+1. Check if the engine is still running on the stuck node.
+
+    Example:
+
+    ```
+    $ kubectl -n longhorn-system get engines.longhorn.io pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0 -o jsonpath='{"Current state: "}{.status.currentState}{"\nNode ID: "}{.spec.nodeID}{"\n"}'
+    Current state: stopped
+    Node ID:
+    ```
+
+    The issue likely exists if the output shows that the engine is not running or that the engine cannot be found.
+
+1. Check if all volumes are healthy.
+
+    ```
+    kubectl get volumes -n longhorn-system -o yaml | yq '.items[] | select(.status.state == "attached") | .status.robustness'
+    ```
+
+    All volumes must be marked `healthy`. If any volume is not, report the issue.
+
+1. Remove the `instance-manager` pod's PodDisruptionBudget (PDB).
+
+    Example:
+
+    ```
+    kubectl delete pdb instance-manager-d80e13f520e7b952f4b7593fc1883e2a -n longhorn-system
+    ```
+
+Related issues:
+  - [[BUG] v1.4.0 -> v1.4.1-rc1 upgrade stuck in Pre-drained and the node stay in Cordoned](https://github.com/harvester/harvester/issues/7366)
+  - [[IMPROVEMENT] Cleanup orphaned volume runtime resources if the resources already deleted](https://github.com/longhorn/longhorn/issues/6764)
diff --git a/versioned_docs/version-v1.4/upgrade/v1-1-2-to-v1-2-0.md b/versioned_docs/version-v1.4/upgrade/v1-1-2-to-v1-2-0.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 7
+sidebar_position: 8
 sidebar_label: Upgrade from v1.1.2 to v1.2.0 (not recommended)
 title: "Upgrade from v1.1.2 to v1.2.0 (not recommended)"
 ---
diff --git a/versioned_docs/version-v1.4/upgrade/v1-2-0-to-v1-2-1.md b/versioned_docs/version-v1.4/upgrade/v1-2-0-to-v1-2-1.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 6
+sidebar_position: 7
 sidebar_label: Upgrade from v1.1.2/v1.1.3/v1.2.0 to v1.2.1
 title: "Upgrade from v1.1.2/v1.1.3/v1.2.0 to v1.2.1"
 ---
diff --git a/versioned_docs/version-v1.4/upgrade/v1-2-1-to-v1-2-2.md b/versioned_docs/version-v1.4/upgrade/v1-2-1-to-v1-2-2.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 6
 sidebar_label: Upgrade from v1.2.1 to v1.2.2
 title: "Upgrade from v1.2.1 to v1.2.2"
 ---
diff --git a/versioned_docs/version-v1.4/upgrade/v1-2-2-to-v1-3-1.md b/versioned_docs/version-v1.4/upgrade/v1-2-2-to-v1-3-1.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 4
+sidebar_position: 5
 sidebar_label: Upgrade from v1.2.2/v1.3.0 to v1.3.1
 title: "Upgrade from v1.2.2/v1.3.0 to v1.3.1"
 ---
diff --git a/versioned_docs/version-v1.4/upgrade/v1-3-1-to-v1-3-2.md b/versioned_docs/version-v1.4/upgrade/v1-3-1-to-v1-3-2.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 3
+sidebar_position: 4
 sidebar_label: Upgrade from v1.3.1 to v1.3.2
 title: "Upgrade from v1.3.1 to v1.3.2"
 ---
diff --git a/versioned_docs/version-v1.4/upgrade/v1-3-2-to-v1-4-0.md b/versioned_docs/version-v1.4/upgrade/v1-3-2-to-v1-4-0.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 2
+sidebar_position: 3
 sidebar_label: Upgrade from v1.3.2 to v1.4.0
 title: "Upgrade from v1.3.2 to v1.4.0"
 ---
diff --git a/versioned_docs/version-v1.4/upgrade/v1-4-0-to-v1-4-1.md b/versioned_docs/version-v1.4/upgrade/v1-4-0-to-v1-4-1.md
@@ -0,0 +1,83 @@
+---
+sidebar_position: 2
+sidebar_label: Upgrade from v1.4.0 to v1.4.1
+title: "Upgrade from v1.4.0 to v1.4.1"
+---
+
+<head>
+  <link rel="canonical" href="https://docs.harvesterhci.io/v1.4/upgrade/v1-4-0-to-v1-4-1"/>
+</head>
+
+## General information
+
+An **Upgrade** button appears on the **Dashboard** screen whenever a new Harvester version that you can upgrade to becomes available. For more information, see [Start an upgrade](./automatic.md#start-an-upgrade).
+
+For air-gapped environments, see [Prepare an air-gapped upgrade](./automatic.md#prepare-an-air-gapped-upgrade).
+
+
+## Known issues
+
+---
+
+### 1. Upgrade is stuck in the "Pre-drained" state
+
+The upgrade process may become stuck in the "Pre-drained" state. Kubernetes is supposed to drain the workload on the node, but some factors may cause the process to stall.
+
+![](/img/v1.2/upgrade/known_issues/3730-stuck.png)
+
+A possible cause is the presence of processes related to orphaned engines in the Longhorn Instance Manager. To determine if this applies to your situation and to work around it, perform the following steps:
+
+1. Check the name of the `instance-manager` pod on the stuck node.
+
+    Example:
+
+    The stuck node is `harvester-node-1`, and the name of the Instance Manager pod is `instance-manager-d80e13f520e7b952f4b7593fc1883e2a`.
+
+    ```
+    $ kubectl get pods -n longhorn-system --field-selector spec.nodeName=harvester-node-1 | grep instance-manager
+    instance-manager-d80e13f520e7b952f4b7593fc1883e2a          1/1     Running   0              3d8h
+    ```
+
+1. Check the Longhorn Manager logs for informational messages.
+
+    Example:
+
+    ```
+    $ kubectl -n longhorn-system logs daemonsets/longhorn-manager
+    ...
+    time="2025-01-14T00:00:01Z" level=info msg="Node instance-manager-d80e13f520e7b952f4b7593fc1883e2a is marked unschedulable but removing harvester-node-1 PDB is blocked: some volumes are still attached InstanceEngines count 1 pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0" func="controller.(*InstanceManagerController).syncInstanceManagerPDB" file="instance_manager_controller.go:823" controller=longhorn-instance-manager node=harvester-node-1
+    ```
+
+    The `instance-manager` pod cannot be drained because of the engine `pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0`.
+
+1. Check if the engine is still running on the stuck node.
+
+    Example:
+
+    ```
+    $ kubectl -n longhorn-system get engines.longhorn.io pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0 -o jsonpath='{"Current state: "}{.status.currentState}{"\nNode ID: "}{.spec.nodeID}{"\n"}'
+    Current state: stopped
+    Node ID:
+    ```
+
+    The issue likely exists if the output shows that the engine is not running or that the engine cannot be found.
+
+1. Check if all volumes are healthy.
+
+    ```
+    kubectl get volumes -n longhorn-system -o yaml | yq '.items[] | select(.status.state == "attached") | .status.robustness'
+    ```
+
+    All volumes must be marked `healthy`. If any volume is not, report the issue.
+
+1. Remove the `instance-manager` pod's PodDisruptionBudget (PDB).
+
+    Example:
+
+    ```
+    kubectl delete pdb instance-manager-d80e13f520e7b952f4b7593fc1883e2a -n longhorn-system
+    ```
+
+Related issues:
+  - [[BUG] v1.4.0 -> v1.4.1-rc1 upgrade stuck in Pre-drained and the node stay in Cordoned](https://github.com/harvester/harvester/issues/7366)
+  - [[IMPROVEMENT] Cleanup orphaned volume runtime resources if the resources already deleted](https://github.com/longhorn/longhorn/issues/6764)