Commit db9a109

fix: add note for drain stuck when upgrade from v1.4.0 to v1.4.1 (#707)
Co-authored-by: Jillian <[email protected]>
1 parent 84aad9b commit db9a109

14 files changed, +178 -12 lines changed

docs/upgrade/v1-1-2-to-v1-2-0.md

+1-1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 7
+sidebar_position: 8
 sidebar_label: Upgrade from v1.1.2 to v1.2.0 (not recommended)
 title: "Upgrade from v1.1.2 to v1.2.0 (not recommended)"
 ---

docs/upgrade/v1-2-0-to-v1-2-1.md

+1-1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 6
+sidebar_position: 7
 sidebar_label: Upgrade from v1.1.2/v1.1.3/v1.2.0 to v1.2.1
 title: "Upgrade from v1.1.2/v1.1.3/v1.2.0 to v1.2.1"
 ---

docs/upgrade/v1-2-1-to-v1-2-2.md

+1-1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 6
 sidebar_label: Upgrade from v1.2.1 to v1.2.2
 title: "Upgrade from v1.2.1 to v1.2.2"
 ---

docs/upgrade/v1-2-2-to-v1-3-1.md

+1-1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 4
+sidebar_position: 5
 sidebar_label: Upgrade from v1.2.2/v1.3.0 to v1.3.1
 title: "Upgrade from v1.2.2/v1.3.0 to v1.3.1"
 ---

docs/upgrade/v1-3-1-to-v1-3-2.md

+1-1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 3
+sidebar_position: 4
 sidebar_label: Upgrade from v1.3.1 to v1.3.2
 title: "Upgrade from v1.3.1 to v1.3.2"
 ---

docs/upgrade/v1-3-2-to-v1-4-0.md

+1-1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 2
+sidebar_position: 3
 sidebar_label: Upgrade from v1.3.2 to v1.4.0
 title: "Upgrade from v1.3.2 to v1.4.0"
 ---

docs/upgrade/v1-4-0-to-v1-4-1.md

+83
@@ -0,0 +1,83 @@
+---
+sidebar_position: 2
+sidebar_label: Upgrade from v1.4.0 to v1.4.1
+title: "Upgrade from v1.4.0 to v1.4.1"
+---
+
+<head>
+<link rel="canonical" href="https://docs.harvesterhci.io/v1.4/upgrade/v1-4-0-to-v1-4-1"/>
+</head>
+
+## General information
+
+An **Upgrade** button appears on the **Dashboard** screen whenever a new Harvester version that you can upgrade to becomes available. For more information, see [Start an upgrade](./automatic.md#start-an-upgrade).
+
+For air-gapped environments, see [Prepare an air-gapped upgrade](./automatic.md#prepare-an-air-gapped-upgrade).
+
+
+## Known issues
+
+---
+
+### 1. Upgrade is stuck in the "Pre-drained" state
+
+The upgrade process may become stuck in the "Pre-drained" state. Kubernetes is supposed to drain the workload on the node, but some factors may cause the process to stall.
+
+![](/img/v1.2/upgrade/known_issues/3730-stuck.png)
+
+One possible cause is processes related to orphaned engines of the Longhorn Instance Manager. To determine if this applies to your situation, perform the following steps:
+
+1. Check the name of the `instance-manager` pod on the stuck node.
+
+Example:
+
+The stuck node is `harvester-node-1`, and the name of the Instance Manager pod is `instance-manager-d80e13f520e7b952f4b7593fc1883e2a`.
+
+```
+$ kubectl get pods -n longhorn-system --field-selector spec.nodeName=harvester-node-1 | grep instance-manager
+instance-manager-d80e13f520e7b952f4b7593fc1883e2a 1/1 Running 0 3d8h
+```
+
+1. Check the Longhorn Manager logs for informational messages.
+
+Example:
+
+```
+$ kubectl -n longhorn-system logs daemonsets/longhorn-manager
+...
+time="2025-01-14T00:00:01Z" level=info msg="Node instance-manager-d80e13f520e7b952f4b7593fc1883e2a is marked unschedulable but removing harvester-node-1 PDB is blocked: some volumes are still attached InstanceEngines count 1 pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0" func="controller.(*InstanceManagerController).syncInstanceManagerPDB" file="instance_manager_controller.go:823" controller=longhorn-instance-manager node=harvester-node-1
+```
+
+The `instance-manager` pod cannot be drained because of the engine `pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0`.
+
+1. Check if the engine is still running on the stuck node.
+
+Example:
+
+```
+$ kubectl -n longhorn-system get engines.longhorn.io pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0 -o jsonpath='{"Current state: "}{.status.currentState}{"\nNode ID: "}{.spec.nodeID}{"\n"}'
+Current state: stopped
+Node ID:
+```
+
+The issue likely exists if the output shows that the engine is not running, or if the engine cannot be found at all.
+
+1. Check if all volumes are healthy.
+
+```
+kubectl get volumes -n longhorn-system -o yaml | yq '.items[] | select(.status.state == "attached")| .status.robustness'
+```
+
+All volumes must be marked `healthy`. If this is not the case, report the issue.
+
+1. Remove the `instance-manager` pod's PodDisruptionBudget (PDB).
+
+Example:
+
+```
+kubectl delete pdb instance-manager-d80e13f520e7b952f4b7593fc1883e2a -n longhorn-system
+```
+
+Related issues:
+- [[BUG] v1.4.0 -> v1.4.1-rc1 upgrade stuck in Pre-drained and the node stay in Cordoned](https://github.com/harvester/harvester/issues/7366)
+- [[IMPROVEMENT] Cleanup orphaned volume runtime resources if the resources already deleted](https://github.com/longhorn/longhorn/issues/6764)
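
The log check in the steps above can be partly scripted. The following is a minimal sketch and is not part of this commit: `blocking_engines` is a hypothetical helper name, and the engine-name pattern is an assumption derived only from the example log line in this file.

```shell
#!/bin/sh
# Hypothetical helper (not provided by Harvester or Longhorn).
# Reads longhorn-manager log lines on stdin and prints the unique engine
# names that appear in "PDB is blocked" messages.
# Assumption: engine names look like pvc-<hex and dashes>-e-<number>,
# as in the example log line from the documentation.
blocking_engines() {
  grep 'PDB is blocked' \
    | grep -oE 'pvc-[0-9a-f-]+-e-[0-9]+' \
    | sort -u
}
```

It could be fed from the documented command, for example `kubectl -n longhorn-system logs daemonsets/longhorn-manager | blocking_engines`. Each printed name can then be inspected with the `get engines.longhorn.io` command from the third step.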

versioned_docs/version-v1.4/upgrade/v1-1-2-to-v1-2-0.md

+1-1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 7
+sidebar_position: 8
 sidebar_label: Upgrade from v1.1.2 to v1.2.0 (not recommended)
 title: "Upgrade from v1.1.2 to v1.2.0 (not recommended)"
 ---

versioned_docs/version-v1.4/upgrade/v1-2-0-to-v1-2-1.md

+1-1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 6
+sidebar_position: 7
 sidebar_label: Upgrade from v1.1.2/v1.1.3/v1.2.0 to v1.2.1
 title: "Upgrade from v1.1.2/v1.1.3/v1.2.0 to v1.2.1"
 ---

versioned_docs/version-v1.4/upgrade/v1-2-1-to-v1-2-2.md

+1-1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 6
 sidebar_label: Upgrade from v1.2.1 to v1.2.2
 title: "Upgrade from v1.2.1 to v1.2.2"
 ---

versioned_docs/version-v1.4/upgrade/v1-2-2-to-v1-3-1.md

+1-1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 4
+sidebar_position: 5
 sidebar_label: Upgrade from v1.2.2/v1.3.0 to v1.3.1
 title: "Upgrade from v1.2.2/v1.3.0 to v1.3.1"
 ---

versioned_docs/version-v1.4/upgrade/v1-3-1-to-v1-3-2.md

+1-1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 3
+sidebar_position: 4
 sidebar_label: Upgrade from v1.3.1 to v1.3.2
 title: "Upgrade from v1.3.1 to v1.3.2"
 ---

versioned_docs/version-v1.4/upgrade/v1-3-2-to-v1-4-0.md

+1-1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 2
+sidebar_position: 3
 sidebar_label: Upgrade from v1.3.2 to v1.4.0
 title: "Upgrade from v1.3.2 to v1.4.0"
 ---

@@ -0,0 +1,83 @@
+---
+sidebar_position: 2
+sidebar_label: Upgrade from v1.4.0 to v1.4.1
+title: "Upgrade from v1.4.0 to v1.4.1"
+---
+
+<head>
+<link rel="canonical" href="https://docs.harvesterhci.io/v1.4/upgrade/v1-4-0-to-v1-4-1"/>
+</head>
+
+## General information
+
+An **Upgrade** button appears on the **Dashboard** screen whenever a new Harvester version that you can upgrade to becomes available. For more information, see [Start an upgrade](./automatic.md#start-an-upgrade).
+
+For air-gapped environments, see [Prepare an air-gapped upgrade](./automatic.md#prepare-an-air-gapped-upgrade).
+
+
+## Known issues
+
+---
+
+### 1. Upgrade is stuck in the "Pre-drained" state
+
+The upgrade process may become stuck in the "Pre-drained" state. Kubernetes is supposed to drain the workload on the node, but some factors may cause the process to stall.
+
+![](/img/v1.2/upgrade/known_issues/3730-stuck.png)
+
+One possible cause is processes related to orphaned engines of the Longhorn Instance Manager. To determine if this applies to your situation, perform the following steps:
+
+1. Check the name of the `instance-manager` pod on the stuck node.
+
+Example:
+
+The stuck node is `harvester-node-1`, and the name of the Instance Manager pod is `instance-manager-d80e13f520e7b952f4b7593fc1883e2a`.
+
+```
+$ kubectl get pods -n longhorn-system --field-selector spec.nodeName=harvester-node-1 | grep instance-manager
+instance-manager-d80e13f520e7b952f4b7593fc1883e2a 1/1 Running 0 3d8h
+```
+
+1. Check the Longhorn Manager logs for informational messages.
+
+Example:
+
+```
+$ kubectl -n longhorn-system logs daemonsets/longhorn-manager
+...
+time="2025-01-14T00:00:01Z" level=info msg="Node instance-manager-d80e13f520e7b952f4b7593fc1883e2a is marked unschedulable but removing harvester-node-1 PDB is blocked: some volumes are still attached InstanceEngines count 1 pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0" func="controller.(*InstanceManagerController).syncInstanceManagerPDB" file="instance_manager_controller.go:823" controller=longhorn-instance-manager node=harvester-node-1
+```
+
+The `instance-manager` pod cannot be drained because of the engine `pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0`.
+
+1. Check if the engine is still running on the stuck node.
+
+Example:
+
+```
+$ kubectl -n longhorn-system get engines.longhorn.io pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0 -o jsonpath='{"Current state: "}{.status.currentState}{"\nNode ID: "}{.spec.nodeID}{"\n"}'
+Current state: stopped
+Node ID:
+```
+
+The issue likely exists if the output shows that the engine is not running, or if the engine cannot be found at all.
+
+1. Check if all volumes are healthy.
+
+```
+kubectl get volumes -n longhorn-system -o yaml | yq '.items[] | select(.status.state == "attached")| .status.robustness'
+```
+
+All volumes must be marked `healthy`. If this is not the case, report the issue.
+
+1. Remove the `instance-manager` pod's PodDisruptionBudget (PDB).
+
+Example:
+
+```
+kubectl delete pdb instance-manager-d80e13f520e7b952f4b7593fc1883e2a -n longhorn-system
+```
+
+Related issues:
+- [[BUG] v1.4.0 -> v1.4.1-rc1 upgrade stuck in Pre-drained and the node stay in Cordoned](https://github.com/harvester/harvester/issues/7366)
+- [[IMPROVEMENT] Cleanup orphaned volume runtime resources if the resources already deleted](https://github.com/longhorn/longhorn/issues/6764)
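
The volume health check in the fourth step prints one robustness value per attached volume, so the "all volumes must be `healthy`" condition can be tested mechanically before the PDB is removed. The following is a minimal sketch and is not part of this commit: `all_healthy` is a hypothetical helper name, and it assumes that `yq` prints each value unquoted on its own line.

```shell
#!/bin/sh
# Hypothetical helper (not provided by Harvester or Longhorn).
# Succeeds only if no line on stdin differs from the exact word "healthy".
# Assumption: the yq command from the documentation prints one unquoted
# robustness value per line.
# Note: empty input (no attached volumes) also counts as success.
all_healthy() {
  # grep -v -x selects every line that is not exactly "healthy";
  # the leading "!" inverts the result so that "no such line" means success.
  ! grep -qvx 'healthy'
}
```

One way to use it is to append `| all_healthy && echo "all attached volumes are healthy"` to the documented `kubectl get volumes ... | yq ...` pipeline, and to proceed with the PDB removal only when that message appears.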
