**docs/upgrade/v1-4-0-to-v1-4-1.md** (+163 lines)

@@ -14,6 +14,28 @@ An **Upgrade** button appears on the **Dashboard** screen whenever a new Harvester
For air-gapped environments, see [Prepare an air-gapped upgrade](./automatic.md#prepare-an-air-gapped-upgrade).

:::info important

Check the disk usage of the operating system images on each node before starting the upgrade. To do this, access the node via SSH and run the command `du -sh /run/initramfs/cos-state/cOS/*`.

Example:

```
# du -sh /run/initramfs/cos-state/cOS/*
1.7G    /run/initramfs/cos-state/cOS/active.img
3.1G    /run/initramfs/cos-state/cOS/passive.img
```

If `passive.img` (which represents the previously installed Harvester v1.4.0 image) consumes 3.1G of disk space, run the following commands using the root account:
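
A minimal sketch of the conversion, assuming the state partition is mounted read-only at `/run/initramfs/cos-state` and that punching holes with `fallocate --dig-holes` is an acceptable way to reclaim the zero-filled blocks:

```
# Remount the normally read-only state partition as read-write.
mount -o remount,rw /run/initramfs/cos-state

# Deallocate zero-filled blocks so passive.img becomes a sparse file.
fallocate --dig-holes /run/initramfs/cos-state/cOS/passive.img
sync

# Return the partition to read-only.
mount -o remount,ro /run/initramfs/cos-state
```

Run `du -sh /run/initramfs/cos-state/cOS/*` again to confirm the new size.
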
`passive.img` is converted to a sparse file, which should only consume 1.7G of disk space (the same as `active.img`). This ensures that each node has enough free space, preventing the upgrade process from becoming [stuck in the "Waiting Reboot" state](#3-upgrade-is-stuck-in-the-waiting-reboot-state).

:::

### Update Harvester UI Extension on Rancher v2.10.1

@@ -144,3 +166,144 @@ You can perform any of the following workarounds:

For more information, see [Issue #7375](https://github.com/harvester/harvester/issues/7375).

### 3. Upgrade is stuck in the "Waiting Reboot" state
The upgrade process may become stuck in the "Waiting Reboot" state after the Harvester v1.4.1 image is installed on a node and a reboot is initiated. At this point, the upgrade controller checks whether the node is running the Harvester v1.4.1 operating system.

If the Harvester v1.4.1 image (hereafter referred to as `active.img`) fails to boot for any reason, the node automatically restarts in fallback mode and boots the previously installed Harvester v1.4.0 image (hereafter referred to as `passive.img`). The upgrade controller is unable to detect the expected operating system, so the upgrade remains stuck until an administrator fixes the problem with `active.img`.

`active.img` can become corrupted and unbootable because of insufficient disk space in the COS_STATE partition during the upgrade. This occurs if Harvester v1.4.0 was originally installed on the node and the system was configured to use a separate data disk. The issue does not occur in the following situations:

- The system has a single disk that is shared by the operating system and data.
- An earlier Harvester version was originally installed and then later upgraded to v1.4.0.

To check if the issue exists in your environment, perform the following steps:
1. Access the node via SSH and log in using the root account.

1. Run the commands `cat /proc/cmdline` and `head -n1 /etc/harvester-release.yaml`.
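
Illustrative output only (the surrounding kernel arguments and the exact format of the release file's first line are assumptions; the two highlighted tokens are what matters):

```
# cat /proc/cmdline
... root=LABEL=COS_STATE cos-img/filename=/cOS/passive.img upgrade_failure ...
# head -n1 /etc/harvester-release.yaml
harvester: v1.4.0
```
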
The presence of `cos-img/filename=/cOS/passive.img` and `upgrade_failure` in the output indicates that the system booted into fallback mode. The Harvester version in `/etc/harvester-release.yaml` confirms that the system is currently using the v1.4.0 image.

1. Check if `active.img` is corrupted by running the command `fsck.ext2 -nf /run/initramfs/cos-state/cOS/active.img`.
Example:

```
# fsck.ext2 -nf /run/initramfs/cos-state/cOS/active.img
[...a list of various different errors may appear here...]
e2fsck: aborted

COS_ACTIVE: ********** WARNING: Filesystem still has errors **********
```
1. Check the partition sizes by running the command `lsblk -o NAME,LABEL,SIZE`.

Example:

```
# lsblk -o NAME,LABEL,SIZE
NAME     LABEL             SIZE
loop0    COS_ACTIVE          3G
sr0                       1024M
vda                        250G
├─vda1   COS_GRUB           64M
├─vda2   COS_OEM            64M
├─vda3   COS_RECOVERY        4G
├─vda4   COS_STATE           8G
└─vda5   COS_PERSISTENT  237.9G
vdb      HARV_LH_DEFAULT   128G
```

The output in the example shows a COS_STATE partition that is 8G in size. In this specific case, which involves an unsuccessful upgrade attempt and a corrupted `active.img`, the partition likely did not have enough free space for the upgrade to succeed.
To fix the issue, perform the following steps:

1. If your cluster has two or more nodes, access the remaining nodes via SSH and check the disk usage of `active.img` and `passive.img`.
```
# du -sh /run/initramfs/cos-state/cOS/*
1.7G    /run/initramfs/cos-state/cOS/active.img
3.1G    /run/initramfs/cos-state/cOS/passive.img
```

If `passive.img` consumes 3.1G of disk space, run the following commands using the root account:
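
A minimal sketch, under the same assumptions as the conversion shown in the note at the top of this page:

```
mount -o remount,rw /run/initramfs/cos-state
# Deallocate zero-filled blocks so passive.img becomes a sparse file.
fallocate --dig-holes /run/initramfs/cos-state/cOS/passive.img
sync
mount -o remount,ro /run/initramfs/cos-state
```
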
`passive.img` is converted to a sparse file, which should only consume 1.7G of disk space (the same as `active.img`). This ensures that the other nodes have enough free space, preventing the upgrade process from becoming stuck again.

1. Access the stuck node via SSH, and then run the following commands using the root account:
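
A minimal sketch, assuming the image paths shown in the earlier output and that the booted image is identified by its `COS_ACTIVE` filesystem label:

```
# Remount the state partition read-write.
mount -o remount,rw /run/initramfs/cos-state

# Overwrite the corrupted active image with the known-good passive image.
cp /run/initramfs/cos-state/cOS/passive.img /run/initramfs/cos-state/cOS/active.img

# Relabel the copied filesystem so it is recognized as the active image.
tune2fs -L COS_ACTIVE /run/initramfs/cos-state/cOS/active.img

sync
mount -o remount,ro /run/initramfs/cos-state
```
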
The existing (clean) `passive.img` is copied over the corrupted `active.img` and the label is set correctly.

1. Reboot the stuck node, and then select the first entry ("Harvester v1.4.1") on the GRUB boot screen.
The GRUB boot screen highlights "Harvester v1.4.1 (fallback)" by default. Despite the displayed version, that entry boots the system into Harvester v1.4.0.

1. Copy `rootfs.squashfs` from the Harvester v1.4.1 ISO to a convenient location on the stuck node.

The ISO can be mounted either on the stuck node or on another system. You can copy the file using the `scp` command.
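
A minimal sketch, assuming the ISO is mounted on another Linux host, that `rootfs.squashfs` sits at the ISO root, and that the node accepts SSH logins as the default `rancher` user (the ISO file name and node IP are placeholders):

```
# On the host where the ISO was downloaded:
mount -o loop harvester-v1.4.1-amd64.iso /mnt
scp /mnt/rootfs.squashfs rancher@<node-ip>:/tmp/
```
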
1. Access the stuck node via SSH, and then run the following commands using the root account:

```
# mkdir /tmp/manual-os-upgrade
# mkdir /tmp/manual-os-upgrade/config
# mkdir /tmp/manual-os-upgrade/rootfs
# mount -o loop rootfs.squashfs /tmp/manual-os-upgrade/rootfs
[...additional commands not shown...]
```

:::note

You must replace the sample path in the fourth line with the actual path of the copied `rootfs.squashfs`.

:::
A new (clean) `active.img` is generated based on the root image from the Harvester v1.4.1 ISO.

If any errors occur, save a copy of `/tmp/manual-os-upgrade/upgrade.log`.

1. Run the following commands:
```
# umount /tmp/manual-os-upgrade/rootfs
# reboot
```

The node should boot successfully into Harvester v1.4.1, and the upgrade should proceed as expected.
Related issues:

- [[BUG] Stuck upgrade from 1.4.0 to 1.4.1](https://github.com/harvester/harvester/issues/7457)
- [[BUG] discrepancy in default OS partition sizes when using separate data disk](https://github.com/harvester/harvester/issues/7493)
- [[BUG] after initial installation, passive.img uses 3.1G of disk space, vs. active.img which only uses 1.7G](https://github.com/harvester/harvester/issues/7518)
**versioned_docs/version-v1.4/upgrade/v1-4-0-to-v1-4-1.md** (+163 lines)

The versioned copy of this page receives the same changes as `docs/upgrade/v1-4-0-to-v1-4-1.md` above.