Commit ea8b885

up
1 parent 401de9e commit ea8b885

4 files changed

Lines changed: 185 additions & 1 deletion

README.md

Lines changed: 31 additions & 1 deletion
@@ -20,6 +20,7 @@ A GitOps-driven Kubernetes cluster using **Talos OS** (secure, immutable Linux f
 - [MinIO S3 Backup Configuration](#-minio-s3-backup-configuration)
 - [Documentation](#-documentation)
 - [Troubleshooting](#-troubleshooting)
+- [Upgrade](#upgrade)
 
 ## 📋 Prerequisites
 
@@ -438,4 +439,33 @@ The patterns and structure remain the same - this is **production-grade GitOps**
 
 ## 📜 License
 
-MIT License - See [LICENSE](LICENSE) for details
+MIT License - See [LICENSE](LICENSE) for details
+
+## Upgrade
+
+This repo includes a guided, repeatable process to upgrade Longhorn to v1.10.x safely.
+
+- Read the runbook: `docs/runbooks/longhorn-1.10-upgrade.md`
+- Key steps:
+  - Normalize CRD conversion spec (older installs may leave webhook fields)
+  - Migrate all Longhorn CRDs to stored version `v1beta2` (mandatory for v1.10)
+  - Sync the Longhorn Helm release via ArgoCD and validate
+
+Quick commands from repo root:
+
+```bash
+# 1) Fix legacy CRD conversion blocks atomically
+./scripts/longhorn-fix-crd-conversion.sh
+
+# 2) Migrate CRD storedVersions to v1beta2 (safe to re-run)
+./scripts/longhorn-v110-crd-migration.sh
+
+# 3) Verify only v1beta2 is present
+kubectl get crd -l app.kubernetes.io/name=longhorn -o=jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.storedVersions}{"\n"}{end}'
+
+# 4) Re-sync Longhorn in ArgoCD and verify pods in longhorn-system
+```
+
+Notes:
+- The chart is pinned in `infrastructure/storage/longhorn/kustomization.yaml` and values in `infrastructure/storage/longhorn/values.yaml`.
+- We avoid per-engine JSON booleans in values to sidestep a known 1.10.0 parsing issue; revisit when broadly enabling the V2 data engine.
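Step 4 in the quick commands above carries no command. The following is a minimal sketch of one way to check pod readiness after the sync, assuming the default column order of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...). The helper name `not_ready_pods` is hypothetical and not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): reads `kubectl get pods --no-headers`
# output on stdin and prints the names of pods whose READY column is not n/n.
# Pods in the Completed state (finished jobs) are skipped.
not_ready_pods() {
  awk '{
    split($2, ready, "/")
    if ($3 != "Completed" && ready[1] != ready[2]) print $1
  }'
}

# Usage against a live cluster (assumes kubectl access):
#   kubectl -n longhorn-system get pods --no-headers | not_ready_pods
```

An empty result means every listed pod reports all of its containers ready; any printed name is a pod to inspect before calling the upgrade done.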
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+# Longhorn 1.10 Upgrade Runbook
+
+This runbook captures the exact prechecks and commands required to safely upgrade Longhorn to v1.10.x in this cluster.
+
+Important highlights from the v1.10.0 release notes:
+- Kubernetes must be >= 1.25
+- The Longhorn v1beta1 API was removed. If your cluster ever stored Longhorn CRs in v1beta1 (likely if you originally installed < v1.3.0), you MUST migrate CR storage to v1beta2 before upgrading.
+- Some defaultSettings support per-engine JSON. A known 1.10.0 bug can affect boolean DataEngineSpecific values when sourced via Helm values. We keep these as simple scalars unless we actively use V2.
+
+## Pre-checks
+- Ensure Kubernetes >= 1.25 across the cluster.
+- Confirm ArgoCD will install CRDs during the Helm upgrade (the kustomization sets `includeCRDs: true`).
+- Optionally snapshot/backup critical workloads.
+
+## Mandatory: CRD storage version migration (before upgrading)
+Before the migration, fix legacy CRD conversion blocks that can break CRD applies during upgrade.
+
+1) Fix CRD conversion blocks (older installs sometimes leave webhookClientConfig while strategy isn't Webhook):
+
+```
+./scripts/longhorn-fix-crd-conversion.sh
+```
+
+2) Run the helper script to migrate any Longhorn CRDs that still have v1beta1 storedVersions to v1beta2.
+
+Steps (requires kubectl + jq):
+1) Pause Longhorn syncs in ArgoCD (optional but recommended during migration window).
+2) Run the script:
+
+```
+./scripts/longhorn-v110-crd-migration.sh
+```
+
+3) Verify all Longhorn CRDs show only v1beta2 in storedVersions:
+
+```
+kubectl get crd -l app.kubernetes.io/name=longhorn -o=jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.storedVersions}{"\n"}{end}'
+```
+Expected: every line shows ["v1beta2"]. If any show v1beta1, re-run the script or investigate.
+
+## Upgrade via ArgoCD
+- Chart version is pinned to 1.10.0 in `infrastructure/storage/longhorn/kustomization.yaml`.
+- Values are managed in `infrastructure/storage/longhorn/values.yaml`.
+- Pre-upgrade checker job is disabled to avoid GitOps drift (`preUpgradeChecker.jobEnabled: false`).
+
+Sync the Longhorn app in ArgoCD. Wait for all pods in `longhorn-system` to become Ready.
+
+## Post-upgrade checks
+- Pods healthy:
+  - longhorn-manager, longhorn-ui, longhorn-csi-plugin, csi-* sidecars, instance-manager, engine-image, share-manager
+- Longhorn UI reachable (via existing Gateway/HTTPRoute)
+- Create a test PVC, attach to a test pod, write small data, and verify persistence.
+- Recurring jobs present (from `recurring-jobs.yaml`).
+- Backup target is detected (S3/MinIO) and can list/create a small backup.
+
+## Rollback guidance (only if upgrade fails early)
+If you skipped the migration and upgraded, managers may fail with CRD storedVersions errors. Follow the v1.10 release notes to temporarily patch the webhook and downgrade to the exact previous 1.9.x, then perform the migration script above and retry the upgrade.
+
+Reference:
+- Release notes: https://github.com/longhorn/longhorn/releases/tag/v1.10.0
+- Install with Helm Controller (context for K3s/RKE2): https://longhorn.io/docs/1.10.0/deploy/install/install-with-helm-controller/
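The runbook's verification step ("every line shows ["v1beta2"]") relies on reading the output by eye. As a rough sketch of how that check could be automated, the function below reads the same `name: [versions]` lines on stdin and fails if any line lists something other than exactly `["v1beta2"]`. The name `check_stored_versions` is hypothetical and is not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): reads "name: [versions]" lines on
# stdin, as printed by the runbook's jsonpath query, and returns non-zero if
# any CRD lists anything other than exactly ["v1beta2"].
check_stored_versions() {
  local bad=0 line
  while IFS= read -r line; do
    [ -z "${line}" ] && continue
    case "${line}" in
      *': ["v1beta2"]') ;;                               # fully migrated
      *) echo "NOT MIGRATED: ${line}" >&2; bad=1 ;;      # v1beta1 still stored
    esac
  done
  return "${bad}"
}

# Usage against a live cluster (assumes kubectl access):
#   kubectl get crd -l app.kubernetes.io/name=longhorn \
#     -o=jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.storedVersions}{"\n"}{end}' \
#     | check_stored_versions
```

A non-zero exit status makes the check usable as a gate before the ArgoCD sync, for example in a pre-upgrade script.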
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Fix legacy Longhorn CRD conversion fields left from older installs
+# Error addressed:
+# spec.conversion.strategy: Required value
+# spec.conversion.webhookClientConfig: Forbidden when strategy != Webhook
+# Strategy: remove deprecated spec.conversion.webhookClientConfig and ensure strategy: None
+
+require() {
+  command -v "$1" >/dev/null 2>&1 || { echo "FATAL: missing dependency: $1"; exit 1; }
+}
+
+require kubectl
+
+# Target only Longhorn CRDs (compatible with macOS bash 3.2)
+CRDS="$(kubectl get crd -l app.kubernetes.io/name=longhorn -o name || true)"
+
+if [ -z "${CRDS}" ]; then
+  echo "No Longhorn CRDs found. Nothing to do."
+  exit 0
+fi
+
+echo "Fixing CRD conversion blocks for: ${CRDS}"
+for crd in ${CRDS}; do
+  name=${crd#*/}
+  echo "- ${name}: setting conversion.strategy=None and removing legacy fields in one patch"
+  kubectl patch "${crd}" --type=merge -p='{
+    "spec": {
+      "conversion": {
+        "strategy": "None",
+        "webhookClientConfig": null,
+        "conversionReviewVersions": null,
+        "webhook": null
+      }
+    }
+  }'
+
+done
+
+echo "Done. Verify with: kubectl get crd -l app.kubernetes.io/name=longhorn -o jsonpath='{range .items[*]}{.metadata.name}{\": \"}{.spec.conversion.strategy}{\"\\n\"}{end}'"
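The script above patches every Longhorn CRD as soon as it runs. One possible way to preview those calls first is a dry-run wrapper, sketched below. The `DRY_RUN` variable and the `run` function are assumptions made for illustration and do not exist in the committed script.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run switch (not in the committed script).
# With DRY_RUN=1 the command is printed, shell-quoted, instead of executed.
DRY_RUN="${DRY_RUN:-0}"

run() {
  if [ "${DRY_RUN}" = "1" ]; then
    printf 'DRY-RUN:'
    printf ' %q' "$@"   # %q quotes each argument so the line can be pasted back
    printf '\n'
  else
    "$@"
  fi
}

# Usage inside the loop (sketch):
#   run kubectl patch "${crd}" --type=merge -p='{"spec":{"conversion":{"strategy":"None"}}}'
```

Running the script once with `DRY_RUN=1` would then list the exact patches without touching the cluster.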
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Longhorn v1.10 pre-upgrade CRD storage migration helper
+# This script follows the release notes instructions to ensure all storedVersions are v1beta2.
+
+NS=${NS:-longhorn-system}
+WEBHOOK=longhorn-webhook-validator
+
+require() {
+  command -v "$1" >/dev/null 2>&1 || { echo "FATAL: missing dependency: $1"; exit 1; }
+}
+
+require kubectl
+require jq
+
+echo "Temporarily disabling Longhorn settings UPDATE validation in webhook..."
+kubectl patch validatingwebhookconfiguration ${WEBHOOK} \
+  --type=merge \
+  -p "$(kubectl get validatingwebhookconfiguration ${WEBHOOK} -o json | \
+    jq '.webhooks[0].rules |= map(if .apiGroups == ["longhorn.io"] and .resources == ["settings"] then .operations |= map(select(. != "UPDATE")) else . end)')"
+
+migration_time="$(date +%Y-%m-%dT%H:%M:%S)"
+echo "Finding Longhorn CRDs with stored v1beta1 resources..."
+crds="$(kubectl get crd -l app.kubernetes.io/name=longhorn -o json | jq -r '.items[] | select(.status.storedVersions | index("v1beta1")) | .metadata.name')"
+
+if [ -z "${crds}" ]; then
+  echo "No CRDs report v1beta1 in storedVersions. Skipping migration."
+else
+  echo "CRDs to migrate: ${crds}"
+  for crd in ${crds}; do
+    echo "Migrating ${crd} ..."
+    names="$(kubectl -n "${NS}" get "${crd}" -o jsonpath='{.items[*].metadata.name}' || true)"
+    for name in ${names}; do
+      echo " Patching ${crd}/${name} with migration-time annotation"
+      kubectl patch "${crd}" "${name}" -n "${NS}" --type=merge -p='{"metadata":{"annotations":{"migration-time":"'"${migration_time}"'"}}}' || true
+    done
+    echo " Cleaning up storedVersions to [\"v1beta2\"] for CRD ${crd}"
+    kubectl patch crd "${crd}" --type=merge -p '{"status":{"storedVersions":["v1beta2"]}}' --subresource=status
+  done
+fi
+
+echo "Re-enabling Longhorn settings UPDATE validation in webhook..."
+kubectl patch validatingwebhookconfiguration ${WEBHOOK} \
+  --type=merge \
+  -p "$(kubectl get validatingwebhookconfiguration ${WEBHOOK} -o json | \
+    jq '.webhooks[0].rules |= map(if .apiGroups == ["longhorn.io"] and .resources == ["settings"] then .operations |= (. + ["UPDATE"] | unique) else . end)')"
+
+echo "Verifying storedVersions for Longhorn CRDs..."
+kubectl get crd -l app.kubernetes.io/name=longhorn -o=jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.storedVersions}{"\n"}{end}'
+
+echo "Done. Ensure all entries list only [\"v1beta2\"]. If not, investigate and retry."
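Because the script runs under `set -euo pipefail`, a failure between the "disabling" and "re-enabling" steps would leave settings UPDATE validation switched off. A minimal sketch of one way to guard against that is an `EXIT` trap inside a subshell. The functions `disable_validation` and `enable_validation` are hypothetical placeholders for the two `kubectl patch validatingwebhookconfiguration` calls, and `with_validation_disabled` is likewise not part of this commit.

```bash
#!/usr/bin/env bash
# Placeholders (assumptions): in the real script these would wrap the two
# `kubectl patch validatingwebhookconfiguration` calls.
disable_validation() { echo "validation disabled"; }
enable_validation()  { echo "validation re-enabled"; }

# Runs the given command with validation disabled and always re-enables it,
# even when the command fails. The subshell scopes the EXIT trap to this call.
with_validation_disabled() {
  (
    trap 'enable_validation' EXIT
    disable_validation || exit 1
    "$@"
  )
}

# Usage (sketch): wrap the migration loop in a function and call
#   with_validation_disabled migrate_crds
```

The subshell's exit status is that of the wrapped command, so a caller can still detect a failed migration after validation has been restored.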
