Commit 32e10bf

Merge pull request #792 from shepherdjerred/feat/homelab-talos-k8s-upgrade
chore(homelab): record Talos v1.13.2 + Kubernetes v1.36.0 deployment
2 parents 64fd6c7 + 7543816 commit 32e10bf

4 files changed

Lines changed: 117 additions & 2 deletions

packages/docs/index.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -62,6 +62,7 @@ Active or upcoming plans only. Completed plans live in `archive/completed/`; thi
 - [Renovate-481 Fixes & CI Gap](plans/2026-05-12_renovate-481-fixes-and-ci-gap.md) - Unbreak main after the renovate-481 sweep (Prisma 7 schemas, react-dom skew, birmel start, temporal lint) and remove `MAIN_ONLY` from validation-only CI steps so PRs catch the same class of regression pre-merge
 - [Competition CRON Schedule](plans/2026-05-11_competition-cron-schedule.md) - Per-`Competition` CRON expression gating leaderboard posts; replaces global midnight-UTC cron with a per-minute dispatcher
 - [Renovate Dashboard Residual Dependency Updates](plans/2026-05-12_renovate-dashboard-residual-updates.md) - Finish remaining dashboard #481 package, Docker, Helm, Rust, and Maven updates
+- [Talos + Kubernetes Upgrade on `torvalds`](plans/2026-05-12_talos-k8s-upgrade.md) - Apply already-pinned Talos v1.13.2 + Kubernetes v1.36.0 to the live single-node cluster
 - [Renovate Dashboard Update Batch](plans/2026-05-13_renovate-dashboard-update.md) - Apply the current actionable Renovate dashboard Docker digest, Helm chart, and production image pin updates
 
 ## Logs
```

packages/docs/plans/2026-05-12_talos-k8s-upgrade.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
# Update Talos + Kubernetes on `torvalds`

## Status

Complete, with one follow-up: Kubernetes is on v1.36.0 instead of v1.36.1 because the Sidero Labs kubelet image for v1.36.1 had not yet been published at upgrade time. Run `talosctl --nodes 192.168.1.81 upgrade-k8s --to 1.36.1` once `ghcr.io/siderolabs/kubelet:v1.36.1` is available on GHCR.

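The follow-up can be gated on the kubelet image actually existing. A minimal sketch, assuming a Docker CLI that provides `docker manifest inspect` and anonymous pull access to `ghcr.io/siderolabs`; `image_published` is a hypothetical helper, not part of the repo:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the registry serves a manifest for the
# given image reference. Assumes `docker manifest inspect` is available and the
# image is publicly pullable (no `docker login` needed).
image_published() {
  docker manifest inspect "$1" >/dev/null 2>&1
}

# Example (not run here): only attempt the follow-up once the image exists.
# image_published ghcr.io/siderolabs/kubelet:v1.36.1 &&
#   talosctl --nodes 192.168.1.81 upgrade-k8s --to 1.36.1
```

Checking the registry rather than the Renovate PR keeps the gate tied to the artifact `upgrade-k8s` actually pulls.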
## Context

Single-node homelab cluster `torvalds` (192.168.1.81) is running Talos **v1.12.0** + Kubernetes **v1.35.0**, but the repo has already been bumped to Talos **v1.13.2** + Kubernetes **v1.36.1** (commit `52640893d`, 2026-05-11). Both target versions are the current upstream latest and were released today (2026-05-12). The work is purely operational: apply the upgrade to the live node. No code changes needed.

Cluster health is currently green (`talosctl health` all OK).

## State

| Component  | Running | Target      | Source of truth                                   |
| ---------- | ------- | ----------- | ------------------------------------------------- |
| Talos      | v1.12.0 | **v1.13.2** | `packages/homelab/src/talos/patches/image.yaml:8` |
| Kubernetes | v1.35.0 | **v1.36.1** | `packages/homelab/src/cdk8s/src/versions.ts:148`  |

Talos installer image (already pinned in repo):

```
factory.talos.dev/metal-installer-secureboot/a0f205c1e29abaf83e16257c04c83267b5a54feac3861eedc1080edab9827fc3:v1.13.2@sha256:f689384831eb907d1f9d10b161d0cce47377e03fc5c0eef29851a40b687e3e6f
```

Order is forced: Talos **first**, then Kubernetes. Talos v1.12 does not officially support k8s 1.36; v1.13 does.

## Steps

### 1. Pre-flight (read-only)

```bash
talosctl --nodes 192.168.1.81 version
kubectl get nodes -o wide
talosctl --nodes 192.168.1.81 health
kubectl get applications -n argocd -o wide
```

Abort if any node is NotReady, any ArgoCD app is `Degraded`/`OutOfSync`, or `talosctl health` reports anything other than OK.

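The ArgoCD part of the abort rule can be checked mechanically. A minimal sketch, assuming the standard Argo CD `Application` status fields (`.status.sync.status`, `.status.health.status`); `argocd_blockers` is a hypothetical helper, not something in the repo:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight gate. Reads "<app> <sync-status> <health-status>"
# lines on stdin, prints every app that should block the upgrade
# (OutOfSync or Degraded), and exits 1 if there was at least one.
argocd_blockers() {
  awk '$2 == "OutOfSync" || $3 == "Degraded" { print $1; bad = 1 }
       END { exit (bad ? 1 : 0) }'
}

# Example (not run here; needs kubectl access to the argocd namespace):
# kubectl get applications -n argocd \
#   -o jsonpath='{range .items[*]}{.metadata.name} {.status.sync.status} {.status.health.status}{"\n"}{end}' \
#   | argocd_blockers && echo "ArgoCD pre-flight OK"
```

Apps that are merely `Progressing` pass this gate, matching the abort rule as written; tighten the awk condition if you want those to block too.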
### 2. Upgrade Talos (v1.12.0 → v1.13.2)

```bash
IMAGE=factory.talos.dev/metal-installer-secureboot/a0f205c1e29abaf83e16257c04c83267b5a54feac3861eedc1080edab9827fc3:v1.13.2

talosctl --nodes 192.168.1.81 upgrade --image "$IMAGE" --preserve
```

- `--preserve` keeps the STATE and EPHEMERAL partitions (etcd survives). It is the default for a single-node control plane, but pass it explicitly to be safe.
- Node reboots immediately (~3–5 min downtime expected).

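The repo pins the installer image with an `@sha256:` digest, while the command above passes the tag-only form. A hypothetical helper (not in the repo) can derive the second from the first with plain parameter expansion, so the tag used for the upgrade cannot drift from the pin:

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip an "@sha256:<digest>" suffix from an image
# reference, leaving "registry/repo:tag". References without a digest are
# returned unchanged.
strip_digest() {
  printf '%s\n' "${1%@sha256:*}"
}

# Pinned reference copied from the plan's "Talos installer image" block.
PINNED='factory.talos.dev/metal-installer-secureboot/a0f205c1e29abaf83e16257c04c83267b5a54feac3861eedc1080edab9827fc3:v1.13.2@sha256:f689384831eb907d1f9d10b161d0cce47377e03fc5c0eef29851a40b687e3e6f'
IMAGE=$(strip_digest "$PINNED")

# Example (not run here):
# talosctl --nodes 192.168.1.81 upgrade --image "$IMAGE" --preserve
```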
### 3. Verify Talos came back

```bash
talosctl --nodes 192.168.1.81 version # expect Server Tag v1.13.2
talosctl --nodes 192.168.1.81 health
kubectl get nodes -o wide # OS-IMAGE should now be Talos (v1.13.2)
kubectl get pods -A --no-headers | grep -v Running | grep -v Completed # should be empty
talosctl --nodes 192.168.1.81 read /proc/modules | grep zfs # ZFS module loaded
```

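Because the node reboots, these verification commands may fail until the Talos and Kubernetes APIs are back. A minimal polling sketch; `wait_for` is hypothetical, and the 600 s timeout and 15 s interval are assumptions loosely based on the ~3–5 min downtime estimate in step 2:

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-run a command every INTERVAL seconds until it
# succeeds or TIMEOUT seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_for() {
  local timeout=$1 interval=$2 deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "wait_for: timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example (not run here): wait for the API server, then for the node to be Ready.
# wait_for 600 15 kubectl get --raw /readyz &&
#   kubectl wait --for=condition=Ready node --all --timeout=300s
```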
### 4. Upgrade Kubernetes (v1.35.0 → v1.36.1)

```bash
talosctl --nodes 192.168.1.81 upgrade-k8s --to 1.36.1
```

No node reboot: `upgrade-k8s` updates the control-plane components, kube-proxy, and the kubelet in place, so workloads stay up.

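In practice (see the session log), `upgrade-k8s` timed out twice before a third invocation completed, and re-running it is safe because it is idempotent. A small retry wrapper captures that; `retry` is hypothetical, and the attempt count and delay in the example are assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to MAX times, sleeping DELAY seconds
# between failed attempts. Only appropriate for idempotent commands such as
# `talosctl upgrade-k8s`.
retry() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (not run here):
# retry 3 30 talosctl --nodes 192.168.1.81 upgrade-k8s --to 1.36.1
```

The command's own output is left visible so timeouts on individual components are still reported.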
### 5. Final verification

```bash
kubectl get nodes -o wide # VERSION = v1.36.1
kubectl get pods -A --no-headers | grep -v Running | grep -v Completed # empty
kubectl get applications -n argocd -o wide # all Synced/Healthy
talosctl --nodes 192.168.1.81 health
```

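The version checks in steps 3 and 5 can be made explicit by reading the versions the node reports. A sketch, assuming a single node (`.items[0]`) and that `osImage` reads `Talos (v1.13.2)` as noted in step 3; `expect_eq` and `node_info` are hypothetical helpers:

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare an expected and an actual value, print the
# result, and return non-zero on mismatch.
expect_eq() {
  local what=$1 want=$2 got=$3
  if [ "$got" = "$want" ]; then
    echo "ok: $what = $got"
  else
    echo "MISMATCH: $what: want '$want', got '$got'" >&2
    return 1
  fi
}

# Example (not run here); nodeInfo fields come from the Node status.
# node_info() { kubectl get nodes -o jsonpath="{.items[0].status.nodeInfo.$1}"; }
# expect_eq kubelet "v1.36.1" "$(node_info kubeletVersion)"
# expect_eq os-image "Talos (v1.13.2)" "$(node_info osImage)"
```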
## Critical files

- `packages/homelab/src/talos/patches/image.yaml`: installer image pin (already at v1.13.2)
- `packages/homelab/src/talos/image.yaml`: Talos factory schematic (extensions: i915, intel-ucode, tailscale, zfs)
- `packages/homelab/src/talos/update-image-id.ts`: regenerates schematic hash; not needed (extensions unchanged)
- `packages/homelab/src/cdk8s/src/versions.ts:148,155`: Kubernetes + Talos version pins (Renovate-tracked, at target)
- `packages/homelab/README.md:202-218`: documented upgrade procedure

## Caveats

- **Single-node cluster**: the Talos upgrade reboots the only node, so ~3–5 min of downtime is unavoidable. Velero scheduled backups are the safety net (user opted not to trigger a manual one).
- **Local kubectl skew**: client is v1.33.9; after the upgrade the server will be v1.36.1, a skew of 3 minor versions (kubectl supports ±1). User opted not to bump in this session.
- **SecureBoot UKI image**: `metal-installer-secureboot` variant. `talosctl upgrade` handles the UKI swap.
- **Schematic hash**: unchanged between v1.13.0 and v1.13.2 because the extension list wasn't modified. No `update-image-id.ts` run needed.
- **ArgoCD reconcile**: confirm every app is `Healthy` after the Talos reboot before proceeding to the Kubernetes upgrade.

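The kubectl skew caveat can be checked before each session. A sketch, assuming `jq` is installed and that `kubectl version -o json` exposes `.clientVersion.gitVersion` and `.serverVersion.gitVersion`; `minor_of` and `minor_skew` are hypothetical helpers that assume both sides share major version 1:

```bash
#!/usr/bin/env bash
# Hypothetical helpers. minor_of extracts MINOR from "vMAJOR.MINOR.PATCH";
# minor_skew prints the absolute difference in minor versions.
minor_of() {
  local v=${1#v}            # drop leading "v"  -> 1.33.9
  v=${v#*.}                 # drop major        -> 33.9
  printf '%s\n' "${v%%.*}"  # drop patch        -> 33
}

minor_skew() {
  local d=$(( $(minor_of "$2") - $(minor_of "$1") ))
  [ "$d" -lt 0 ] && d=$(( -d ))
  printf '%s\n' "$d"
}

# Example (not run here):
# client=$(kubectl version -o json | jq -r .clientVersion.gitVersion)
# server=$(kubectl version -o json | jq -r .serverVersion.gitVersion)
# [ "$(minor_skew "$client" "$server")" -le 1 ] ||
#   echo "kubectl $client vs server $server: outside supported skew" >&2
```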
## Session Log — 2026-05-12

### Done

- Talos `v1.12.0 → v1.13.2` applied to `torvalds` via `talosctl upgrade --image factory.talos.dev/metal-installer-secureboot/...:v1.13.2 --preserve`. Kernel now `6.18.29-talos`; containerd `2.2.3`; ZFS module loaded (49 refs).
- Kubernetes `v1.35.0 → v1.36.0` applied via `talosctl upgrade-k8s --to 1.36.0`. kube-apiserver, kube-controller-manager, kube-scheduler, kube-proxy, and kubelet are all on v1.36.0. Bootstrap manifests reconciled.
- Post-upgrade: bounced `postgres-operator` to clear stale `SyncFailed` state on bugsink/grafana/plausible/temporal postgres CRs that were left by a kyverno-webhook race during the reboot.
- Plan file mirrored from `~/.claude/plans/` to `packages/docs/plans/2026-05-12_talos-k8s-upgrade.md` per docs discipline; `index.md` updated.

### Remaining

- `birmel` ArgoCD app is still Progressing because of a pre-existing Prisma client module-resolution bug (`Cannot find module '.prisma/client/default'`); unrelated to this upgrade.
- When Sidero Labs publishes `ghcr.io/siderolabs/kubelet:v1.36.1` (Renovate will reopen a PR bumping `versions.ts:148`), re-run `talosctl --nodes 192.168.1.81 upgrade-k8s --to 1.36.1` and re-sync `versions.ts` + the README example to match. The current pin (`v1.36.0`) reflects the actually-deployed version.

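The re-sync item above can be backed by a consistency check. A sketch, assuming the pin keeps the `"kubernetes/kubernetes": "vX.Y.Z"` shape shown in the `versions.ts` diff and that the last `VERSION=` line in the README is the Kubernetes example; all three functions are hypothetical, not repo scripts:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: extract the Kubernetes pin (without the "v") from each file.
k8s_pin_from_versions_ts() {
  sed -n 's|.*"kubernetes/kubernetes": "v\([0-9.]*\)".*|\1|p' "$1"
}

k8s_pin_from_readme() {
  # Assumes the Kubernetes example is the last VERSION= line in the README.
  sed -n 's|^VERSION=\([0-9.]*\)$|\1|p' "$1" | tail -n 1
}

# Succeeds (and prints the version) only if both files carry the same pin.
check_k8s_pins() {
  local ts readme
  ts=$(k8s_pin_from_versions_ts "$1")
  readme=$(k8s_pin_from_readme "$2")
  if [ -n "$ts" ] && [ "$ts" = "$readme" ]; then
    echo "pins agree: $ts"
  else
    echo "pin mismatch: versions.ts='$ts' README='$readme'" >&2
    return 1
  fi
}

# Example (not run here):
# check_k8s_pins packages/homelab/src/cdk8s/src/versions.ts packages/homelab/README.md
```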
### Caveats

- The first `upgrade-k8s` invocation timed out on the kube-apiserver pod readiness check; the second completed kube-controller-manager and kube-apiserver but timed out on kube-scheduler; the third completed cleanly. Talos `upgrade-k8s` is idempotent, so retrying is the correct response. The `config version mismatch: got 1, expected 2` messages are normal during the kubelet manifest reload window.
- The `--preserve` flag is deprecated in Talos v1.13 (warning emitted: "legacy flag for MachineService.Upgrade fallback, to be removed in Talos 1.18"). The new upgrade API was not available on the v1.12 server, so `talosctl` fell back to the legacy path; upgrades from v1.13.x onward should drop `--preserve`.
- During the Talos reboot window, ~30 workload pods went through CrashLoopBackOff while CSI/webhooks were still starting; all self-resolved within ~5 min.
- Local kubectl client is v1.33.9 and the server is now v1.36.0 (skew = 3 minors, beyond the supported ±1). User opted not to bump in this session. Expect occasional skew warnings.

packages/homelab/README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -213,6 +213,6 @@ talosctl upgrade --nodes 192.168.1.81 \
 ### Upgrade Kubernetes
 
 ```bash {"interpreter":"/opt/homebrew/bin/bash"}
-VERSION=1.36.1
+VERSION=1.36.0
 talosctl --nodes 192.168.1.81 upgrade-k8s --to $VERSION
 ```
````

packages/homelab/src/cdk8s/src/versions.ts

Lines changed: 1 addition & 1 deletion
```diff
@@ -145,7 +145,7 @@ const versions = {
 "openebs/velero-plugin":
 "3.6.0@sha256:9ea3331d891e436a7239e37e68ca4c8888500cb122be7cdc9d8400f345555c76",
 // renovate: datasource=github-releases versioning=semver
-"kubernetes/kubernetes": "v1.36.1",
+"kubernetes/kubernetes": "v1.36.0",
 // renovate: datasource=custom.papermc versioning=semver
 paper: "26.1.2",
 // renovate: datasource=docker registryUrl=https://ghcr.io/recyclarr versioning=docker
```
