Make Kyverno a standalone ArgoCD Application (sync-wave 3) so its webhooks register before any app PVCs are created; add infrastructure/controllers/argocd/apps/kyverno-app.yaml, remove Kyverno from the Infrastructure AppSet, and register kyverno-app.yaml in the kustomization manifest. Add preflight checks to scripts/bootstrap-argocd.sh to verify the cilium CLI and expected Cilium version (1.19.0), warn/prompt on mismatches, and document repair steps for Hubble Relay cert issues. Update documentation: README.md clarifies that the cilium install CLI version must match the Helm chart and includes Hubble cert cleanup steps; CLAUDE.md updates sync-wave ordering and rationale (PVC Plumber → Kyverno → Infrastructure AppSet). Also expand .claude local settings to allow additional kubectl bash commands used during bootstrap.
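The updated `scripts/bootstrap-argocd.sh` is not shown on this page. As a rough illustration only, a preflight check of the kind the commit message describes might look like the sketch below. It assumes that `cilium version --client` prints a line of the form `cilium image (default): v1.19.0`; the function names and the `EXPECTED_CILIUM_VERSION` variable are hypothetical and not taken from the repository. Unlike the script described above, this sketch only warns and does not prompt.

```shell
#!/bin/sh
# Hypothetical preflight sketch; not the repository's actual script.

EXPECTED_CILIUM_VERSION="${EXPECTED_CILIUM_VERSION:-1.19.0}"  # version named in the commit

# Read `cilium version` output on stdin and print the CLI's default image
# version without the leading "v" (empty output if the line is missing).
# Assumption: the output contains "cilium image (default): vX.Y.Z".
default_image_version() {
  sed -n 's/^cilium image (default): v\{0,1\}\([0-9][0-9.]*\).*$/\1/p' | head -n 1
}

# Return 0 if expected ($1) equals actual ($2); otherwise warn on stderr and return 1.
versions_match() {
  if [ "$1" = "$2" ]; then
    return 0
  fi
  echo "WARNING: cilium CLI default is '${2:-unknown}', expected '$1'" >&2
  return 1
}

# Fail early if the CLI is missing or its default version differs from the pin.
preflight_cilium() {
  if ! command -v cilium >/dev/null 2>&1; then
    echo "ERROR: cilium CLI not found in PATH" >&2
    return 2
  fi
  pf_actual="$(cilium version --client 2>/dev/null | default_image_version)"
  versions_match "$EXPECTED_CILIUM_VERSION" "$pf_actual"
}
```

Calling `preflight_cilium` before `cilium install` would surface a mismatch early. Pinning with `cilium install --version 1.19.0`, as the README change does, avoids relying on the CLI default at all.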
 - Longhorn won't deploy until Cilium + External Secrets are healthy
-- PVC Plumber (Wave 2) must run before Infrastructure AppSet (Wave 4) because Kyverno policies call PVC Plumber API
+- PVC Plumber (Wave 2) must run before Kyverno (Wave 3) because Kyverno policies call PVC Plumber API
+- Kyverno (Wave 3) is a **standalone Application** (not in the Infrastructure AppSet) to guarantee its webhooks are registered before any app PVCs are created. ApplicationSets are considered "healthy" immediately upon creation, so putting Kyverno in an AppSet would race with app deployment.
 - **FAIL-CLOSED**: If PVC Plumber is down, Kyverno denies creation of backup-labeled PVCs. Apps retry via ArgoCD backoff until Plumber is healthy. This prevents data loss during disaster recovery.
-- Kyverno, cert-manager, GPU operators etc. deploy via Infrastructure AppSet (Wave 4) before user apps (Wave 6)
+- cert-manager, GPU operators etc. deploy via Infrastructure AppSet (Wave 4) before user apps (Wave 6)
 - This prevents "chicken-and-egg" dependency issues and SSD thrashing
 
 **Important**: The Infrastructure AppSet uses an explicit list of paths (not glob discovery). To add a new infrastructure component, you must add its path to `infrastructure/controllers/argocd/apps/infrastructure-appset.yaml`.
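The wave ordering described in the bullets above (Kyverno at Wave 3 ahead of the Infrastructure AppSet at Wave 4) could be spot-checked from the manifests with a small helper like the one below. This is a sketch under an assumption the diff does not confirm: that each manifest sets its wave through a single-line `argocd.argoproj.io/sync-wave` annotation. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes each manifest sets its wave through the
# argocd.argoproj.io/sync-wave annotation on a single line, e.g.
#   argocd.argoproj.io/sync-wave: "3"

# Print the sync-wave value found in a manifest (empty output if none).
sync_wave_of() {
  grep 'argocd.argoproj.io/sync-wave:' "$1" | head -n 1 \
    | sed 's/.*sync-wave:[[:space:]]*//' | tr -d "\"' "
}

# Succeed only if manifest $1 syncs in a strictly earlier wave than manifest $2.
wave_before() {
  wb_first="$(sync_wave_of "$1")"
  wb_second="$(sync_wave_of "$2")"
  [ -n "$wb_first" ] && [ -n "$wb_second" ] && [ "$wb_first" -lt "$wb_second" ]
}
```

For example, `wave_before infrastructure/controllers/argocd/apps/kyverno-app.yaml infrastructure/controllers/argocd/apps/infrastructure-appset.yaml` should succeed if the annotations match the documented waves.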
README.md: 10 additions & 2 deletions
@@ -80,6 +80,7 @@ Omni provisions Talos clusters without a CNI. Install Cilium to get networking f
 
 ```bash
 cilium install \
+  --version 1.19.0 \
   --set cluster.name=talos-prod-cluster \
   --set ipam.mode=kubernetes \
   --set kubeProxyReplacement=true \
@@ -94,9 +95,16 @@ cilium install \
   --set gatewayAPI.enableAppProtocol=true
 ```
 
-> **Important:** `cluster.name` must match `infrastructure/networking/cilium/values.yaml` for Hubble certificate SANs. After ArgoCD deploys, it takes over Cilium management at Wave 0.
+> **Important — version must match:** The `cilium install` CLI version must match the Helm chart version in `infrastructure/networking/cilium/kustomization.yaml` (currently **1.19.0**). Use `cilium install --version 1.19.0` to pin it. If versions differ, ArgoCD upgrades Cilium at Wave 0 and regenerates some Hubble certs but not others, causing TLS handshake failures (`x509: certificate signed by unknown authority`) that block all sync waves.
 >
-> If `cilium install` is run without `--set cluster.name=talos-prod-cluster`, certificates are generated for `default` or `kind-kind`. When ArgoCD later configures Cilium to expect `talos-prod-cluster`, the certificates will not match, causing TLS handshake failures in Hubble Relay (`x509: certificate signed by unknown authority`).
+> **Important — cluster name must match:** `cluster.name` must match `infrastructure/networking/cilium/values.yaml` for Hubble certificate SANs. If `cilium install` is run without `--set cluster.name=talos-prod-cluster`, certificates are generated for `default` or `kind-kind`, causing the same TLS failures.
+>
+> **If Hubble Relay is crash-looping after bootstrap**, delete stale certs and restart: