Commit 2ec84e3

Merge pull request #747 from mitchross/claude/fix-sync-wave-ordering-015vZLxdaHpaAyBTchJ4nK3n
Claude/fix sync wave ordering 015v z lxda hpa ay b tch j4n k3n
2 parents 45b2599 + 74a4a91 commit 2ec84e3

6 files changed

Lines changed: 28 additions & 22 deletions

BOOTSTRAP.md

Lines changed: 9 additions & 5 deletions
@@ -193,20 +193,24 @@ ArgoCD deploys applications in a specific order to avoid race conditions and SSD
 | Wave | Component | Purpose | Why This Order? |
 |------|-----------|---------|-----------------|
 | **0** | **Cilium** | CNI networking | Foundation - everything depends on networking |
-| **1** | **1Password Connect, External Secrets, Longhorn, Garage** | Secret management and storage | 1Password → External Secrets → Longhorn (needs secrets for backups) |
-| **2** | **Infrastructure** | Core services (cert-manager, databases, GPU operators, etc.) | Depends on networking, secrets, and storage being ready |
+| **0** | **1Password Connect** | Secret backend | Required by External Secrets Operator |
+| **0** | **External Secrets Operator** | Secret management CRDs | Longhorn needs ExternalSecret CRD for backup credentials |
+| **1** | **Longhorn** | Storage layer | Needs networking + secret CRDs; other apps need storage |
+| **1** | **Garage** | S3-compatible object storage | Needs storage layer |
+| **2** | **Infrastructure** | Core services (cert-manager, GPU operators, databases, etc.) | Depends on networking and storage being ready |
 | **3** | **Monitoring** | Prometheus, Grafana, alerts | Monitors the infrastructure |
 | **4** | **My-Apps** | User applications | Runs on top of everything else |

 **Why Sync Waves Matter:**
 - **Prevents race conditions** - Cilium won't be reinstalled while Longhorn is deploying
-- **Eliminates SSD thrashing** - Longhorn waits for Cilium to be fully healthy
+- **Eliminates SSD thrashing** - Longhorn waits for Cilium + secrets to be fully healthy
 - **Ensures stability** - Each layer is healthy before the next begins
 - **Proper dependencies** - Apps that need PVCs deploy after Longhorn is ready
+- **Secret management** - ExternalSecret CRDs exist before resources try to use them

 **What You'll See:**
-1. **Wave 0**: Cilium deploys and becomes healthy
-2. **Wave 1**: 1Password Connect → External Secrets Operator → Longhorn and Garage deploy in parallel
+1. **Wave 0**: Cilium, 1Password Connect, and External Secrets Operator deploy in parallel
+2. **Wave 1**: Longhorn and Garage deploy after networking + secrets are ready
 3. **Wave 2**: Infrastructure components deploy in parallel
 4. **Wave 3**: Monitoring stack deploys
 5. **Wave 4**: Your applications deploy last
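
The updated wave table in this BOOTSTRAP.md change can be read as a simple grouping: apps that share a wave number start together, and waves run in ascending order. The following is a minimal Python sketch of that reading, not ArgoCD code. The short app names (`cilium`, `longhorn`, and so on) are illustrative labels taken from the table rather than confirmed Application resource names, and `group_by_wave` is a hypothetical helper.

```python
from collections import defaultdict

# Wave numbers copied from the updated table in BOOTSTRAP.md.
# The keys are illustrative labels, not confirmed resource names.
APP_WAVES = {
    "cilium": 0,
    "1passwordconnect": 0,
    "external-secrets": 0,
    "longhorn": 1,
    "garage": 1,
    "infrastructure": 2,
    "monitoring": 3,
    "my-apps": 4,
}


def group_by_wave(app_waves):
    """Return [(wave, [apps...]), ...] ordered by ascending wave number."""
    waves = defaultdict(list)
    for app, wave in app_waves.items():
        waves[wave].append(app)
    # Apps inside one wave have no guaranteed order; sort only for stable output.
    return [(wave, sorted(apps)) for wave, apps in sorted(waves.items())]


if __name__ == "__main__":
    for wave, apps in group_by_wave(APP_WAVES):
        print(f"Wave {wave}: {', '.join(apps)}")
```

Note that apps in the same wave are not ordered relative to each other, so a dependency between two wave 0 apps is not enforced by the wave number alone.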

infrastructure/controllers/argocd/apps/1passwordconnect.yaml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ metadata:
   name: 1passwordconnect
   namespace: argocd
   annotations:
-    argocd.argoproj.io/sync-wave: "1"
+    argocd.argoproj.io/sync-wave: "0"
   finalizers:
     - resources-finalizer.argocd.argoproj.io
 spec:

infrastructure/controllers/argocd/apps/external-secrets.yaml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ metadata:
   name: external-secrets
   namespace: argocd
   annotations:
-    argocd.argoproj.io/sync-wave: "1"
+    argocd.argoproj.io/sync-wave: "0"
   finalizers:
     - resources-finalizer.argocd.argoproj.io
 spec:
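
Both manifests set their wave through the `argocd.argoproj.io/sync-wave` annotation. The value is quoted because Kubernetes annotation values must be strings. As a hedged illustration of how such a value can be read, the sketch below assumes the manifest has already been parsed into a Python dict (for example by a YAML loader) and treats a missing annotation as wave 0, which is the default ArgoCD documents. The `sync_wave` helper is hypothetical.

```python
SYNC_WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def sync_wave(manifest):
    """Return the sync wave of a parsed manifest, defaulting to 0.

    `manifest` is assumed to be a dict shaped like a Kubernetes object.
    """
    metadata = manifest.get("metadata") or {}
    annotations = metadata.get("annotations") or {}
    # Annotation values are strings, so convert before comparing waves.
    return int(annotations.get(SYNC_WAVE_ANNOTATION, "0"))


if __name__ == "__main__":
    external_secrets = {
        "metadata": {
            "name": "external-secrets",
            "namespace": "argocd",
            "annotations": {SYNC_WAVE_ANNOTATION: "0"},
        }
    }
    print(sync_wave(external_secrets))
```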

infrastructure/controllers/argocd/apps/infrastructure-appset.yaml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ spec:
         repoURL: https://github.com/mitchross/talos-argocd-proxmox.git
         revision: main
         directories:
-          # Controllers (argocd managed via Helm, cilium has its own app)
+          # Controllers (argocd managed via Helm, cilium/external-secrets/1password have their own apps)
           - path: infrastructure/controllers/cert-manager
           - path: infrastructure/controllers/gpu-priority-classes
           - path: infrastructure/controllers/intel-device-plugins
Lines changed: 12 additions & 11 deletions
@@ -1,14 +1,15 @@
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
-- projects.yaml
-# Critical applications with specific sync waves to prevent race conditions
-- cilium-app.yaml # Wave 0 - Networking foundation
-- 1passwordconnect.yaml # Wave 1 - Secret backend (needed by External Secrets)
-- external-secrets.yaml # Wave 1 - Secret management (needed by Longhorn)
-- longhorn-app.yaml # Wave 1 - Storage foundation
-# ApplicationSets for automatic discovery
-- infrastructure-appset.yaml # Wave 2
-- monitoring-appset.yaml # Wave 3
-- my-apps-appset.yaml # Wave 4
-- garage.yaml
+- projects.yaml
+# Critical applications with specific sync waves to prevent race conditions
+# Wave 0: Foundation components (networking + secrets infrastructure)
+- cilium-app.yaml # Wave 0 - Networking foundation
+- 1passwordconnect.yaml # Wave 0 - Secret backend (required by External Secrets)
+- external-secrets.yaml # Wave 0 - External Secrets CRDs (required by Longhorn)
+- longhorn-app.yaml # Wave 1 - Storage foundation
+# ApplicationSets for automatic discovery
+- infrastructure-appset.yaml # Wave 2
+- monitoring-appset.yaml # Wave 3
+- my-apps-appset.yaml # Wave 4
+- garage.yaml

infrastructure/controllers/argocd/apps/longhorn-app.yaml

Lines changed: 4 additions & 3 deletions
@@ -27,7 +27,6 @@ spec:
       - CreateNamespace=true
       - ServerSideApply=true
       - RespectIgnoreDifferences=true
-      - SkipDryRunOnMissingResource=true
     retry:
       limit: 10
       backoff:
@@ -47,5 +46,7 @@ spec:
         - /data
     - group: gateway.networking.k8s.io
       kind: HTTPRoute
-      jqPathExpressions:
-        - .status
+      jsonPointers:
+        - /spec/rules/0/backendRefs/0/group
+        - /spec/rules/0/backendRefs/0/kind
+        - /spec/rules/0/backendRefs/0/weight
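
The `jsonPointers` entries in this change name three `backendRefs` fields (`group`, `kind`, `weight`) that the Gateway API defaults when they are omitted, so the live HTTPRoute can differ from the manifest in Git without any real change. The sketch below illustrates the idea of comparing two objects while ignoring those paths. It is a conceptual approximation in Python, not ArgoCD's diff implementation. The helper names and the sample backend `longhorn-frontend` on port 80 are assumptions made for illustration.

```python
import copy

# Paths taken from the longhorn-app.yaml change.
IGNORED_POINTERS = [
    "/spec/rules/0/backendRefs/0/group",
    "/spec/rules/0/backendRefs/0/kind",
    "/spec/rules/0/backendRefs/0/weight",
]


def drop_pointer(doc, pointer):
    """Delete the value at an RFC 6901 JSON pointer in place, if it exists."""
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]
    node = doc
    try:
        for token in tokens[:-1]:
            node = node[int(token)] if isinstance(node, list) else node[token]
        last = tokens[-1]
        if isinstance(node, list):
            del node[int(last)]
        else:
            del node[last]
    except (KeyError, IndexError, ValueError, TypeError):
        pass  # Path not present: nothing to ignore.


def equal_ignoring(desired, live, pointers):
    """Compare two objects after removing the ignored paths from copies of both."""
    desired, live = copy.deepcopy(desired), copy.deepcopy(live)
    for pointer in pointers:
        drop_pointer(desired, pointer)
        drop_pointer(live, pointer)
    return desired == live


# Hypothetical route as written in Git: defaulted fields are omitted.
desired = {"spec": {"rules": [{"backendRefs": [{"name": "longhorn-frontend", "port": 80}]}]}}
# The same route as stored in the cluster, with the API defaults filled in.
live = {
    "spec": {
        "rules": [
            {
                "backendRefs": [
                    {"group": "", "kind": "Service", "name": "longhorn-frontend", "port": 80, "weight": 1}
                ]
            }
        ]
    }
}

if __name__ == "__main__":
    print(equal_ignoring(desired, live, IGNORED_POINTERS))
    print(equal_ignoring(desired, live, []))
```

Because the pointers are index-based, they cover only the first backend of the first rule; a route with more rules or backends would need additional entries.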
