Commit 3344cb4

Merge pull request #739 from mitchross/claude/update-bootstrap-readme-omni-015vZLxdaHpaAyBTchJ4nK3n
Claude/update bootstrap readme omni 015v z lxda hpa ay b tch j4n k3n
2 parents 1bfecf0 + 2a93e11 commit 3344cb4

8 files changed

Lines changed: 456 additions & 8 deletions

BOOTSTRAP.md

Lines changed: 330 additions & 0 deletions
@@ -0,0 +1,330 @@
# 🚀 Bootstrap Guide - Omni & Sidero Proxmox Provider

> Quick start guide for bootstrapping your Kubernetes cluster using Omni and Sidero Proxmox Provider

This guide covers the streamlined bootstrap process when using **Omni** (Sidero's Talos management platform) and the **Sidero Proxmox Provider** instead of manual Talos configuration.

## Prerequisites

Before starting this bootstrap process, ensure you have:

1. **Omni deployed and accessible** - See [Omni Setup Guide](omni/omni/README.md)
2. **Sidero Proxmox Provider configured** - See Proxmox provider documentation
3. **Cluster created in Omni** - Your Talos cluster should be provisioned and healthy in Omni
4. **kubectl access** - Download kubeconfig from Omni UI
5. **Local tools installed**:
   - `kubectl`
   - `kustomize`
   - `helm` (invoked by `kustomize build --enable-helm` in Step 4)
   - `cilium` CLI (used for `cilium install` in Step 1 and for verification)
   - 1Password CLI (`op`)

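A quick preflight check can catch a missing tool before the first step fails halfway through. The following is a minimal sketch, not part of the repository: the function name `missing_tools` is hypothetical, and `helm` is included as an assumption because `kustomize build --enable-helm` shells out to it.

```bash
# Print each required tool that is not on PATH; print nothing when all are found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools used by the steps in this guide (helm is an assumption, see above).
missing=$(missing_tools kubectl kustomize helm cilium op)
if [ -n "$missing" ]; then
  # $missing is left unquoted so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All bootstrap tools found"
fi
```

The check only confirms that each binary is on `PATH`; it does not verify versions or that `op` is signed in.
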
## Bootstrap Process

Once your cluster is provisioned and running via Omni, follow these steps to install the GitOps stack:

### Step 1: Install Cilium CNI

This guide assumes the cluster was provisioned without a CNI and with kube-proxy disabled (in Talos terms, `cluster.network.cni.name: none` and `cluster.proxy.disabled: true`), so nodes stay `NotReady` until a CNI is installed. Install Cilium manually to get the cluster functional:

```bash
cilium install \
  --set ipam.mode=kubernetes \
  --set kubeProxyReplacement=true \
  --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
  --set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
  --set cgroup.autoMount.enabled=false \
  --set cgroup.hostRoot=/sys/fs/cgroup \
  --set k8sServiceHost=localhost \
  --set k8sServicePort=7445 \
  --set gatewayAPI.enabled=true \
  --set gatewayAPI.enableAlpn=true \
  --set gatewayAPI.enableAppProtocol=true
```

**Why these settings?**
- `kubeProxyReplacement=true` - Cilium handles Service load-balancing in place of kube-proxy
- `gatewayAPI.*` - Enables Kubernetes Gateway API support for modern ingress
- `cgroup.autoMount.enabled=false` and `cgroup.hostRoot=/sys/fs/cgroup` - Required for Talos OS, which already mounts cgroup v2, so Cilium must not try to mount it itself
- `k8sServiceHost`/`k8sServicePort` - Point Cilium at `localhost:7445`, the node-local API server endpoint that Talos exposes through KubePrism

> **Note:** After ArgoCD is deployed, it will take over Cilium management using **Sync Wave 0** to ensure it's always deployed first, before Longhorn and other components. This prevents race conditions.

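### Step 2: Install Gateway API CRDs

Install both standard and experimental Gateway API resources:

```bash
# Apply the experimental-channel CRDs (server-side apply avoids the
# annotation size limit that client-side apply hits on large CRDs)
kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/experimental-install.yaml

# Apply standard Gateway API CRDs
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml
```

**Note:** Step 1 enabled `gatewayAPI.enabled=true` before these CRDs existed. If Gateways are not reconciled after this step, restarting the Cilium operator (`kubectl -n kube-system rollout restart deployment/cilium-operator`) and agents (`kubectl -n kube-system rollout restart daemonset/cilium`) usually lets them pick up the new CRDs.

**Verify Cilium is running:**
```bash
cilium status
kubectl get pods -n kube-system -l k8s-app=cilium
```
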
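To confirm that the CRDs from this step actually landed, something like the following sketch may help. The function name `missing_gateway_crds` and the exact list of kinds are assumptions made for illustration; adjust the list to whatever the installed bundles provide.

```bash
# Kinds the Gateway API bundles are expected to create (assumed list).
GATEWAY_CRDS="gatewayclasses gateways httproutes grpcroutes referencegrants tlsroutes"

# Print the fully qualified name of every expected CRD that kubectl cannot find.
missing_gateway_crds() {
  for kind in $GATEWAY_CRDS; do
    crd="${kind}.gateway.networking.k8s.io"
    kubectl get crd "$crd" >/dev/null 2>&1 || printf '%s\n' "$crd"
  done
}
```

Running `missing_gateway_crds` against the cluster prints nothing when every listed CRD exists, and one line per missing CRD otherwise.
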
### Step 3: Pre-Seed 1Password Secrets

This cluster uses [1Password Connect](https://developer.1password.com/docs/connect) and [External Secrets Operator](https://external-secrets.io/) for secret management.

**Create namespaces:**
```bash
kubectl create namespace 1passwordconnect
kubectl create namespace external-secrets
```

**Sign in to 1Password and create secrets:**
```bash
# Authenticate with 1Password
eval "$(op signin)"

# Export credentials from 1Password.
# The credentials JSON is base64-encoded before it is stored as the secret value;
# kubectl adds its own base64 layer on top when it writes the Secret.
export OP_CREDENTIALS=$(op read 'op://homelabproxmox/1passwordconnect/1password-credentials.json' | base64 | tr -d '\n')
export OP_CONNECT_TOKEN=$(op read 'op://homelabproxmox/1password-operator-token/credential')

# Create Kubernetes secrets for 1Password Connect
kubectl create secret generic 1password-credentials \
  --namespace 1passwordconnect \
  --from-literal=1password-credentials.json="$OP_CREDENTIALS"

kubectl create secret generic 1password-operator-token \
  --namespace 1passwordconnect \
  --from-literal=token="$OP_CONNECT_TOKEN"

kubectl create secret generic 1passwordconnect \
  --namespace external-secrets \
  --from-literal=token="$OP_CONNECT_TOKEN"
```

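`kubectl create namespace` returns an `AlreadyExists` error when the step is re-run, for example after a partially failed bootstrap. One way to make the namespace step repeatable is to render the manifest client-side and pipe it to `kubectl apply`. This is a sketch only; the helper name `ensure_namespace` is hypothetical.

```bash
# Create a namespace if it does not exist yet; safe to run repeatedly.
ensure_namespace() {
  kubectl create namespace "$1" --dry-run=client -o yaml | kubectl apply -f -
}
```

With the helper defined, `ensure_namespace 1passwordconnect` and `ensure_namespace external-secrets` can stand in for the two `create namespace` commands.
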
### Step 4: Bootstrap ArgoCD

Deploy ArgoCD using Kustomize with Helm integration:

```bash
# Apply ArgoCD components and CRDs
kustomize build infrastructure/controllers/argocd --enable-helm | kubectl apply -f -
```

**Note:** You may see an error about `no matches for kind Application`. This is expected on the first apply, because the `Application` CRD is created by the same command and is not yet established; it is resolved in the next step.

**Wait for ArgoCD to be ready:**
```bash
# Wait for CRDs to be established
echo "Waiting for ArgoCD CRDs..."
kubectl wait --for=condition=established --timeout=60s crd/applications.argoproj.io

# Wait for ArgoCD server to be available
echo "Waiting for ArgoCD server..."
kubectl wait --for=condition=Available deployment/argocd-server -n argocd --timeout=300s
```

### Step 5: Deploy Root Application

Apply the root Application to start the GitOps sync loop:

```bash
kubectl apply -f infrastructure/controllers/argocd/root.yaml
```

**Note:** You might need to run this command twice if ArgoCD hasn't fully initialized.

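Instead of re-running the command by hand, a small retry wrapper can repeat it until it succeeds. This is a generic sketch under the assumption that a short fixed delay is enough; the `retry` helper is hypothetical and not part of the repository.

```bash
# retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 5 10 kubectl apply -f infrastructure/controllers/argocd/root.yaml` would try up to five times with ten seconds between attempts.
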
### Step 6: Refresh ArgoCD Applications

1. **Port-forward to ArgoCD UI:**
   ```bash
   kubectl port-forward svc/argocd-server -n argocd 8080:443
   ```

2. **Access ArgoCD:**
   - Open browser to `https://localhost:8080`
   - Log in as `admin` with the password from the initial admin secret:
     ```bash
     kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
     ```

3. **Refresh Applications:**
   - Click on the `root` application
   - Click "Refresh" button
   - Watch as ApplicationSets discover and sync all applications

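The refresh can also be triggered without the UI by setting Argo CD's refresh annotation on the Application. The sketch below assumes the Application lives in the `argocd` namespace, as in the commands above; the helper name `refresh_app` is hypothetical.

```bash
# Ask Argo CD to refresh an Application by setting its refresh annotation.
# Usage: refresh_app <name> [normal|hard]
refresh_app() {
  kubectl -n argocd annotate application "$1" \
    "argocd.argoproj.io/refresh=${2:-normal}" --overwrite
}
```

`refresh_app root` requests a normal refresh; `refresh_app root hard` additionally asks Argo CD to bypass its manifest cache.
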
## What Happens Next?

ArgoCD now manages everything from Git using **Sync Waves** to prevent race conditions:

### Deployment Order (Sync Waves)

ArgoCD deploys applications in a specific order to avoid race conditions and SSD thrashing:

| Wave | Component | Purpose | Why This Order? |
|------|-----------|---------|-----------------|
| **0** | **Cilium** | CNI networking | Foundation - everything depends on networking |
| **1** | **Longhorn** | Storage layer | Needs stable networking; other apps need storage |
| **2** | **Infrastructure** | Core services (cert-manager, external-secrets, databases, etc.) | Depends on networking and storage being ready |
| **3** | **Monitoring** | Prometheus, Grafana, alerts | Monitors the infrastructure |
| **4** | **My-Apps** | User applications | Runs on top of everything else |

**Why Sync Waves Matter:**
- **Prevents race conditions** - Cilium won't be reinstalled while Longhorn is deploying
- **Eliminates SSD thrashing** - Longhorn waits for Cilium to be fully healthy
- **Ensures stability** - Each layer is healthy before the next begins
- **Proper dependencies** - Apps that need PVCs deploy after Longhorn is ready

**What You'll See:**
1. **Wave 0**: Cilium deploys and becomes healthy
2. **Wave 1**: Longhorn deploys after Cilium is ready
3. **Wave 2**: Infrastructure components deploy in parallel
4. **Wave 3**: Monitoring stack deploys
5. **Wave 4**: Your applications deploy last

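In Argo CD, a resource is assigned to a wave through the `argocd.argoproj.io/sync-wave` annotation, and lower numbers sync first. The sketch below prints the metadata fragment such a resource would carry. The helper `sync_wave_metadata` is hypothetical, and how this repository actually attaches the annotation (per Application, per ApplicationSet template, or otherwise) is not shown in this guide.

```bash
# Print the metadata fragment that assigns a resource to a sync wave.
# Usage: sync_wave_metadata <resource-name> <wave-number>
sync_wave_metadata() {
  cat <<EOF
metadata:
  name: $1
  annotations:
    argocd.argoproj.io/sync-wave: "$2"
EOF
}

# Wave numbers taken from the table above.
sync_wave_metadata cilium 0
sync_wave_metadata longhorn 1
```

The annotation value is quoted because Kubernetes annotations must be strings, even when they hold a number.
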
### Automated GitOps Management

Once sync waves complete:

1. **ArgoCD Self-Management** - ArgoCD manages its own configuration and upgrades
2. **ApplicationSet Discovery** - Scans repository for applications in:
   - `infrastructure/*` - Core cluster components
   - `monitoring/*` - Prometheus, Grafana, etc.
   - `my-apps/*/*` - Your applications
3. **Automatic Sync** - All applications sync from Git automatically
4. **Self-Healing** - ArgoCD maintains desired state from Git

## Verification

Check that everything is running correctly:

```bash
# View all ArgoCD applications
kubectl get applications -n argocd

# Check application sync status
kubectl get applications -n argocd -o wide

# View all pods across namespaces
kubectl get pods -A

# Verify External Secrets are working
kubectl get externalsecret -A

# Check Cilium status
cilium status
```

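On a cluster with many applications it can be easier to list only the ones that still need attention. The following sketch reads the `status.sync.status` and `status.health.status` fields of each Application and filters out those that are already `Synced` and `Healthy`. Both function names are hypothetical.

```bash
# Read "name<TAB>sync<TAB>health" rows on stdin and print the rows that are
# not both Synced and Healthy.
not_ready_apps() {
  awk -F '\t' '$2 != "Synced" || $3 != "Healthy"'
}

# List every Application with its sync and health status as tab-separated rows.
list_app_status() {
  kubectl get applications -n argocd \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.sync.status}{"\t"}{.status.health.status}{"\n"}{end}'
}
```

`list_app_status | not_ready_apps` prints nothing once every application has converged.
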
## Cluster Access

**Download kubeconfig from Omni:**
1. Open Omni UI
2. Navigate to your cluster
3. Click "Download Kubeconfig"
4. Save to `~/.kube/config` or set `KUBECONFIG` environment variable

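If `~/.kube/config` already holds other clusters, pointing `KUBECONFIG` at the downloaded file avoids overwriting it. A minimal sketch follows; the helper name `use_kubeconfig` and the example path in the usage note are hypothetical.

```bash
# Point kubectl at a downloaded kubeconfig without touching ~/.kube/config.
# Usage: use_kubeconfig <path-to-kubeconfig>
use_kubeconfig() {
  if [ ! -r "$1" ]; then
    echo "kubeconfig not readable: $1" >&2
    return 1
  fi
  export KUBECONFIG="$1"
}
```

For example, `use_kubeconfig "$HOME/Downloads/omni-kubeconfig.yaml" && kubectl get nodes` switches the current shell to the Omni-managed cluster and lists its nodes.
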
**Manage nodes via Omni:**
- All node management (upgrades, configuration, patches) is done through Omni UI
- No need for `talosctl` or manual configuration
- Omni handles Talos upgrades and system extensions

## Troubleshooting

### ArgoCD Won't Start

```bash
# Check ArgoCD pods
kubectl get pods -n argocd

# View ArgoCD server logs
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-server

# Check for CRD installation
kubectl get crd applications.argoproj.io
```

### Applications Not Syncing

```bash
# Check ApplicationSets
kubectl get applicationsets -n argocd

# View ApplicationSet status
kubectl describe applicationset infrastructure -n argocd

# Recreate the root application (heavier than a refresh: if root carries the
# resources-finalizer, deleting it also deletes its child resources)
kubectl delete application root -n argocd
kubectl apply -f infrastructure/controllers/argocd/root.yaml
```

### Cilium Issues

```bash
# Check Cilium status
cilium status

# View Cilium agent logs
kubectl logs -n kube-system -l k8s-app=cilium

# Verify connectivity
cilium connectivity test
```

### 1Password Secrets Not Working

```bash
# Check External Secrets Operator
kubectl get pods -n external-secrets

# View ExternalSecret status
kubectl get externalsecret -A
kubectl describe externalsecret <name> -n <namespace>

# Verify 1Password Connect is running
kubectl get pods -n 1passwordconnect
```

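A common cause of failing ExternalSecrets is that one of the secrets pre-seeded in Step 3 is missing or was created in the wrong namespace. The sketch below checks for the three secrets by name; the helper `missing_seeded_secrets` is hypothetical, and the namespace/name pairs are copied from the Step 3 commands.

```bash
# Secrets created in Step 3, written as namespace/name pairs.
SEEDED_SECRETS="1passwordconnect/1password-credentials 1passwordconnect/1password-operator-token external-secrets/1passwordconnect"

# Print every seeded secret that kubectl cannot find.
missing_seeded_secrets() {
  for entry in $SEEDED_SECRETS; do
    ns=${entry%%/*}
    name=${entry#*/}
    kubectl get secret "$name" -n "$ns" >/dev/null 2>&1 || printf '%s\n' "$entry"
  done
}
```

An empty result from `missing_seeded_secrets` means all three secrets exist; it does not validate their contents.
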
## Differences from Manual Talos Bootstrap

If you previously used manual Talos configuration with `talhelper`:

| Manual Talos | Omni + Sidero Provider |
|-------------|------------------------|
| `talhelper genconfig` | Cluster provisioned in Omni UI |
| `talosctl bootstrap` | Omni handles bootstrap automatically |
| `talosctl apply-config` | Configuration managed in Omni |
| Manual ISO creation | Provider handles machine provisioning |
| `talosctl upgrade` | Upgrades managed in Omni UI |
| SOPS-encrypted secrets | Configuration stored in Omni |

**Benefits of Omni:**
- Web UI for cluster management
- Automated Talos upgrades
- Infrastructure provider integration (Proxmox, AWS, etc.)
- Built-in monitoring and metrics
- No need for local `talosctl` configuration
- Machine lifecycle management
- Cluster templates and machine classes

## Next Steps

After bootstrap is complete:

1. **Configure DNS** - Point your domain to cluster ingress
2. **Review Applications** - Check all apps in ArgoCD UI are synced
3. **Set Up Monitoring** - Access Grafana dashboards
4. **Configure Backups** - Verify Longhorn backup configuration
5. **Deploy Your Apps** - Add applications to `my-apps/` directory

## Additional Documentation

- [Omni Setup Guide](omni/omni/README.md) - Deploy your own Omni instance
- [Main README](README.md) - Full cluster documentation
- [ArgoCD Configuration](docs/argocd.md) - GitOps patterns explained
- [Network Configuration](docs/network.md) - Cilium and Gateway API setup
- [Storage Configuration](docs/storage.md) - Longhorn and persistent volumes

## Support

For issues:
- **Talos/Omni**: Check [Talos documentation](https://www.talos.dev) and [Omni docs](https://omni.siderolabs.com/docs)
- **ArgoCD**: See [ArgoCD documentation](https://argo-cd.readthedocs.io/)
- **Cilium**: Visit [Cilium documentation](https://docs.cilium.io/)

README.md

Lines changed: 17 additions & 2 deletions
@@ -4,11 +4,22 @@
 
 A GitOps-driven Kubernetes cluster using **Talos OS** (secure, immutable Linux for K8s), ArgoCD, and Cilium, with integrated Cloudflare Tunnel for secure external access. Built for both home lab and production environments using **enterprise-grade GitOps patterns**.
 
+## 🎯 Choose Your Bootstrap Method
+
+This repository supports two bootstrap approaches:
+
+| Method | Best For | Guide |
+|--------|----------|-------|
+| **🚀 Omni + Sidero Proxmox** | Recommended for new deployments. Web UI cluster management, automated provisioning, simplified operations. | **[BOOTSTRAP.md](BOOTSTRAP.md)**|
+| **⚙️ Manual Talos** | Advanced users who want full control over Talos configuration with `talhelper` and `talosctl`. | See [Quick Start](#-quick-start) below |
+
+> **Using Omni?** Skip the manual setup below and jump to **[BOOTSTRAP.md](BOOTSTRAP.md)** for the streamlined workflow.
+
 ## 📋 Table of Contents
 
 - [Prerequisites](#-prerequisites)
 - [Architecture](#-architecture)
-- [Quick Start](#-quick-start)
+- [Quick Start](#-quick-start) (Manual Talos Method)
 - [1. System Dependencies](#1-system-dependencies)
 - [2. Generate Talos Configs](#2-generate-talos-configs)
 - [3. Boot & Bootstrap Talos Nodes](#3-boot--bootstrap-talos-nodes)
@@ -61,7 +72,11 @@ graph TD;
 - **GPU Integration**: Full NVIDIA GPU support via Talos system extensions and GPU Operator
 - **Zero SSH**: All node management via Talosctl API
 
-## 🚀 Quick Start
+## 🚀 Quick Start (Manual Talos Method)
+
+> **Note:** If you're using Omni + Sidero Proxmox Provider, see **[BOOTSTRAP.md](BOOTSTRAP.md)** instead.
+
+This section covers the traditional manual Talos bootstrap process using `talhelper` and `talosctl`.
 
 ### 1. System Dependencies
 ```bash
