| 1 | +# 🚀 Bootstrap Guide - Omni & Sidero Proxmox Provider |
| 2 | + |
| 3 | +> Quick start guide for bootstrapping your Kubernetes cluster using Omni and Sidero Proxmox Provider |
| 4 | + |
| 5 | +This guide covers the streamlined bootstrap process when using **Omni** (Sidero's Talos management platform) and the **Sidero Proxmox Provider** instead of manual Talos configuration. |
| 6 | + |
| 7 | +## Prerequisites |
| 8 | + |
| 9 | +Before starting this bootstrap process, ensure you have: |
| 10 | + |
| 11 | +1. **Omni deployed and accessible** - See [Omni Setup Guide](omni/omni/README.md) |
| 12 | +2. **Sidero Proxmox Provider configured** - See Proxmox provider documentation |
| 13 | +3. **Cluster created in Omni** - Your Talos cluster should be provisioned and healthy in Omni |
| 14 | +4. **kubectl access** - Download kubeconfig from Omni UI |
| 15 | +5. **Local tools installed**: |
| 16 | + - `kubectl` |
| 17 | + - `kustomize` |
| 18 | + - `cilium` CLI (optional, for verification) |
| 19 | + - 1Password CLI (`op`) |
| 20 | + |
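A quick preflight check can confirm that the local tools listed above are on your `PATH` before you start. This is only a minimal sketch; `missing_tools` is a hypothetical helper name, not part of any of these CLIs.

```bash
# Print the name of every tool that is not found on PATH.
# Returns 0 when nothing is missing, 1 otherwise.
missing_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (cilium is optional, so it is left out here):
#   missing_tools kubectl kustomize op || echo "Install the tools listed above first"
```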
| 21 | +## Bootstrap Process |
| 22 | + |
| 23 | +Once your cluster is provisioned and running via Omni, follow these steps to install the GitOps stack: |
| 24 | + |
| 25 | +### Step 1: Install Cilium CNI |
| 26 | + |
| 27 | +Omni provisions Talos clusters without a CNI pre-installed. Install Cilium manually to get the cluster functional: |
| 28 | + |
| 29 | +```bash |
| 30 | +cilium install \ |
| 31 | + --set ipam.mode=kubernetes \ |
| 32 | + --set kubeProxyReplacement=true \ |
| 33 | + --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \ |
| 34 | + --set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \ |
| 35 | + --set cgroup.autoMount.enabled=false \ |
| 36 | + --set cgroup.hostRoot=/sys/fs/cgroup \ |
| 37 | + --set k8sServiceHost=localhost \ |
| 38 | + --set k8sServicePort=7445 \ |
| 39 | + --set gatewayAPI.enabled=true \ |
| 40 | + --set gatewayAPI.enableAlpn=true \ |
| 41 | + --set gatewayAPI.enableAppProtocol=true |
| 42 | +``` |
| 43 | + |
| 44 | +**Why these settings?** |
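| 45 | +- `kubeProxyReplacement=true` - Cilium handles Service load balancing in eBPF instead of kube-proxy |
| 46 | +- `gatewayAPI.*` - Enables Cilium's Kubernetes Gateway API support, used here for ingress |
| 47 | +- `cgroup.autoMount.enabled=false` with `cgroup.hostRoot=/sys/fs/cgroup` - Required for Talos OS, which already mounts cgroup v2 at that path |
| 48 | +- `k8sServiceHost`/`k8sServicePort` - Points Cilium at KubePrism, the Talos API server proxy that listens on `localhost:7445` |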
| 49 | + |
| 50 | +> **Note:** After ArgoCD is deployed, it will take over Cilium management using **Sync Wave 0** to ensure it's always deployed first, before Longhorn and other components. This prevents race conditions. |
| 51 | + |
| 52 | +### Step 2: Install Gateway API CRDs |
| 53 | + |
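| 54 | +Install both the standard and experimental Gateway API CRDs (if the Cilium operator started before these CRDs existed, it may need a restart afterwards to pick them up): |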
| 55 | + |
| 56 | +```bash |
| 57 | +# Apply experimental features with server-side apply |
| 58 | +kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/experimental-install.yaml |
| 59 | + |
| 60 | +# Apply standard Gateway API CRDs |
| 61 | +kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml |
| 62 | +``` |
| 63 | + |
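To keep both manifests pinned to the same release, the URLs above can be built from a single version variable. This is a small sketch; `gateway_api_url` and `GATEWAY_API_VERSION` are hypothetical names introduced here for illustration.

```bash
# Build the release-asset URL for a Gateway API channel
# ("standard" or "experimental") at a given release tag.
gateway_api_url() {
  local channel="$1" version="$2"
  echo "https://github.com/kubernetes-sigs/gateway-api/releases/download/${version}/${channel}-install.yaml"
}

GATEWAY_API_VERSION="v1.4.0"  # same release as the commands above

# Usage:
#   kubectl apply --server-side -f "$(gateway_api_url experimental "$GATEWAY_API_VERSION")"
#   kubectl apply -f "$(gateway_api_url standard "$GATEWAY_API_VERSION")"
```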
| 64 | +**Verify Cilium is running:** |
| 65 | +```bash |
| 66 | +cilium status |
| 67 | +kubectl get pods -n kube-system -l k8s-app=cilium |
| 68 | +``` |
| 69 | + |
| 70 | +### Step 3: Pre-Seed 1Password Secrets |
| 71 | + |
| 72 | +This cluster uses [1Password Connect](https://developer.1password.com/docs/connect) and [External Secrets Operator](https://external-secrets.io/) for secret management. |
| 73 | + |
| 74 | +**Create namespaces:** |
| 75 | +```bash |
| 76 | +kubectl create namespace 1passwordconnect |
| 77 | +kubectl create namespace external-secrets |
| 78 | +``` |
| 79 | + |
| 80 | +**Sign in to 1Password and create secrets:** |
| 81 | +```bash |
| 82 | +# Authenticate with 1Password |
| 83 | +eval "$(op signin)" |
| 84 | + |
| 85 | +# Export credentials from 1Password |
| 86 | +export OP_CREDENTIALS=$(op read 'op://homelabproxmox/1passwordconnect/1password-credentials.json' | base64 | tr -d '\n') |
| 87 | +export OP_CONNECT_TOKEN=$(op read 'op://homelabproxmox/1password-operator-token/credential') |
| 88 | + |
| 89 | +# Create Kubernetes secrets for 1Password Connect |
| 90 | +kubectl create secret generic 1password-credentials \ |
| 91 | + --namespace 1passwordconnect \ |
| 92 | + --from-literal=1password-credentials.json="$OP_CREDENTIALS" |
| 93 | + |
| 94 | +kubectl create secret generic 1password-operator-token \ |
| 95 | + --namespace 1passwordconnect \ |
| 96 | + --from-literal=token="$OP_CONNECT_TOKEN" |
| 97 | + |
| 98 | +kubectl create secret generic 1passwordconnect \ |
| 99 | + --namespace external-secrets \ |
| 100 | + --from-literal=token="$OP_CONNECT_TOKEN" |
| 101 | +``` |
| 102 | + |
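The `base64 | tr -d '\n'` pipeline used for `OP_CREDENTIALS` matters because GNU `base64` wraps long output across lines, and those line breaks would otherwise end up inside the secret value. The sketch below shows the idea with a stand-in payload, not a real credentials file. It assumes a `base64` that accepts `-d` for decoding, and `encode_single_line` is a hypothetical helper name.

```bash
# Encode stdin as single-line base64, as done for OP_CREDENTIALS above.
encode_single_line() {
  base64 | tr -d '\n'
}

# Stand-in payload, long enough that plain `base64` would wrap it.
sample='{"example":"this is placeholder JSON, not a real 1Password credentials file, padded to exceed one line"}'
encoded="$(printf '%s' "$sample" | encode_single_line)"

# The single-line value decodes back to the original payload.
decoded="$(printf '%s' "$encoded" | base64 -d)"
[ "$decoded" = "$sample" ] && echo "round trip ok"
```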
| 103 | +### Step 4: Bootstrap ArgoCD |
| 104 | + |
| 105 | +Deploy ArgoCD using Kustomize with Helm integration: |
| 106 | + |
| 107 | +```bash |
| 108 | +# Apply ArgoCD components and CRDs |
| 109 | +kustomize build infrastructure/controllers/argocd --enable-helm | kubectl apply -f - |
| 110 | +``` |
| 111 | + |
| 112 | +**Note:** You may see an error about `no matches for kind Application` - this is expected and will be resolved in the next step. |
| 113 | + |
| 114 | +**Wait for ArgoCD to be ready:** |
| 115 | +```bash |
| 116 | +# Wait for CRDs to be established |
| 117 | +echo "Waiting for ArgoCD CRDs..." |
| 118 | +kubectl wait --for=condition=Established --timeout=60s crd/applications.argoproj.io |
| 119 | + |
| 120 | +# Wait for ArgoCD server to be available |
| 121 | +echo "Waiting for ArgoCD server..." |
| 122 | +kubectl wait --for=condition=Available deployment/argocd-server -n argocd --timeout=300s |
| 123 | +``` |
| 124 | + |
| 125 | +### Step 5: Deploy Root Application |
| 126 | + |
| 127 | +Apply the root Application to start the GitOps sync loop: |
| 128 | + |
| 129 | +```bash |
| 130 | +kubectl apply -f infrastructure/controllers/argocd/root.yaml |
| 131 | +``` |
| 132 | + |
| 133 | +**Note:** You might need to run this command twice if ArgoCD hasn't fully initialized. |
| 134 | + |
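Instead of re-running the command by hand, a small retry wrapper can repeat it until it succeeds. This is a generic sketch; `retry` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary values, not tuned for this cluster.

```bash
# Run a command up to <attempts> times, sleeping <delay> seconds
# between failed attempts. Returns 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage:
#   retry 5 10 kubectl apply -f infrastructure/controllers/argocd/root.yaml
```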
| 135 | +### Step 6: Refresh ArgoCD Applications |
| 136 | + |
| 137 | +1. **Port-forward to ArgoCD UI:** |
| 138 | + ```bash |
| 139 | + kubectl port-forward svc/argocd-server -n argocd 8080:443 |
| 140 | + ``` |
| 141 | + |
| 142 | +2. **Access ArgoCD:** |
| 143 | + - Open browser to `https://localhost:8080` |
| 144 | + - Login with credentials from ArgoCD secret: |
| 145 | + ```bash |
| 146 | + kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d |
| 147 | + ``` |
| 148 | + |
| 149 | +3. **Refresh Applications:** |
| 150 | + - Click on the `root` application |
| 151 | + - Click "Refresh" button |
| 152 | + - Watch as ApplicationSets discover and sync all applications |
| 153 | + |
| 154 | +## What Happens Next? |
| 155 | + |
| 156 | +ArgoCD now manages everything from Git, rolling components out in ordered **Sync Waves**: |
| 157 | + |
| 158 | +### Deployment Order (Sync Waves) |
| 159 | + |
| 160 | +ArgoCD deploys applications in a specific order to avoid race conditions and SSD thrashing: |
| 161 | + |
| 162 | +| Wave | Component | Purpose | Why This Order? | |
| 163 | +|------|-----------|---------|-----------------| |
| 164 | +| **0** | **Cilium** | CNI networking | Foundation - everything depends on networking | |
| 165 | +| **1** | **Longhorn** | Storage layer | Needs stable networking; other apps need storage | |
| 166 | +| **2** | **Infrastructure** | Core services (cert-manager, external-secrets, databases, etc.) | Depends on networking and storage being ready | |
| 167 | +| **3** | **Monitoring** | Prometheus, Grafana, alerts | Monitors the infrastructure | |
| 168 | +| **4** | **My-Apps** | User applications | Runs on top of everything else | |
| 169 | + |
| 170 | +**Why Sync Waves Matter:** |
| 171 | +- **Prevents race conditions** - Cilium won't be reinstalled while Longhorn is deploying |
| 172 | +- **Eliminates SSD thrashing** - Longhorn waits for Cilium to be fully healthy |
| 173 | +- **Ensures stability** - Each layer is healthy before the next begins |
| 174 | +- **Proper dependencies** - Apps that need PVCs deploy after Longhorn is ready |
| 175 | + |
| 176 | +**What You'll See:** |
| 177 | +1. **Wave 0**: Cilium deploys and becomes healthy |
| 178 | +2. **Wave 1**: Longhorn deploys after Cilium is ready |
| 179 | +3. **Wave 2**: Infrastructure components deploy in parallel |
| 180 | +4. **Wave 3**: Monitoring stack deploys |
| 181 | +5. **Wave 4**: Your applications deploy last |
| 182 | + |
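To watch the waves progress from the command line, you can list each Application alongside its wave. The sketch below assumes the wave is recorded in the standard `argocd.argoproj.io/sync-wave` annotation on each Application object; if this repository sets waves elsewhere, the `WAVE` column will show `<none>`. `list_app_waves` is a hypothetical helper name.

```bash
# List each Application with its sync-wave annotation plus sync and health status.
list_app_waves() {
  kubectl get applications -n argocd \
    -o custom-columns='NAME:.metadata.name,WAVE:.metadata.annotations.argocd\.argoproj\.io/sync-wave,SYNC:.status.sync.status,HEALTH:.status.health.status'
}

# Usage:
#   list_app_waves
```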
| 183 | +### Automated GitOps Management |
| 184 | + |
| 185 | +Once sync waves complete: |
| 186 | + |
| 187 | +1. **ArgoCD Self-Management** - ArgoCD manages its own configuration and upgrades |
| 188 | +2. **ApplicationSet Discovery** - Scans repository for applications in: |
| 189 | + - `infrastructure/*` - Core cluster components |
| 190 | + - `monitoring/*` - Prometheus, Grafana, etc. |
| 191 | + - `my-apps/*/*` - Your applications |
| 192 | +3. **Automatic Sync** - All applications sync from Git automatically |
| 193 | +4. **Self-Healing** - ArgoCD maintains desired state from Git |
| 194 | + |
| 195 | +## Verification |
| 196 | + |
| 197 | +Check that everything is running correctly: |
| 198 | + |
| 199 | +```bash |
| 200 | +# View all ArgoCD applications |
| 201 | +kubectl get applications -n argocd |
| 202 | + |
| 203 | +# Check application sync status |
| 204 | +kubectl get applications -n argocd -o wide |
| 205 | + |
| 206 | +# View all pods across namespaces |
| 207 | +kubectl get pods -A |
| 208 | + |
| 209 | +# Verify External Secrets are working |
| 210 | +kubectl get externalsecret -A |
| 211 | + |
| 212 | +# Check Cilium status |
| 213 | +cilium status |
| 214 | +``` |
| 215 | + |
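On a cluster with many applications, it can help to filter the list down to the ones that still need attention. The sketch below assumes the default column order of `kubectl get applications` (name, sync status, health status); `not_ready_apps` is a hypothetical helper name.

```bash
# Read `kubectl get applications --no-headers` output on stdin and print
# the name of every application that is not both Synced and Healthy.
not_ready_apps() {
  awk '$2 != "Synced" || $3 != "Healthy" { print $1 }'
}

# Usage:
#   kubectl get applications -n argocd --no-headers | not_ready_apps
```

An empty result means every application reports `Synced` and `Healthy`.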
| 216 | +## Cluster Access |
| 217 | + |
| 218 | +**Download kubeconfig from Omni:** |
| 219 | +1. Open Omni UI |
| 220 | +2. Navigate to your cluster |
| 221 | +3. Click "Download Kubeconfig" |
| 222 | +4. Save to `~/.kube/config` or set `KUBECONFIG` environment variable |
| 223 | + |
| 224 | +**Manage nodes via Omni:** |
| 225 | +- All node management (upgrades, configuration, patches) is done through Omni UI |
| 226 | +- No need for `talosctl` or manual configuration |
| 227 | +- Omni handles Talos upgrades and system extensions |
| 228 | + |
| 229 | +## Troubleshooting |
| 230 | + |
| 231 | +### ArgoCD Won't Start |
| 232 | + |
| 233 | +```bash |
| 234 | +# Check ArgoCD pods |
| 235 | +kubectl get pods -n argocd |
| 236 | + |
| 237 | +# View ArgoCD server logs |
| 238 | +kubectl logs -n argocd -l app.kubernetes.io/name=argocd-server |
| 239 | + |
| 240 | +# Check for CRD installation |
| 241 | +kubectl get crd applications.argoproj.io |
| 242 | +``` |
| 243 | + |
| 244 | +### Applications Not Syncing |
| 245 | + |
| 246 | +```bash |
| 247 | +# Check ApplicationSets |
| 248 | +kubectl get applicationsets -n argocd |
| 249 | + |
| 250 | +# View ApplicationSet status |
| 251 | +kubectl describe applicationset infrastructure -n argocd |
| 252 | + |
| 253 | +# Recreate the root application (caution: if root carries the resources finalizer, deleting it also deletes its child applications) |
| 254 | +kubectl delete application root -n argocd |
| 255 | +kubectl apply -f infrastructure/controllers/argocd/root.yaml |
| 256 | +``` |
| 257 | + |
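A less disruptive option than deleting and recreating an Application is to request a hard refresh through Argo CD's `argocd.argoproj.io/refresh` annotation, which the application controller removes again once the refresh has run. This is a sketch under that assumption; `refresh_app` is a hypothetical helper name.

```bash
# Ask Argo CD to hard-refresh an Application by setting its refresh annotation.
refresh_app() {
  local app="$1"
  kubectl annotate application "$app" -n argocd \
    argocd.argoproj.io/refresh=hard --overwrite
}

# Usage:
#   refresh_app root
```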
| 258 | +### Cilium Issues |
| 259 | + |
| 260 | +```bash |
| 261 | +# Check Cilium status |
| 262 | +cilium status |
| 263 | + |
| 264 | +# View Cilium agent logs |
| 265 | +kubectl logs -n kube-system -l k8s-app=cilium |
| 266 | + |
| 267 | +# Verify connectivity |
| 268 | +cilium connectivity test |
| 269 | +``` |
| 270 | + |
| 271 | +### 1Password Secrets Not Working |
| 272 | + |
| 273 | +```bash |
| 274 | +# Check External Secrets Operator |
| 275 | +kubectl get pods -n external-secrets |
| 276 | + |
| 277 | +# View ExternalSecret status |
| 278 | +kubectl get externalsecret -A |
| 279 | +kubectl describe externalsecret <name> -n <namespace> |
| 280 | + |
| 281 | +# Verify 1Password Connect is running |
| 282 | +kubectl get pods -n 1passwordconnect |
| 283 | +``` |
| 284 | + |
| 285 | +## Differences from Manual Talos Bootstrap |
| 286 | + |
| 287 | +If you previously used manual Talos configuration with `talhelper`: |
| 288 | + |
| 289 | +| Manual Talos | Omni + Sidero Provider | |
| 290 | +|-------------|------------------------| |
| 291 | +| `talhelper genconfig` | Cluster provisioned in Omni UI | |
| 292 | +| `talosctl bootstrap` | Omni handles bootstrap automatically | |
| 293 | +| `talosctl apply-config` | Configuration managed in Omni | |
| 294 | +| Manual ISO creation | Provider handles machine provisioning | |
| 295 | +| `talosctl upgrade` | Upgrades managed in Omni UI | |
| 296 | +| SOPS-encrypted secrets | Configuration stored in Omni | |
| 297 | + |
| 298 | +**Benefits of Omni:** |
| 299 | +- Web UI for cluster management |
| 300 | +- Automated Talos upgrades |
| 301 | +- Infrastructure provider integration (Proxmox, AWS, etc.) |
| 302 | +- Built-in monitoring and metrics |
| 303 | +- No need for local `talosctl` configuration |
| 304 | +- Machine lifecycle management |
| 305 | +- Cluster templates and machine classes |
| 306 | + |
| 307 | +## Next Steps |
| 308 | + |
| 309 | +After bootstrap is complete: |
| 310 | + |
| 311 | +1. **Configure DNS** - Point your domain to cluster ingress |
| 312 | +2. **Review Applications** - Check all apps in ArgoCD UI are synced |
| 313 | +3. **Setup Monitoring** - Access Grafana dashboards |
| 314 | +4. **Configure Backups** - Verify Longhorn backup configuration |
| 315 | +5. **Deploy Your Apps** - Add applications to `my-apps/` directory |
| 316 | + |
| 317 | +## Additional Documentation |
| 318 | + |
| 319 | +- [Omni Setup Guide](omni/omni/README.md) - Deploy your own Omni instance |
| 320 | +- [Main README](README.md) - Full cluster documentation |
| 321 | +- [ArgoCD Configuration](docs/argocd.md) - GitOps patterns explained |
| 322 | +- [Network Configuration](docs/network.md) - Cilium and Gateway API setup |
| 323 | +- [Storage Configuration](docs/storage.md) - Longhorn and persistent volumes |
| 324 | + |
| 325 | +## Support |
| 326 | + |
| 327 | +For issues: |
| 328 | +- **Talos/Omni**: Check [Talos documentation](https://www.talos.dev) and [Omni docs](https://omni.siderolabs.com/docs) |
| 329 | +- **ArgoCD**: See [ArgoCD documentation](https://argo-cd.readthedocs.io/) |
| 330 | +- **Cilium**: Visit [Cilium documentation](https://docs.cilium.io/) |