Commit f264827

committed
up
1 parent 4c4260a commit f264827

3 files changed

Lines changed: 55 additions & 179 deletions

.vscode/settings.json

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{
+  "chat.tools.terminal.autoApprove": {
+    "kubectl delete": true
+  }
+}

README.md

Lines changed: 49 additions & 178 deletions
@@ -47,8 +47,10 @@ This repository supports two bootstrap approaches:
 - Proxmox VMs or bare metal (see hardware below)
 - Domain configured in Cloudflare
 - 1Password account for secrets management
-- [Talosctl](https://www.talos.dev/v1.10/introduction/getting-started/) and [Talhelper](https://github.com/budimanjojo/talhelper) installed
-- `kubectl`, `kustomize`, `sops` installed locally
+- **Omni account** (recommended) or manual Talos setup
+- `kubectl` and `helm` installed locally
+
+> **See [BOOTSTRAP.md](BOOTSTRAP.md) for detailed prerequisites and setup instructions.**

 ## 🏗️ Architecture


@@ -94,201 +96,70 @@ The cluster uses **ArgoCD Sync Waves** to strictly order deployments, preventing

 *See [docs/argocd.md](docs/argocd.md) for the deep dive on health checks and dependency management.*

-## 🚀 Quick Start (Manual Talos Method)
-
-> **Note:** If you're using Omni + Sidero Proxmox Provider, see **[BOOTSTRAP.md](BOOTSTRAP.md)** instead.
-
-This section covers the traditional manual Talos bootstrap process using `talhelper` and `talosctl`.
+## 🚀 Quick Start

-### 1. System Dependencies
-```bash
-# On your macOS workstation using Homebrew
-brew install talosctl sops yq kubectl kustomize
-brew install budimanjojo/tap/talhelper
+The cluster bootstrap process is fully documented in **[BOOTSTRAP.md](BOOTSTRAP.md)**. Follow that guide for step-by-step instructions.

-# For Linux/Windows, please see the official installation docs for each tool.
-```
+**Quick Overview:**
+1. **Provision Cluster** - Use Omni + Sidero Proxmox Provider (recommended) or manual Talos
+2. **Install Cilium** - CNI networking with Gateway API support
+3. **Configure Secrets** - Set up 1Password Connect and External Secrets
+4. **Bootstrap ArgoCD** - Deploy GitOps controller using the bootstrap script
+5. **Watch It Deploy** - ArgoCD automatically discovers and syncs all applications

-### 2. Generate Talos Configs
-```bash
-# Navigate to the Talos configuration directory
-cd iac/talos
+### Automated Deployment

-# Edit talconfig.yaml to match your cluster topology and node IPs
-# Then, generate the encrypted secrets file
-talhelper gensecret > talsecret.sops.yaml
+Once ArgoCD is bootstrapped, it automatically:
+- Syncs all applications from Git using **Sync Waves** (prevents race conditions)
+- Manages its own configuration (self-managing GitOps)
+- Discovers new applications by directory structure (no manual Application manifests)
+- Maintains cluster state declaratively

-# IMPORTANT: You must encrypt the file with SOPS for Talos to use it
-sops --encrypt --in-place talsecret.sops.yaml
+**See [BOOTSTRAP.md](BOOTSTRAP.md) for complete instructions.**

-# Generate the machine configs for each node
-talhelper genconfig
-```
-
-### 3. Boot & Bootstrap Talos Nodes
-- Boot each VM or bare-metal host with its corresponding generated ISO from `iac/talos/clusterconfig/`.
-- Set your `TALOSCONFIG` and `KUBECONFIG` environment variables to point to the generated files.
-```bash
-# Set environment variables for your shell session
-export TALOSCONFIG=./iac/talos/clusterconfig/talosconfig
-export KUBECONFIG=./iac/talos/clusterconfig/kubeconfig
-```
-- Bootstrap the cluster on a **single control plane node**.
-```bash
-# Run ONCE on a single control plane node IP
-talosctl bootstrap --nodes <control-plane-ip>
-```
-- Apply the machine configuration to all nodes in the cluster. This command should be run from the root of the repository after setting `TALOSCONFIG`.
-```bash
-talosctl apply-config --nodes <node-ip-1> --file iac/talos/clusterconfig/<node-1-name>.yaml
-talosctl apply-config --nodes <node-ip-2> --file iac/talos/clusterconfig/<node-2-name>.yaml
-# ... repeat for all nodes
-```
+## 🔍 Verification
+After bootstrap completes, verify everything is working:

-### 4. Install Gateway API CRDs
-This is a prerequisite for Cilium's Gateway API integration.
-```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/experimental-install.yaml
-kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/experimental-install.yaml
-```
-### Install Cilium CNI with Gateway API support
 ```bash
-kubectl kustomize infrastructure/networking/cilium --enable-helm | kubectl apply -f -
-
-
-# Verify Cilium is running
-kubectl get pods -n kube-system -l k8s-app=cilium
-```
-
-### 5. Configure Secret Management
-This cluster uses [1Password Connect](https://developer.1password.com/docs/connect) and [External Secrets Operator](https://external-secrets.io/) to manage secrets.
-
-1. **Generate 1Password Connect Credentials**: Follow the [1Password documentation](https://developer.1password.com/docs/connect/get-started#step-2-deploy-the-1password-connect-server) to generate your `1password-credentials.json` file and your access token.
-
-2. **Create Namespaces**:
-```bash
-kubectl create namespace 1passwordconnect
-kubectl create namespace external-secrets
-```
-
-3. **Create Kubernetes Secrets**:
-```bash
-eval $(op signin)
-export OP_CREDENTIALS=$(op read op://homelabproxmox/1passwordconnect/1password-credentials.json | base64 | tr -d '\n')
-export OP_CONNECT_TOKEN=$(op read 'op://homelabproxmox/1password-operator-token/credential')
-
-kubectl create secret generic 1password-credentials \
-  --namespace 1passwordconnect \
-  --from-literal=1password-credentials.json="$OP_CREDENTIALS"
-
-kubectl create secret generic 1password-operator-token \
-  --namespace 1passwordconnect \
-  --from-literal=token="$OP_CONNECT_TOKEN"
-
-kubectl create secret generic 1passwordconnect \
-  --namespace external-secrets \
-  --from-literal=token="$OP_CONNECT_TOKEN"
-```
+# Watch ArgoCD sync status (STATUS should show 'Synced')
+kubectl get applications -n argocd -w

-### 6. Bootstrap ArgoCD & Deploy The Stack
-This final step uses our "App of Apps" pattern to bootstrap the entire cluster. This is a multi-step process to avoid race conditions with CRD installation.
+# Verify all pods are running (may take 10-15 minutes)
+kubectl get pods -A

-```bash
-# 1. Apply the ArgoCD main components and CRDs
-# This deploys the ArgoCD Helm chart, which creates the CRDs and controller.
-kustomize build infrastructure/controllers/argocd --enable-helm | kubectl apply -f -
-# NOTE: You'll see an error about "no matches for kind Application" - this is EXPECTED!
-# The root.yaml Application can't be applied until the CRDs are established in step 2.
-
-# 2. Wait for the ArgoCD CRDs to be established in the cluster
-# This command pauses until the Kubernetes API server recognizes the 'Application' resource type.
-echo "Waiting for ArgoCD CRDs to be established..."
-kubectl wait --for condition=established --timeout=60s crd/applications.argoproj.io
+# Check External Secrets are populated
+kubectl get externalsecret -A

-# 3. Wait for the ArgoCD server to be ready
-# This ensures the ArgoCD server is running before we apply the root application.
-echo "Waiting for ArgoCD server to be available..."
-kubectl wait --for=condition=Available deployment/argocd-server -n argocd --timeout=300s
+# Verify Longhorn backups configured
+kubectl get backuptarget -n longhorn-system

-# 4. Apply the Root Application
-# Now that ArgoCD is running and its CRDs are ready, we can apply the 'root' application
-# to kickstart the self-managing GitOps loop.
-echo "Applying the root application..."
-kubectl apply -f infrastructure/controllers/argocd/root.yaml
+# View sync waves in action
+kubectl get applications -n argocd -o custom-columns=NAME:.metadata.name,WAVE:.metadata.annotations.argocd\.argoproj\.io/sync-wave,STATUS:.status.sync.status
 ```
-**That's it!** You have successfully and reliably bootstrapped the cluster.
-
-### What Happens Next Automatically?

-1. **ArgoCD Syncs Itself**: The `root` Application tells ArgoCD to sync the contents of `infrastructure/argocd/apps/`.
-2. **Projects & AppSets Created**: ArgoCD creates the `AppProject`s and the three `ApplicationSet`s (`infrastructure`, `monitoring`, `my-apps`).
-3. **Applications Discovered**: The `ApplicationSet`s scan the repository for any directories matching their defined paths (e.g., `infrastructure/*`, `monitoring/*`, `my-apps/*/*`) and create the corresponding ArgoCD `Application` resources.
-4. **Cluster Reconciliation**: ArgoCD syncs all discovered applications, building the entire cluster state declaratively from Git.
+**Full verification steps in [BOOTSTRAP.md](BOOTSTRAP.md#verification)**

-## 🔍 Verification
-After the final step, you can monitor the deployment and verify that everything is working correctly.
+## 🛡️ Talos OS Features
+- **No SSH**: All management via API (Omni UI or `talosctl`)
+- **Immutable OS**: No package manager, no shell access
+- **Declarative**: All config stored in Git or Omni
+- **System Extensions**: GPU, storage drivers enabled at boot
+- **Secure by Default**: Minimal attack surface

-```bash
-# Check Talos node health (run for each node)
-talosctl health --nodes <node-ip>
+### Node Management

-# Watch ArgoCD sync status
-# The `STATUS` column should eventually show `Synced` for all applications
-kubectl get applications -n argocd -w
+**Using Omni (Recommended):**
+- Manage all nodes through Omni web UI
+- Automated Talos upgrades
+- Visual cluster health monitoring
+- No manual `talosctl` commands needed

-# Verify all pods are running across the cluster
-# It may take 10-15 minutes for all images to pull and pods to become Ready.
-kubectl get pods -A
-
-# Check that secrets have been populated by External Secrets
-kubectl get externalsecret -A
-# You should see secrets like `cloudflare-api-credentials` in the `cert-manager` namespace
-
-# Verify the Longhorn UI is accessible and backups are configured
-kubectl get backuptarget -n longhorn-system
-```
+**Manual Talos:**
+- See `iac/talos/` directory for configuration
+- Use `talosctl` for node operations
+- Requires `talhelper` for config generation

-## 🛡️ Talos-Specific Notes
-- **No SSH**: All management via `talosctl` API
-- **Immutable OS**: No package manager, no shell
-- **Declarative**: All config in Git, applied via Talhelper/Talosctl
-- **System Extensions**: GPU, storage, and other drivers enabled via config
-- **SOPS**: Used for encrypting Talos secrets
-- **No plaintext secrets in Git**
-
-#### Upgrading Nodes
-When a new version of Talos is released or system extensions in `iac/talos/talconfig.yaml` are changed, follow this process to upgrade your nodes. This method uses the direct `upgrade` command to ensure the new system image is correctly applied, which is more reliable than `apply-config` for image changes.
-
-**Important:** Always upgrade control plane nodes **one at a time**, waiting for each node to successfully reboot and rejoin the cluster before proceeding to the next. This prevents losing etcd quorum. Worker nodes can be upgraded in parallel after the control plane is healthy.
-
-1. **Update Configuration**:
-Modify `iac/talos/talconfig.yaml` with the new `talosVersion` or changes to `systemExtensions`.
-
-2. **Ensure Environment is Set**:
-Make sure your `TALOSCONFIG` variable is pointing to your generated cluster configuration file as described in the Quick Start.
-
-3. **Upgrade a Control Plane Node**:
-Run the following commands from the root of the repository. Replace `<node-name>` and `<node-ip>` with the target node's details. Run this for each control plane node sequentially.
-
-```bash
-# Example for the first control plane node
-NODE_NAME="talos-cluster-control-00"
-NODE_IP="192.168.10.100" # Replace with your node's IP
-INSTALLER_URL=$(talhelper genurl installer -c iac/talos/talconfig.yaml -n "$NODE_NAME")
-talosctl upgrade --nodes "$NODE_IP" --image "$INSTALLER_URL"
-```
-Wait for the command to complete and verify the node is healthy with `talosctl health --nodes <node-ip>` before moving to the next control plane node.
-
-4. **Upgrade Worker Nodes**:
-Once the control plane is fully upgraded and healthy, you can upgrade the worker nodes. These can be run in parallel from separate terminals.
-```bash
-# Example for the GPU worker node
-NODE_NAME="talos-cluster-gpu-worker-00"
-NODE_IP="192.168.10.200" # Replace with your node's IP
-INSTALLER_URL=$(talhelper genurl installer -c iac/talos/talconfig.yaml -n "$NODE_NAME")
-talosctl upgrade --nodes "$NODE_IP" --image "$INSTALLER_URL"
-```
+> **For manual Talos setup and upgrades, see legacy documentation in `iac/talos/README.md`**

 ## 🗄️ MinIO S3 Backup Configuration

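Review note on the sync-wave listing added to README.md: the `custom-columns` expression escapes the dots in the annotation key with backslashes, but the argument is not quoted. In a POSIX shell an unquoted backslash is consumed before the program receives the argument, so `kubectl` would likely see the key with unescaped dots. The sketch below demonstrates only the shell behaviour, using `printf` as a stand-in for `kubectl`. Whether the WAVE column then comes back empty is an assumption about how kubectl parses the path and has not been checked against this cluster.

```shell
# Unquoted: the shell removes the backslashes before the command sees the argument.
unquoted=$(printf '%s' WAVE:.metadata.annotations.argocd\.argoproj\.io/sync-wave)

# Single-quoted: the backslashes are passed through unchanged.
quoted=$(printf '%s' 'WAVE:.metadata.annotations.argocd\.argoproj\.io/sync-wave')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"

# Quoted form of the README command (needs cluster access, so left commented out):
# kubectl get applications -n argocd -o custom-columns='NAME:.metadata.name,WAVE:.metadata.annotations.argocd\.argoproj\.io/sync-wave,STATUS:.status.sync.status'
```

If the assumption holds, single-quoting the whole `-o custom-columns=...` argument in the README would keep the escapes intact for copy-paste use.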

scripts/bootstrap-argocd.sh

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ echo ""
 echo "⎈ Installing ArgoCD via Helm..."
 helm upgrade --install argocd argo-cd \
   --repo https://argoproj.github.io/argo-helm \
-  --version 9.1.3 \
+  --version 9.3.0 \
   --namespace argocd \
   --values "$ROOT_DIR/infrastructure/controllers/argocd/values.yaml" \
   --wait \
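Review note on the chart bump in scripts/bootstrap-argocd.sh: a small check can confirm which chart version the script pins without running Helm. The helper below is a hypothetical sketch, not part of the repository. It assumes the `--version` flag sits on its own continuation line, as in the hunk above, and it runs against a throwaway copy of those lines rather than the real script.

```shell
# Hypothetical helper: print the value of the first `--version` flag found in a script.
# Assumes the flag starts its own line, as in the hunk above.
pinned_chart_version() {
  awk '$1 == "--version" { print $2; exit }' "$1"
}

# Demo against a temporary copy of the lines shown in the diff.
demo=$(mktemp)
cat > "$demo" <<'EOF'
helm upgrade --install argocd argo-cd \
  --repo https://argoproj.github.io/argo-helm \
  --version 9.3.0 \
  --namespace argocd \
EOF
pinned_chart_version "$demo"   # prints 9.3.0
rm -f "$demo"
```

From the repository root, the same function could be pointed at `scripts/bootstrap-argocd.sh` to compare the pin against the version expected after this commit.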
