Commit d67cb10

committed
up
1 parent 2cd5d61 commit d67cb10

2 files changed

Lines changed: 94 additions & 143 deletions

README.md

Lines changed: 94 additions & 141 deletions
@@ -8,24 +8,26 @@ A GitOps-driven Kubernetes cluster using **Talos OS** (secure, immutable Linux f
88

99
- [Prerequisites](#-prerequisites)
1010
- [Architecture](#-architecture)
11-
- [GitOps Architecture](#️-gitops-architecture)
1211
- [Quick Start](#-quick-start)
12+
- [1. System Dependencies](#1-system-dependencies)
13+
- [2. Generate Talos Configs](#2-generate-talos-configs)
14+
- [3. Boot & Bootstrap Talos Nodes](#3-boot--bootstrap-talos-nodes)
15+
- [4. Install Gateway API CRDs](#4-install-gateway-api-crds)
16+
- [5. Configure Secret Management](#5-configure-secret-management)
17+
- [6. Bootstrap ArgoCD & Deploy The Stack](#6-bootstrap-argocd--deploy-the-stack)
1318
- [Verification](#-verification)
19+
- [Talos-Specific Notes](#️-talos-specific-notes)
20+
- [MinIO S3 Backup Configuration](#-minio-s3-backup-configuration)
1421
- [Documentation](#-documentation)
15-
- [Hardware Stack](#-hardware-stack)
16-
- [Scaling](#-scaling-options)
1722
- [Troubleshooting](#-troubleshooting)
18-
- [Contributing](#-contributing)
19-
- [License](#-license)
2023

2124
## 📋 Prerequisites
2225

2326
- Proxmox VMs or bare metal (see hardware below)
2427
- Domain configured in Cloudflare
2528
- 1Password account for secrets management
2629
- [Talosctl](https://www.talos.dev/v1.10/introduction/getting-started/) and [Talhelper](https://github.com/budimanjojo/talhelper) installed
27-
- `kubectl` installed locally
28-
- `cloudflared` installed locally
30+
- `kubectl`, `kustomize`, `sops` installed locally
2931

3032
## 🏗️ Architecture
3133

@@ -93,155 +95,137 @@ graph TD;
9395
- **GPU Integration**: Full NVIDIA GPU support via Talos system extensions and GPU Operator
9496
- **Zero SSH**: All node management via Talosctl API
9597

96-
## 🏗️ GitOps Architecture
97-
98-
This repository implements a **production-grade GitOps workflow** using a multi-tiered ApplicationSet pattern. This separates concerns, simplifies management, and provides a clear, scalable structure.
99-
100-
### Self-Managing ArgoCD
101-
102-
The process starts with a single command to install ArgoCD's components and CRDs. Then, a single `Application` resource (`infrastructure/argocd-app.yaml`) is applied, which configures ArgoCD to manage its own installation and upgrades directly from this Git repository. This is the core of the **self-healing infrastructure** pattern.
103-
104-
### Three-Tier ApplicationSets
105-
106-
The cluster is organized into three distinct `ApplicationSet` resources, each responsible for a different layer of the stack. This provides clear separation of concerns and access control.
107-
108-
| ApplicationSet | Directory | Deploys | Description |
109-
| :--- | :--- | :--- | :--- |
110-
| **Infrastructure** | `infrastructure/` | Core Services | Manages essential components like ArgoCD, Cilium, storage, and other operators. |
111-
| **Monitoring** | `monitoring/` | Observability | Deploys the full monitoring stack, including Prometheus, Grafana, and Loki. |
112-
| **Applications** | `my-apps/` | User Workloads | Manages all end-user applications, such as Plex, Ollama, and Home Assistant. |
113-
114-
Each `ApplicationSet` automatically discovers new applications when a new directory is added to its designated path (e.g., adding `my-apps/new-app/` will automatically create a new ArgoCD application).
115-
116-
### Directory Structure
117-
118-
The repository's structure directly maps to the ApplicationSet strategy, making it intuitive to manage.
119-
120-
```
121-
.
122-
├── infrastructure/ # Infrastructure ApplicationSet
123-
│ ├── controllers/
124-
│ │ └── argocd/ # ArgoCD self-management config
125-
│ ├── networking/ # Cilium, Gateway API, etc.
126-
│ ├── storage/ # Longhorn, CSI drivers, etc.
127-
│ └── infrastructure-components-appset.yaml
128-
├── monitoring/ # Monitoring ApplicationSet
129-
│ ├── prometheus-stack/ # Prometheus, Grafana, etc.
130-
│ └── monitoring-components-appset.yaml
131-
├── my-apps/ # Applications ApplicationSet
132-
│ ├── ai/ # AI tools
133-
│ ├── media/ # Media servers
134-
│ └── myapplications-appset.yaml
135-
└── docs/ # Documentation
136-
```
137-
13898
## 🚀 Quick Start
13999

140100
### 1. System Dependencies
141101
```bash
142-
# On your workstation
143-
brew install talosctl sops yq kubectl
102+
# On your macOS workstation using Homebrew
103+
brew install talosctl sops yq kubectl kustomize
144104
brew install budimanjojo/tap/talhelper
145-
# Or see Talos/Talhelper docs for Linux/Windows
105+
106+
# For Linux/Windows, please see the official installation docs for each tool.
146107
```
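A quick way to confirm that the tools installed above are on `PATH` is sketched below. The helper name `missing_tools` is hypothetical, and the tool list simply mirrors the two `brew install` commands, so it may need adjusting on Linux or Windows.

```bash
# Print every tool from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool list taken from the Homebrew commands in this step.
missing="$(missing_tools talosctl talhelper sops yq kubectl kustomize)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

The check only reports what is missing; it does not install anything or verify versions.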
147108

148-
### 2. Generate Talos Configs (with Talhelper)
109+
### 2. Generate Talos Configs
149110
```bash
111+
# Navigate to the Talos configuration directory
150112
cd iac/talos
151-
# Edit talconfig.yaml for your cluster topology
152-
# Generate secrets (encrypted with SOPS)
113+
114+
# Edit talconfig.yaml to match your cluster topology and node IPs
115+
# Then, generate the encrypted secrets file
153116
talhelper gensecret > talsecret.sops.yaml
154-
sops -e -i talsecret.sops.yaml
155-
# Generate node configs
117+
118+
# IMPORTANT: You must encrypt the file with SOPS for Talos to use it
119+
sops --encrypt --in-place talsecret.sops.yaml
120+
121+
# Generate the machine configs for each node
156122
talhelper genconfig
157123
```
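Before committing `talsecret.sops.yaml`, it can help to confirm that the file really was encrypted. The sketch below assumes the usual SOPS behaviour of adding a top-level `sops:` metadata key to encrypted YAML; the helper name `is_sops_encrypted` is hypothetical.

```bash
# Succeed only if the YAML file carries a top-level "sops:" metadata key.
is_sops_encrypted() {
  grep -q '^sops:' "$1"
}

secrets_file="talsecret.sops.yaml"
if [ ! -f "$secrets_file" ]; then
  echo "$secrets_file not found (run this from iac/talos)"
elif is_sops_encrypted "$secrets_file"; then
  echo "$secrets_file looks encrypted"
else
  echo "WARNING: $secrets_file has no sops metadata; do not commit it"
fi
```

This is a heuristic only: it checks for the metadata key, not that every value is actually encrypted.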
158124

159125
### 3. Boot & Bootstrap Talos Nodes
160-
- Boot each VM/host with the generated Talos `machine.yaml` (PXE, ISO, or cloud-init)
161-
- Use `talosctl` to bootstrap the control plane:
126+
- Boot each VM or bare-metal host with its corresponding generated ISO from `iac/talos/clusterconfig/`.
127+
- Set your `TALOSCONFIG` and `KUBECONFIG` environment variables to point to the generated files.
162128
```bash
163-
# Set kubeconfig and talosconfig env vars
164-
export TALOSCONFIG=./clusterconfig/talosconfig
165-
export KUBECONFIG=./clusterconfig/kubeconfig
166-
# Bootstrap the cluster
167-
# (Run ONCE, on a single control plane node)
129+
# Set environment variables for your shell session
130+
export TALOSCONFIG=./iac/talos/clusterconfig/talosconfig
131+
export KUBECONFIG=./iac/talos/clusterconfig/kubeconfig
132+
```
133+
- Bootstrap the cluster on a **single control plane node**.
134+
```bash
135+
# Run ONCE on a single control plane node IP
168136
talosctl bootstrap --nodes <control-plane-ip>
169137
```
170-
171-
### 4. Apply Machine Configs
138+
- Apply the machine configuration to all nodes in the cluster.
172139
```bash
173-
# Apply config to all nodes
174-
talosctl apply-config --insecure --nodes <node-ip> --file clusterconfig/<node>.yaml
140+
talosctl apply-config --insecure --nodes <node-ip-1> --file iac/talos/clusterconfig/<node-1-name>.yaml
141+
talosctl apply-config --insecure --nodes <node-ip-2> --file iac/talos/clusterconfig/<node-2-name>.yaml
142+
# ... repeat for all nodes
175143
```
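The "repeat for all nodes" step can be scripted. The sketch below only prints the commands so they can be reviewed first; the node IPs (from the 192.0.2.0/24 documentation range) and node names are placeholders, and `print_apply_commands` is a hypothetical helper.

```bash
# Print (do not run) one apply-config command per "ip=name" pair.
print_apply_commands() {
  for pair in "$@"; do
    ip="${pair%%=*}"    # text before the first "="
    name="${pair#*=}"   # text after the first "="
    echo "talosctl apply-config --insecure --nodes $ip --file iac/talos/clusterconfig/$name.yaml"
  done
}

# Placeholder inventory; replace with your own node IPs and config file names.
print_apply_commands 192.0.2.10=node-1 192.0.2.11=node-2
```

Piping the output to `sh` would run the commands. Talos nodes normally need their machine config applied before `talosctl bootstrap` can succeed, so running these ahead of the bootstrap command is likely the safer order. A kubeconfig is typically fetched with `talosctl kubeconfig` once the cluster is bootstrapped, so the `KUBECONFIG` path may not exist until then.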
176144

177-
### 5. Install Gateway API CRDs
145+
### 4. Install Gateway API CRDs
146+
This is a prerequisite for Cilium's Gateway API integration.
178147
```bash
179148
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml
180149
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/experimental-install.yaml
181150
```
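Pinning the Gateway API release in one variable keeps both URLs in step when upgrading. This sketch prints the two `kubectl apply` commands rather than running them; `GATEWAY_API_VERSION` and `gateway_api_url` are illustrative names, not part of the repository.

```bash
# Release tag used by both manifests (matches the URLs in this step).
GATEWAY_API_VERSION="v1.2.0"

# Build the download URL for a given release tag and channel.
gateway_api_url() {
  echo "https://github.com/kubernetes-sigs/gateway-api/releases/download/$1/$2-install.yaml"
}

for channel in standard experimental; do
  echo "kubectl apply -f $(gateway_api_url "$GATEWAY_API_VERSION" "$channel")"
done
```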
182151

183-
### 6. Bootstrap ArgoCD (Following k3s-argocd-starter Pattern)
152+
### 5. Configure Secret Management
153+
This cluster uses [1Password Connect](https://developer.1password.com/docs/connect) and [External Secrets Operator](https://external-secrets.io/) to manage secrets.
154+
155+
1. **Generate 1Password Connect Credentials**: Follow the [1Password documentation](https://developer.1password.com/docs/connect/get-started#step-2-deploy-the-1password-connect-server) to generate your `1password-credentials.json` file and your access token.
156+
157+
2. **Create Namespaces**:
158+
```bash
159+
kubectl create namespace 1passwordconnect
160+
kubectl create namespace external-secrets
161+
```
162+
163+
3. **Create Kubernetes Secrets**:
164+
```bash
165+
# IMPORTANT: Place your generated `1password-credentials.json` in the root of this repository first.
166+
kubectl create secret generic 1password-credentials \
167+
--from-file=1password-credentials.json \
168+
--namespace 1passwordconnect
184169
185-
This cluster uses a **proven GitOps bootstrap pattern** that ensures stability and avoids common race conditions. The process is carefully ordered:
170+
# Replace YOUR_CONNECT_TOKEN with your actual token
171+
export CONNECT_TOKEN="YOUR_CONNECT_TOKEN"
186172
187-
1. **Install CRDs First**: We use `kustomize` to apply the base ArgoCD Helm chart, which safely installs the necessary Custom Resource Definitions (CRDs) into the cluster.
188-
2. **Bootstrap Self-Management**: With the CRDs in place, we apply the `projects.yaml` and the root `argocd-app.yaml`. This tells the running ArgoCD instance to take over its own management from Git.
189-
3. **Deploy ApplicationSets**: Once ArgoCD is self-managing, we deploy the three ApplicationSets, which then automatically discover and deploy all other applications and components.
173+
kubectl create secret generic 1password-operator-token \
174+
--from-literal=token=$CONNECT_TOKEN \
175+
--namespace 1passwordconnect
190176
191-
This method prevents errors by ensuring resources are created only after their definitions are available in the cluster.
177+
kubectl create secret generic 1passwordconnect \
178+
--from-literal=token=$CONNECT_TOKEN \
179+
--namespace external-secrets
180+
```
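The secret-creation commands fail in confusing ways if the credentials file is missing or the token is still the placeholder. A minimal pre-flight sketch, assuming the file name and variable name used in this step; `check_connect_inputs` is a hypothetical helper.

```bash
# Return non-zero if the credentials file or the token is not usable.
check_connect_inputs() {
  creds_file="$1"
  token="$2"
  if [ ! -s "$creds_file" ]; then
    echo "missing or empty credentials file: $creds_file" >&2
    return 1
  fi
  if [ -z "$token" ] || [ "$token" = "YOUR_CONNECT_TOKEN" ]; then
    echo "CONNECT_TOKEN is unset or still the placeholder" >&2
    return 1
  fi
}

if check_connect_inputs 1password-credentials.json "${CONNECT_TOKEN:-}"; then
  echo "inputs look OK; safe to run the kubectl create secret commands"
fi
```

The check does not validate the token against 1Password; it only catches the two most common setup slips.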
192181

193-
Deploy ArgoCD and ApplicationSets in the correct order:
182+
### 6. Bootstrap ArgoCD & Deploy The Stack
183+
This final step uses a carefully ordered process to install ArgoCD and then deploy every other component.
194184

195185
```bash
196-
# Step 1: Install ArgoCD Components & CRDs
197-
# This uses kustomize to install the ArgoCD helm chart, which includes the CRDs.
198-
kubectl apply -k infrastructure/controllers/argocd
186+
# 1. Install ArgoCD Components & CRDs
187+
# This uses kustomize to build the official Helm chart and apply the manifests.
188+
kustomize build infrastructure/controllers/argocd --enable-helm | kubectl apply -f -
199189
200-
# Wait for ArgoCD to be ready (2-5 minutes)
190+
# 2. Wait for ArgoCD to be ready (2-5 minutes)
201191
kubectl wait --for=condition=Available deployment/argocd-server -n argocd --timeout=300s
202192
203-
# Step 2: Bootstrap ArgoCD to Manage Itself and Create Projects
193+
# 3. Bootstrap ArgoCD to Manage Itself and Create Projects
204194
# Now that ArgoCD is running, we apply the Application resource that tells
205195
# ArgoCD to manage its own installation from Git. We also apply the projects.
206196
kubectl apply -f infrastructure/controllers/argocd/projects.yaml
207197
kubectl apply -f infrastructure/argocd-app.yaml
208198
209-
# Step 3: Deploy ApplicationSets
210-
# With ArgoCD managing itself, we can deploy the ApplicationSets.
199+
# 4. Deploy the ApplicationSets
200+
# With ArgoCD managing itself, we can now deploy the ApplicationSets,
201+
# which will discover and sync all other applications.
211202
kubectl apply -f infrastructure/infrastructure-components-appset.yaml
212203
kubectl apply -f monitoring/monitoring-components-appset.yaml
213204
kubectl apply -f my-apps/myapplications-appset.yaml
214205
```
206+
**That's it!** ArgoCD will now build the entire cluster state from the Git repository.
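The ordering matters because an `ApplicationSet` can only be applied once its CRD is registered. A minimal sketch of a guard for that, assuming the CRD name `applicationsets.argoproj.io` used by upstream Argo CD; it runs in dry-run mode (printing commands) unless `KUBECTL=kubectl` is set, and `bootstrap_appsets` is a hypothetical helper.

```bash
# Dry-run by default: set KUBECTL=kubectl to actually run the commands.
KUBECTL="${KUBECTL:-echo kubectl}"

bootstrap_appsets() {
  # Wait until the ApplicationSet CRD is registered before applying any ApplicationSet.
  $KUBECTL wait --for=condition=Established crd/applicationsets.argoproj.io --timeout=60s || return 1
  # Apply the three ApplicationSets in the same order as the README.
  for manifest in \
    infrastructure/infrastructure-components-appset.yaml \
    monitoring/monitoring-components-appset.yaml \
    my-apps/myapplications-appset.yaml; do
    $KUBECTL apply -f "$manifest" || return 1
  done
}

bootstrap_appsets
```

Stopping at the first failure keeps a half-applied bootstrap from hiding the original error.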
215207
216-
**That's it!** ArgoCD will now:
217-
- Manage its own installation and upgrades
218-
- Deploy all infrastructure components (Cilium, storage, etc.)
219-
- Deploy monitoring stack (Prometheus, Grafana, Loki)
220-
- Deploy all applications (media, AI, home automation, etc.)
208+
## 🔍 Verification
209+
After the final step, you can monitor the deployment and verify that everything is working correctly.
221210
222-
### 7. Configure Secret Management
223211
```bash
224-
# Create required namespaces
225-
kubectl create namespace 1passwordconnect
226-
kubectl create namespace external-secrets
227-
228-
# Generate and apply 1Password Connect credentials
229-
# This command creates 1password-credentials.json
230-
op connect server create
231-
export CONNECT_TOKEN="your-1password-connect-token"
232-
233-
# Create required secrets
234-
kubectl create secret generic 1password-credentials \
235-
--from-file=1password-credentials.json=1password-credentials.base64 \
236-
--namespace 1passwordconnect
237-
238-
kubectl create secret generic 1password-operator-token \
239-
--from-literal=token=$CONNECT_TOKEN \
240-
--namespace 1passwordconnect
241-
242-
kubectl create secret generic 1passwordconnect \
243-
--from-literal=token=$CONNECT_TOKEN \
244-
--namespace external-secrets
212+
# Check Talos node health (run for each node)
213+
talosctl health --nodes <node-ip>
214+
215+
# Watch ArgoCD sync status
216+
# The `SYNC STATUS` column should eventually show `Synced` for all applications
217+
kubectl get applications -n argocd -w
218+
219+
# Verify all pods are running across the cluster
220+
# It may take 10-15 minutes for all images to pull and pods to become Ready.
221+
kubectl get pods -A
222+
223+
# Check that secrets have been populated by External Secrets
224+
kubectl get externalsecret -A
225+
# You should see secrets like `cloudflare-api-credentials` in the `cert-manager` namespace
226+
227+
# Verify that the Longhorn backup target is configured
228+
kubectl get backuptarget -n longhorn-system
245229
```
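Watching the application list by eye gets tedious on a large cluster. The sketch below filters a three-column listing down to the applications that are not yet both `Synced` and `Healthy`; `not_ready_apps` is a hypothetical helper and the sample application names are made up.

```bash
# Read "NAME SYNC HEALTH" rows on stdin and print the names
# that are not yet both Synced and Healthy.
not_ready_apps() {
  awk '$2 != "Synced" || $3 != "Healthy" { print $1 }'
}

# Example with canned rows (prints "longhorn"):
printf '%s\n' \
  'cilium Synced Healthy' \
  'longhorn OutOfSync Progressing' | not_ready_apps

# Against a live cluster, the rows could come from:
#   kubectl get applications -n argocd --no-headers \
#     -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status \
#     | not_ready_apps
```

An empty result means every application has reached the desired state.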
246230
247231
## 🛡️ Talos-Specific Notes
@@ -348,37 +332,6 @@ Automated backups are configured with different tiers:
348332
| **Important** | Every 4 hours | Daily (3 AM) | 14 days |
349333
| **Standard** | Daily | Weekly | 4 weeks |
350334
351-
## 🔍 Verification
352-
```bash
353-
# Check Talos node health
354-
talosctl health --nodes <node-ip>
355-
356-
# Check Kubernetes core components
357-
kubectl get pods -A
358-
cilium status
359-
360-
# Check ArgoCD self-management
361-
kubectl get applications -n argocd
362-
kubectl get applicationsets -n argocd
363-
364-
# Check generated applications
365-
kubectl get applications -n argocd -l type=infrastructure
366-
kubectl get applications -n argocd -l type=monitoring
367-
kubectl get applications -n argocd -l type=application
368-
369-
# Check secrets
370-
kubectl get pods -n 1passwordconnect
371-
kubectl get externalsecret -A
372-
373-
# Verify Longhorn backup configuration
374-
kubectl get backuptarget -n longhorn-system
375-
kubectl get secret longhorn-backup-credentials -n longhorn-system
376-
377-
# Test MinIO connectivity from cluster
378-
kubectl run -it --rm debug --image=minio/mc --restart=Never -- \
379-
mc alias set test http://192.168.10.133:9000 <access-key> <secret-key>
380-
```
381-
382335
## 📋 Documentation
383336
- **[View Documentation Online](https://mitchross.github.io/k3s-argocd-proxmox)** - Full documentation website
384337
- **[Local Documentation](docs/)** - Browse documentation in the repository:

infrastructure/controllers/argocd/values.yaml

Lines changed: 0 additions & 2 deletions
@@ -214,8 +214,6 @@ repoServer:
214214
# Plugin-specific environment
215215
- name: KUSTOMIZE_PLUGIN_HOME
216216
value: /tmp/kustomize-plugins
217-
- name: HELM_CACHE_HOME
218-
value: /tmp/helm-cache
219217
volumeMounts:
220218
- name: plugins
221219
mountPath: /home/argocd/cmp-server/plugins

0 commit comments

Comments
 (0)