Commit 5b11e70

some doc clean up
1 parent 91e7bc7 commit 5b11e70

12 files changed

Lines changed: 965 additions & 1188 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -344,7 +344,7 @@ Automated backups are configured with different tiers:
 - [ArgoCD Setup](docs/argocd.md) - **Enterprise GitOps patterns and self-management**
 - [Network Configuration](docs/network.md)
 - [Storage Configuration](docs/storage.md)
-- [**Longhorn Backup & Disaster Recovery**](docs/longhorn-backup-guide.md) 🗄️ - **TrueNAS Scale integration**
+- [**Backup & Recovery**](docs/backup-recovery.md) 🗄️ - **Simple backup and disaster recovery guide**
 - [Security Setup](docs/security.md)
 - [GPU Configuration](docs/gpu.md)
 - [External Services](docs/external-services.md)

docs/backup-recovery.md

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
# 💾 Longhorn Backup & Recovery

**Simple backup and disaster recovery for your K3s cluster**

---

## 🚀 Quick Actions

### Backup Everything NOW
```bash
./scripts/trigger-immediate-backups.sh
```

### Check Backup Status
```bash
kubectl get backups.longhorn.io -n longhorn-system | tail -10
```

### Disaster Recovery (Fresh K3s Cluster)
```bash
# 1. Deploy Longhorn
kubectl apply -f infrastructure/storage/longhorn/

# 2. Wait for it to be ready
kubectl wait --for=condition=Available deployment/longhorn-ui -n longhorn-system --timeout=600s

# 3. Restore all data
./scripts/restore-from-backups.sh

# 4. Create volume bridges
./scripts/update-pvcs-for-restore.sh

# 5. Deploy ArgoCD
kustomize build infrastructure/controllers/argocd --enable-helm | kubectl apply -f -
kubectl wait --for=condition=Available deployment/argocd-server -n argocd --timeout=300s
kubectl apply -f infrastructure/controllers/argocd/root.yaml
```
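The five steps above could be wrapped in a single script that stops at the first failing step. The sketch below is one possible shape, not the repository's actual tooling: `DRY_RUN`, `run_step`, and `recover_cluster` are names invented for illustration, and the commands are copied verbatim from the steps above. It defaults to printing the plan, and setting `DRY_RUN=0` would execute it.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the five recovery steps (names are assumptions).
set -eu

# Default to a dry run so that running the sketch only prints the plan.
DRY_RUN="${DRY_RUN:-1}"

run_step() {
  # $1 = human-readable label, $2 = command line to run
  echo "==> $1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "    $2"
  else
    # pipefail makes a failing `kustomize build` abort the piped apply;
    # a non-zero exit here stops the whole script because of `set -e`.
    bash -o pipefail -c "$2"
  fi
}

recover_cluster() {
  run_step "1. Deploy Longhorn" \
    "kubectl apply -f infrastructure/storage/longhorn/"
  run_step "2. Wait for Longhorn" \
    "kubectl wait --for=condition=Available deployment/longhorn-ui -n longhorn-system --timeout=600s"
  run_step "3. Restore all data" \
    "./scripts/restore-from-backups.sh"
  run_step "4. Create volume bridges" \
    "./scripts/update-pvcs-for-restore.sh"
  run_step "5a. Install ArgoCD" \
    "kustomize build infrastructure/controllers/argocd --enable-helm | kubectl apply -f -"
  run_step "5b. Wait for ArgoCD" \
    "kubectl wait --for=condition=Available deployment/argocd-server -n argocd --timeout=300s"
  run_step "5c. Apply root application" \
    "kubectl apply -f infrastructure/controllers/argocd/root.yaml"
}

recover_cluster
```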

---

## 📊 Backup Tiers

| **Tier** | **Apps** | **Frequency** | **Retention** |
|----------|----------|---------------|---------------|
| **Critical** | Paperless, Redis, Registry | Hourly snapshots + Daily backups | 30 days |
| **Important** | Khoj, Ollama, Home Assistant, Grafana | 4-hour snapshots + Daily backups | 14 days |
| **Standard** | Homepage, Cache, Logs | Daily snapshots + Weekly backups | 4 weeks |
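
Longhorn expresses schedules like these as `RecurringJob` resources, where `retain` counts how many snapshots or backups to keep rather than a number of days. As a rough sketch of how the Critical tier might be declared, the helper below prints two manifests that could be piped to `kubectl apply -f -`. The job names, the `critical` group, the cron times, the snapshot `retain` of 24, and the `longhorn.io/v1beta2` API version are all assumptions for illustration and are not taken from this repository.

```bash
#!/usr/bin/env bash
set -eu

# render_recurring_job NAME TASK CRON RETAIN GROUP
# Prints a Longhorn RecurringJob manifest on stdout (hypothetical helper).
render_recurring_job() {
  cat <<EOF
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: $1
  namespace: longhorn-system
spec:
  task: $2
  cron: "$3"
  retain: $4
  concurrency: 1
  groups:
    - $5
EOF
}

# Critical tier: hourly snapshots (assumed retain of 24) plus one backup per
# day, keeping 30 backups so that retention works out to roughly 30 days.
render_recurring_job critical-snapshot snapshot "0 * * * *" 24 critical
echo "---"
render_recurring_job critical-backup backup "0 2 * * *" 30 critical
```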

---

## 🛠️ Common Tasks

### Single App Restore
```bash
# Scale down app
kubectl scale deployment/app-name --replicas=0 -n namespace

# Use Longhorn UI to restore volume:
# 1. Delete old volume
# 2. Restore from backup (same name!)
# 3. Create PV/PVC (check "Use Previous PVC")

# Scale back up
kubectl scale deployment/app-name --replicas=1 -n namespace
```

### Check Backup Health
```bash
kubectl get recurringjobs.longhorn.io -n longhorn-system
kubectl get backups.longhorn.io -n longhorn-system | grep "$(date +%Y-%m-%d)"
```
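The date filter above can be turned into a count that is easier to script against. This is a minimal sketch under two assumptions: the backup listing prints a creation timestamp in UTC (hence `date -u`), and the helper names `count_backups_on` and `backups_today` are invented here.

```bash
#!/usr/bin/env bash
set -eu

# count_backups_on DATE
# Reads a backup listing on stdin and prints how many lines contain DATE
# (format YYYY-MM-DD). grep exits 1 when nothing matches, so `|| true`
# keeps `set -e` from aborting while the printed count stays 0.
count_backups_on() {
  grep -c -F -- "$1" || true
}

# backups_today
# Counts today's backups on a live cluster. Defined only, not called here.
backups_today() {
  kubectl get backups.longhorn.io -n longhorn-system --no-headers \
    | count_backups_on "$(date -u +%Y-%m-%d)"
}
```

Calling `backups_today` prints a single number. A result of `0` late in the day would suggest the daily backup job has not run yet.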

### Access UIs
- **Longhorn**: `http://longhorn.local`
- **MinIO**: `http://192.168.10.133:9002`
- **ArgoCD**: `http://argocd.local`

---

## 🚨 Emergency Recovery Timeline

| Step | Time | Command |
|------|------|---------|
| Fresh K3s cluster | - | Your Talos deployment |
| Deploy Longhorn | 5-10 min | `kubectl apply -f infrastructure/storage/longhorn/` |
| Restore data | 15-30 min | `./scripts/restore-from-backups.sh` |
| Volume bridges | 1-2 min | `./scripts/update-pvcs-for-restore.sh` |
| Deploy apps | 3-5 min | ArgoCD bootstrap commands |

**Total: ~25-45 minutes** to full recovery (the timed steps sum to 24-47 minutes, not counting cluster provisioning)

---
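
## ✅ Success Check

Recovery is complete when each command below prints `1`, meaning only the header line is left. Pods from finished Jobs show `Completed` rather than `Running`, so they raise the first count.
```bash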
kubectl get pods -A | grep -v Running | wc -l  # Should be 1
kubectl get pvc -A | grep -v Bound | wc -l  # Should be 1
kubectl get applications -n argocd | grep -v Synced | wc -l  # Should be 1
```
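A variant of these checks drops the header with kubectl's `--no-headers` flag, so the healthy target becomes `0`, and it accepts `Completed` pods as healthy. The function names `count_unhealthy` and `check_cluster` are assumptions made for this sketch. Calling `check_cluster` against a live cluster would print three counts, one per resource type.

```bash
#!/usr/bin/env bash
set -eu

# count_unhealthy PATTERN
# Reads `kubectl get ... --no-headers` output on stdin and prints the number
# of lines that do NOT match the extended regex PATTERN. grep exits 1 when
# it selects no lines, so `|| true` keeps `set -e` from aborting.
count_unhealthy() {
  grep -c -v -E -- "$1" || true
}

# check_cluster
# Prints three counts (pods, PVCs, ArgoCD applications); all zeros means
# healthy. Defined only, not called here, since it needs a live cluster.
check_cluster() {
  kubectl get pods -A --no-headers | count_unhealthy 'Running|Completed'
  kubectl get pvc -A --no-headers | count_unhealthy 'Bound'
  kubectl get applications -n argocd --no-headers | count_unhealthy 'Synced'
}
```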

---

**That's it!** Backups run automatically. For disasters, follow the 5-step recovery process. 🎯

docs/longhorn-backup-configuration.md

Lines changed: 0 additions & 180 deletions
This file was deleted.
