# 💾 Longhorn Backup & Recovery

**Simple backup and disaster recovery for your K3s cluster**

---

## 🚀 Quick Actions

### Backup Everything NOW
```bash
./scripts/trigger-immediate-backups.sh
```

### Check Backup Status
```bash
kubectl get backups.longhorn.io -n longhorn-system | tail -10
```
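
To surface only the backups that need attention, the listing can be filtered by state. This is a minimal sketch, assuming the Longhorn `Backup` resource exposes its state under `.status.state` and that healthy backups report `Completed`; the helper name `failed_backups` is hypothetical.

```bash
# Read "<name> <state>" lines on stdin and print every backup whose
# state is not "Completed" (assumed to be the healthy state).
# A missing state is reported as "Unknown".
failed_backups() {
  awk 'NF > 0 && $2 != "Completed" { print $1 ": " ($2 == "" ? "Unknown" : $2) }'
}

# Usage against the cluster (the jsonpath field is an assumption):
# kubectl get backups.longhorn.io -n longhorn-system \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.state}{"\n"}{end}' \
#   | failed_backups
```

No output means every listed backup reported `Completed`.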

### Disaster Recovery (Fresh K3s Cluster)
```bash
# 1. Deploy Longhorn
kubectl apply -f infrastructure/storage/longhorn/

# 2. Wait for it to be ready
kubectl wait --for=condition=Available deployment/longhorn-ui -n longhorn-system --timeout=600s

# 3. Restore all data
./scripts/restore-from-backups.sh

# 4. Create volume bridges
./scripts/update-pvcs-for-restore.sh

# 5. Deploy ArgoCD
kustomize build infrastructure/controllers/argocd --enable-helm | kubectl apply -f -
kubectl wait --for=condition=Available deployment/argocd-server -n argocd --timeout=300s
kubectl apply -f infrastructure/controllers/argocd/root.yaml
```
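
Because each step depends on the previous one, it can help to run the sequence from a wrapper that stops at the first failure. The following is only a sketch built from the commands above; the `run_step` and `recover` helpers and the `DRY_RUN` variable are hypothetical and not part of the repository's scripts.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print one recovery step and run it; with DRY_RUN=1 only print it.
# The command runs in a child bash with pipefail so a failing
# `kustomize build` is not masked by the pipe into kubectl.
run_step() {
  local label=$1 cmd=$2
  echo "==> ${label}: ${cmd}"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    bash -o pipefail -c "$cmd"
  fi
}

# The five recovery steps in order; set -e aborts on the first failure.
recover() {
  run_step "1. Deploy Longhorn" "kubectl apply -f infrastructure/storage/longhorn/"
  run_step "2. Wait for Longhorn" "kubectl wait --for=condition=Available deployment/longhorn-ui -n longhorn-system --timeout=600s"
  run_step "3. Restore data" "./scripts/restore-from-backups.sh"
  run_step "4. Volume bridges" "./scripts/update-pvcs-for-restore.sh"
  run_step "5a. Install ArgoCD" "kustomize build infrastructure/controllers/argocd --enable-helm | kubectl apply -f -"
  run_step "5b. Wait for ArgoCD" "kubectl wait --for=condition=Available deployment/argocd-server -n argocd --timeout=300s"
  run_step "5c. Bootstrap apps" "kubectl apply -f infrastructure/controllers/argocd/root.yaml"
}
```

Calling `DRY_RUN=1 recover` prints the plan without touching the cluster; calling `recover` runs it.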

---

## 📊 Backup Tiers

| **Tier** | **Apps** | **Frequency** | **Retention** |
|----------|----------|---------------|---------------|
| **Critical** | Paperless, Redis, Registry | Hourly snapshots + Daily backups | 30 days |
| **Important** | Khoj, Ollama, Home Assistant, Grafana | 4-hour snapshots + Daily backups | 14 days |
| **Standard** | Homepage, Cache, Logs | Daily snapshots + Weekly backups | 4 weeks |
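
Longhorn expresses schedules like these as `RecurringJob` resources. The repository's actual job definitions are not shown here, so the following is only a sketch: the helper `recurring_job_manifest`, the job name, the cron expression, the `concurrency` value, the `longhorn.io/v1beta2` API version, and the mapping of "30 days" to `retain: 30` daily backups are all assumptions.

```bash
# Print a Longhorn RecurringJob manifest to stdout.
# Args: name, task (snapshot|backup), cron expression, retain count, group.
recurring_job_manifest() {
  local name=$1 task=$2 cron=$3 retain=$4 group=$5
  cat <<EOF
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: ${name}
  namespace: longhorn-system
spec:
  task: ${task}
  cron: "${cron}"
  retain: ${retain}
  concurrency: 2
  groups:
    - ${group}
EOF
}

# Example (hypothetical names): a daily 02:00 backup for the Critical tier.
# recurring_job_manifest critical-daily-backup backup "0 2 * * *" 30 critical \
#   | kubectl apply -f -
```

Note that `retain` counts snapshots or backups, not days, so the value has to be derived from the frequency and the retention window.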

---

## 🛠️ Common Tasks

### Single App Restore
```bash
# Scale down app
kubectl scale deployment/app-name --replicas=0 -n namespace

# Use Longhorn UI to restore volume:
# 1. Delete old volume
# 2. Restore from backup (same name!)
# 3. Create PV/PVC (check "Use Previous PVC")

# Scale back up
kubectl scale deployment/app-name --replicas=1 -n namespace
```

### Check Backup Health
```bash
# Scheduled snapshot/backup jobs
kubectl get recurringjobs.longhorn.io -n longhorn-system
# Backups created today
kubectl get backups.longhorn.io -n longhorn-system | grep "$(date +%Y-%m-%d)"
```

### Access UIs
- **Longhorn**: `http://longhorn.local`
- **MinIO**: `http://192.168.10.133:9002`
- **ArgoCD**: `http://argocd.local`

---

## 🚨 Emergency Recovery Timeline

| Step | Time | Command |
|------|------|---------|
| Fresh K3s cluster | - | Your Talos deployment |
| Deploy Longhorn | 5-10 min | `kubectl apply -f infrastructure/storage/longhorn/` |
| Restore data | 15-30 min | `./scripts/restore-from-backups.sh` |
| Volume bridges | 1-2 min | `./scripts/update-pvcs-for-restore.sh` |
| Deploy apps | 3-5 min | ArgoCD bootstrap commands |

**Total: ~25-45 minutes** to full recovery
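
The total comes from summing the per-step ranges in the table. As a quick check of the arithmetic (the variable names are illustrative only):

```bash
# Per-step minimum and maximum durations in minutes, taken from the table:
# Deploy Longhorn, Restore data, Volume bridges, Deploy apps.
mins=(5 15 1 3)
maxs=(10 30 2 5)

total_min=0
total_max=0
for i in "${!mins[@]}"; do
  total_min=$(( total_min + mins[i] ))
  total_max=$(( total_max + maxs[i] ))
done

echo "Estimated recovery: ${total_min}-${total_max} minutes"
# prints "Estimated recovery: 24-47 minutes"
```

The exact sums, 24 to 47 minutes, are what the rounded figure of roughly 25-45 minutes reflects. Time to provision the fresh cluster itself is not included.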

---

## ✅ Success Check

All good when each of these prints 1 (only the header line is left). Pods from finished Jobs show `Completed` rather than `Running`, so the first count can be slightly higher:
```bash
kubectl get pods -A | grep -v Running | wc -l  # Should be 1
kubectl get pvc -A | grep -v Bound | wc -l  # Should be 1
kubectl get applications -n argocd | grep -v Synced | wc -l  # Should be 1
```
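
The header line is why a healthy cluster reports 1 rather than 0. If a zero-means-healthy check is easier to script against, the header can be skipped explicitly. This is a sketch only; the helper name `count_unhealthy` is hypothetical.

```bash
# Count data rows on stdin (header skipped) that do NOT contain the
# expected status word. Prints 0 when everything is healthy.
count_unhealthy() {
  awk -v want="$1" 'NR > 1 && index($0, want) == 0 { n++ } END { print n + 0 }'
}

# Usage against the cluster:
# kubectl get pods -A | count_unhealthy Running
# kubectl get pvc -A | count_unhealthy Bound
# kubectl get applications -n argocd | count_unhealthy Synced
```

As with the `grep -v` version, this is a plain substring match on each row, so `Completed` pods still count as not `Running`.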

---

**That's it!** Backups run automatically. For disasters, follow the 5-step recovery process. 🎯