Commit 91e7bc7

Add Longhorn backup annotations and scripts
Annotated all relevant PVCs and Helm values with Longhorn backup tier settings for critical, important, and standard data. Added documentation summarizing backup configuration and schedules. Introduced scripts to automate annotation of PVCs and verify backup health and connectivity.
1 parent e8ea4d8 commit 91e7bc7

10 files changed

Lines changed: 618 additions & 0 deletions

Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
# Longhorn Backup Configuration Summary

Generated on: September 5, 2025

## Overview

Your k3s-argocd-proxmox cluster now backs up its Longhorn volumes to a MinIO S3 bucket hosted on TrueNAS. Every PVC that uses `storageClassName: longhorn` is configured for automatic backups.

## Backup Infrastructure

### S3 Backend Configuration
- **Target**: `s3://longhorn-backups@us-east-1/`
- **Storage**: MinIO on TrueNAS at `192.168.10.133`
- **Credentials**: Managed via External Secrets (1Password)
- **Compression**: gzip
- **Concurrent Limit**: 2 backups at once
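
Longhorn S3 targets follow the `s3://<bucket>@<region>/<path>` form, so the target above names the bucket `longhorn-backups` and the region `us-east-1`. With MinIO the endpoint is not part of that URL; Longhorn reads it from the credential secret (the `AWS_ENDPOINTS` key). As a minimal sketch, a hypothetical helper `parse_backup_target` (not part of this repository) could split a target into its parts for use in a verification script:

```bash
# Hypothetical helper: split a Longhorn S3 backup target into bucket, region and path.
# Prints one line on success; returns 1 if the value is not in s3://bucket@region/ form.
parse_backup_target() {
  local url="$1"
  local rest="${url#s3://}"

  # Reject anything that is not an s3:// URL with a bucket@region part.
  if [ "$rest" = "$url" ]; then
    echo "not an s3:// target: $url" >&2
    return 1
  fi
  case "$rest" in
    *@*) ;;
    *) echo "missing @<region> in target: $url" >&2; return 1 ;;
  esac

  local bucket="${rest%%@*}"   # text before the first @
  local after="${rest#*@}"     # region plus optional /path
  local region="${after%%/*}"
  local path="/"
  case "$after" in
    */*) path="/${after#*/}" ;;
  esac

  printf 'bucket=%s region=%s path=%s\n' "$bucket" "$region" "$path"
}

# Example:
#   parse_backup_target 's3://longhorn-backups@us-east-1/'
#   -> bucket=longhorn-backups region=us-east-1 path=/
```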

### Backup Tiers & Schedules

#### 🔴 Critical Tier (Databases, Core Infrastructure)
- **Snapshots**: Every hour (24 retained = 1 day)
- **Backups**: Daily at 2 AM (30 retained = 1 month)
- **Applications**: Redis, PostgreSQL, Container Registry

#### 🟡 Important Tier (User Data, Configurations)
- **Snapshots**: Every 4 hours (12 retained = 2 days)
- **Backups**: Daily at 3 AM (14 retained = 2 weeks)
- **Applications**: Immich, Home Assistant, Paperless-NGX, AI workloads, Monitoring

#### 🔵 Standard Tier (Cache, Logs, Development)
- **Snapshots**: Daily at 4 AM (7 retained = 1 week)
- **Backups**: Weekly on Sunday at 5 AM (4 retained = 1 month)
- **Applications**: Jellyfin config, Development tools, Cache storage
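
The "N retained = window" figures above are the job interval multiplied by the retain count. A small sketch of that arithmetic, using a hypothetical `retention_window` function that takes the interval in hours and the number of retained copies:

```bash
# Hypothetical helper: approximate how far back a recurring job's history reaches.
# $1 = interval between runs in hours, $2 = number of copies retained.
retention_window() {
  local interval_hours="$1" retain="$2"
  local total=$(( interval_hours * retain ))
  printf '%dd %dh\n' $(( total / 24 )) $(( total % 24 ))
}

# Examples taken from the tiers above:
#   retention_window 1 24    # hourly snapshots, 24 kept   -> 1d 0h
#   retention_window 4 12    # 4-hourly snapshots, 12 kept -> 2d 0h
#   retention_window 168 4   # weekly backups, 4 kept      -> 28d 0h
```

The weekly case shows that "1 month" is an approximation: four weekly backups cover 28 days.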

#### 🌐 Weekly Full System Backup
- **Schedule**: Sunday at 1 AM
- **Scope**: ALL volumes (critical + important + standard)
- **Retention**: 8 weeks (2 months)
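
Each schedule above would normally correspond to one Longhorn `RecurringJob` object. The contents of `recurring-jobs.yaml` are not shown in this summary, so the following is only a sketch: the function name, job names, and cron strings are assumptions, while the manifest fields (`task`, `cron`, `retain`, `concurrency`, `groups`) are those of the Longhorn `RecurringJob` resource.

```bash
# Hypothetical helper: print a RecurringJob manifest for one tier/task pair.
# Args: name, task (snapshot|backup), cron, retain, group, [concurrency, default 2].
# The default concurrency of 2 is an assumption, not taken from the repository.
render_recurring_job() {
  local name="$1" task="$2" cron="$3" retain="$4" group="$5"
  local concurrency="${6:-2}"
  cat <<EOF
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: ${name}
  namespace: longhorn-system
spec:
  task: ${task}
  cron: "${cron}"
  retain: ${retain}
  concurrency: ${concurrency}
  groups:
    - ${group}
EOF
}

# Examples (cron strings are assumed translations of the schedules above):
#   render_recurring_job critical-snapshot snapshot '0 * * * *'   24 critical
#   render_recurring_job critical-backup   backup   '0 2 * * *'   30 critical
#   render_recurring_job standard-backup   backup   '0 5 * * 0'    4 standard
```

Printing the manifest rather than applying it keeps the sketch side-effect free; the output can be reviewed or piped to `kubectl apply -f -`.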

## PVC Backup Configuration by Application

### Infrastructure Components

| PVC Name | Namespace | Storage | Backup Tier | Application |
|----------|-----------|---------|-------------|-------------|
| `registry-pvc` | kube-system | 10Gi | 🔴 Critical | Container Registry |
| `redis-data-redis-master-0` | redis-instance | 10Gi | 🔴 Critical | Redis Database |

### Monitoring Stack

| PVC Name | Namespace | Storage | Backup Tier | Application |
|----------|-----------|---------|-------------|-------------|
| Prometheus PVC | monitoring | 20Gi | 🟡 Important | Prometheus Metrics |
| Alertmanager PVC | monitoring | 2Gi | 🟡 Important | Alert Management |
| Grafana PVC | monitoring | 5Gi | 🟡 Important | Dashboard Data |

### AI Applications

| PVC Name | Namespace | Storage | Backup Tier | Application |
|----------|-----------|---------|-------------|-------------|
| `khoj-data` | khoj | 10Gi | 🟡 Important | AI Assistant Data |
| `khoj-postgres-data` | khoj | 5Gi | 🟡 Important | AI Assistant DB |
| `ollama-webui-data` | ollama-webui | 5Gi | 🟡 Important | Chat Interface |
| `ollama-webui-storage-pvc` | ollama-webui | 5Gi | 🟡 Important | Chat Storage |

### Home Automation

| PVC Name | Namespace | Storage | Backup Tier | Application |
|----------|-----------|---------|-------------|-------------|
| `home-assistant-config` | home-assistant | 10Gi | 🟡 Important | Smart Home Config |
| `frigate-config-pvc` | frigate | 5Gi | 🟡 Important | Video Surveillance |
| `mqtt-data-pvc` | frigate | 1Gi | 🟡 Important | MQTT Broker |
| `paperless-data-pvc` | paperless-ngx | 10Gi | 🟡 Important | Document Data |
| `paperless-media-pvc` | paperless-ngx | 20Gi | 🟡 Important | Document Media |
| `paperless-consume-pvc` | paperless-ngx | 5Gi | 🟡 Important | Document Intake |
| `paperless-export-pvc` | paperless-ngx | 5Gi | 🟡 Important | Document Export |

### Media Applications

| PVC Name | Namespace | Storage | Backup Tier | Application |
|----------|-----------|---------|-------------|-------------|
| `immich-data` | immich | 20Gi | 🟡 Important | Photo Management |
| `immich-library` | immich | 100Gi | 🟡 Important | Photo Library |
| `immich-cache` | immich | 10Gi | 🟡 Important | ML Cache |
| `plex-config` | plex | 10Gi | 🟡 Important | Media Server Config |
| `plex-transcode` | plex | 10Gi | 🔵 Standard | Transcode Cache |
| `plex-logs` | plex | 1Gi | 🔵 Standard | Application Logs |
| `jellyfin-config-pvc` | jellyfin | 1Gi | 🔵 Standard | Media Server Config |
| `data-pvc` | hoarder | 10Gi | 🟡 Important | Bookmark Data |
| `meilisearch-pvc` | hoarder | 10Gi | 🔵 Standard | Search Index |
| `homepage-config-pvc` | homepage-dashboard | 1Gi | 🔵 Standard | Dashboard Config |
| `tubearchivist-cache-pvc` | tubearchivist | 50Gi | 🔵 Standard | Video Cache |
| `tubearchivist-redis-pvc` | tubearchivist | 1Gi | 🔵 Standard | Cache Database |
| `es-data` | tubearchivist | 50Gi | 🟡 Important | Search Data |
| `nestmtx-storage-pvc` | nestmtx | 10Gi | 🔵 Standard | Streaming Cache |

### Privacy Applications

| PVC Name | Namespace | Storage | Backup Tier | Application |
|----------|-----------|---------|-------------|-------------|
| `proxitok-cache-pvc` | proxitok | 10Gi | 🔵 Standard | TikTok Proxy Cache |
| `redis-data` | searxng | 1Gi | 🔵 Standard | Search Cache |

### Development Tools

| PVC Name | Namespace | Storage | Backup Tier | Application |
|----------|-----------|---------|-------------|-------------|
| `nginx-storage` | nginx | 1Gi | 🔵 Standard | Web Server Config |

## MinIO S3 Bucket Structure

```
longhorn-backups/
├── backupstore/
│   ├── volumes/
│   │   ├── <volume-name>/
│   │   │   ├── backups/
│   │   │   │   ├── backup-<timestamp>/
│   │   │   │   └── backup-<timestamp>/
│   │   │   └── volume.cfg
│   │   └── ...
│   └── backup_volumes.cfg
```

## Monitoring & Verification

### Key Commands

```bash
# Check backup system health
./scripts/verify-longhorn-backups.sh

# View all backups (fully qualified name avoids clashes with other "backups" CRDs)
kubectl get backups.longhorn.io -n longhorn-system

# Check backup target status
kubectl get backuptargets.longhorn.io -n longhorn-system

# Monitor recurring jobs
kubectl get recurringjobs.longhorn.io -n longhorn-system

# View volume backup status
kubectl get volumes.longhorn.io -n longhorn-system
```
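
When there are many backups, a per-state count is easier to scan than the full listing. The sketch below assumes each Longhorn `Backup` object reports its state in `status.state` (for example `Completed` or `Error`); the function name `summarize_backup_states` is hypothetical and is not the repository's `verify-longhorn-backups.sh`.

```bash
# Hypothetical helper: count backups per state.
# stdin: one "name<TAB>state" line per backup; an empty state is reported as Unknown.
summarize_backup_states() {
  awk -F'\t' '
    NF >= 2 { count[$2 == "" ? "Unknown" : $2]++ }
    END     { for (s in count) printf "%s\t%d\n", s, count[s] }
  ' | sort
}

# Example usage against a live cluster:
#   kubectl get backups.longhorn.io -n longhorn-system \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.state}{"\n"}{end}' \
#     | summarize_backup_states
```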

### Web Interfaces

- **Longhorn UI**: Access via HTTPRoute at your cluster domain
- **MinIO Console**: `http://192.168.10.133:9002`
- **TrueNAS**: `http://192.168.10.133` (main interface)

## Backup Verification Checklist

- [ ] All Longhorn PVCs have `longhorn.io/recurring-job-*` annotations
- [ ] BackupTarget shows as "Available"
- [ ] External Secrets are syncing credentials
- [ ] MinIO bucket `longhorn-backups` exists and is accessible
- [ ] Recent backups are appearing in `kubectl get backups`
- [ ] No backup job failures in recurring job status
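
The first checklist item can be automated. As a sketch, the hypothetical filter `find_unannotated_pvcs` below lists Longhorn-backed PVCs whose `longhorn.io/recurring-job-source` annotation is not `enabled`. The annotation key is the one used in this commit; it is worth confirming against the Longhorn release in use that this key, rather than the `recurring-job.longhorn.io/source` label form described in the upstream documentation, is what the controller honors.

```bash
# Hypothetical helper: report Longhorn PVCs that are not opted in to recurring jobs.
# stdin: namespace<TAB>name<TAB>storageClassName<TAB>annotation value (may be empty).
# stdout: one "namespace/name" line per PVC that still needs the annotation.
find_unannotated_pvcs() {
  awk -F'\t' '$3 == "longhorn" && $4 != "enabled" { print $1 "/" $2 }'
}

# Example usage against a live cluster:
#   kubectl get pvc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.storageClassName}{"\t"}{.metadata.annotations.longhorn\.io/recurring-job-source}{"\n"}{end}' \
#     | find_unannotated_pvcs
```

An empty result means every PVC on the `longhorn` storage class carries the annotation.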

## Emergency Procedures

For disaster recovery procedures, see:
- `docs/runbooks/longhorn-emergency-procedures.md`
- `docs/longhorn-backup-guide.md`

## Next Steps

1. **Verify Configuration**: Run `./scripts/verify-longhorn-backups.sh`
2. **Test Restore**: Practice restoring a volume from backup
3. **Monitor**: Set up alerts for backup failures
4. **Document**: Update any application-specific restore procedures

---

**Last Updated**: September 5, 2025
**Configuration Files**:
- `infrastructure/storage/longhorn/backup-settings.yaml`
- `infrastructure/storage/longhorn/recurring-jobs.yaml`
- All PVC files with `storageClassName: longhorn`

monitoring/prometheus-stack/values.yaml

Lines changed: 17 additions & 0 deletions
@@ -19,6 +19,12 @@ prometheus:
     # Storage configuration with Longhorn
     storageSpec:
       volumeClaimTemplate:
+        metadata:
+          annotations:
+            # Longhorn backup settings - Important tier for monitoring data
+            longhorn.io/recurring-job-source: enabled
+            longhorn.io/recurring-job-group: important
+            volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
         spec:
           storageClassName: longhorn
           accessModes:
@@ -89,6 +95,12 @@ alertmanager:
     # Storage for alertmanager
     storage:
       volumeClaimTemplate:
+        metadata:
+          annotations:
+            # Longhorn backup settings - Important tier for alerting data
+            longhorn.io/recurring-job-source: enabled
+            longhorn.io/recurring-job-group: important
+            volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
         spec:
           storageClassName: longhorn
           accessModes:
@@ -127,6 +139,11 @@ grafana:
     size: 5Gi
     accessModes:
       - ReadWriteOnce
+    annotations:
+      # Longhorn backup settings - Important tier for dashboard data
+      longhorn.io/recurring-job-source: enabled
+      longhorn.io/recurring-job-group: important
+      volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
   # Resource allocation
   resources:
     requests:

my-apps/home/frigate/mqtt/mqtt.yaml

Lines changed: 5 additions & 0 deletions
@@ -19,6 +19,11 @@ metadata:
   labels:
     app: mosquitto
     type: storage
+  annotations:
+    # Longhorn backup settings - Important tier for MQTT broker data
+    longhorn.io/recurring-job-source: enabled
+    longhorn.io/recurring-job-group: important
+    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
 spec:
   accessModes:
     - ReadWriteOnce

my-apps/media/homepage-dashboard/pvc.yaml

Lines changed: 5 additions & 0 deletions
@@ -7,6 +7,11 @@ metadata:
   labels:
     app: homepage
     type: config
+  annotations:
+    # Longhorn backup settings - Standard tier for dashboard configuration
+    longhorn.io/recurring-job-source: enabled
+    longhorn.io/recurring-job-group: standard
+    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
 spec:
   accessModes:
     - ReadWriteOnce

my-apps/media/nestmtx/pvc.yaml

Lines changed: 5 additions & 0 deletions
@@ -7,6 +7,11 @@ metadata:
   labels:
     app: nestmtx
     type: storage
+  annotations:
+    # Longhorn backup settings - Standard tier for streaming cache
+    longhorn.io/recurring-job-source: enabled
+    longhorn.io/recurring-job-group: standard
+    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
 spec:
   accessModes:
     - ReadWriteOnce

my-apps/media/tubearchivist/archivist-es/statefulset.yaml

Lines changed: 5 additions & 0 deletions
@@ -79,6 +79,11 @@ spec:
         name: es-data
         labels:
           app: archivist-es
+        annotations:
+          # Longhorn backup settings - Important tier for video archive search data
+          longhorn.io/recurring-job-source: enabled
+          longhorn.io/recurring-job-group: important
+          volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
       spec:
         storageClassName: longhorn
         accessModes: [ "ReadWriteOnce" ]

my-apps/privacy/proxitok/pvc.yaml

Lines changed: 5 additions & 0 deletions
@@ -6,6 +6,11 @@ metadata:
   labels:
     app: proxitok
     type: cache
+  annotations:
+    # Longhorn backup settings - Standard tier for cache data
+    longhorn.io/recurring-job-source: enabled
+    longhorn.io/recurring-job-group: standard
+    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
 spec:
   accessModes:
     - ReadWriteOnce

my-apps/privacy/searxng/redis.yaml

Lines changed: 5 additions & 0 deletions
@@ -64,6 +64,11 @@ kind: PersistentVolumeClaim
 metadata:
   name: redis-data
   namespace: searxng
+  annotations:
+    # Longhorn backup settings - Standard tier for search cache
+    longhorn.io/recurring-job-source: enabled
+    longhorn.io/recurring-job-group: standard
+    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
 spec:
   accessModes:
     - ReadWriteOnce
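
The commit message mentions a script that automates PVC annotation; its contents are not shown in this view. For PVCs that already exist in the cluster, the same two tier annotations added in the hunks above could be applied with `kubectl annotate`. A minimal sketch, where the function name `build_annotate_cmd` is hypothetical and the tier names are the three groups used in this commit:

```bash
# Hypothetical helper: print the kubectl command that tags a PVC with a backup tier.
# Args: namespace, PVC name, tier (critical|important|standard).
# The command is printed rather than executed so it can be reviewed first.
build_annotate_cmd() {
  local namespace="$1" pvc="$2" tier="$3"
  case "$tier" in
    critical|important|standard) ;;
    *) echo "unknown tier: $tier" >&2; return 1 ;;
  esac
  printf 'kubectl annotate pvc %s -n %s --overwrite %s %s\n' \
    "$pvc" "$namespace" \
    "longhorn.io/recurring-job-source=enabled" \
    "longhorn.io/recurring-job-group=$tier"
}

# Example:
#   build_annotate_cmd searxng redis-data standard
#   -> kubectl annotate pvc redis-data -n searxng --overwrite longhorn.io/recurring-job-source=enabled longhorn.io/recurring-job-group=standard
```

In a GitOps setup such as this one, annotations applied by hand may be reverted on the next Argo CD sync unless the manifests carry them too, which is what the hunks above do.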
