Commit 8c980a6

Add pvc-plumber backup checker and update ArgoCD waves
Introduces the pvc-plumber service for checking PVC backup existence, adds its ArgoCD application manifest, and updates kustomization and sync wave ordering to ensure pvc-plumber is deployed before Kyverno. Also adds documentation for the zero-touch PVC backup and restore system.
1 parent bdfdd19 commit 8c980a6

4 files changed

Lines changed: 206 additions & 2 deletions

docs/backup-restore.md

Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
# Zero-Touch PVC Backup and Restore

This document describes the automated backup and restore system for Kubernetes PersistentVolumeClaims (PVCs).

## Overview

The system automatically backs up PVCs to S3-compatible storage (RustFS/MinIO) and restores them during disaster recovery or application re-deployment. It uses a "look-before-you-leap" pattern: a restore is attempted only when a backup already exists.

## Architecture

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   1Password     │────▶│ External Secrets│────▶│    Secrets      │
│   (rustfs)      │     │    Operator     │     │   (per-PVC)     │
└─────────────────┘     └─────────────────┘     └─────────────────┘

┌─────────────────┐     ┌─────────────────┐            │
│   pvc-plumber   │◀────│     Kyverno     │◀───────────┘
│ (backup check)  │     │  ClusterPolicy  │
└────────┬────────┘     └────────┬────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐     ┌─────────────────┐
│   RustFS S3     │     │     VolSync     │
│ volsync-backup  │◀────│ ReplicationSrc  │
└─────────────────┘     │ ReplicationDst  │
                        └─────────────────┘
```

## Components

### 1. RustFS S3 Storage
- **Endpoint:** `http://192.168.10.133:30292`
- **Bucket:** `volsync-backup`
- **Access Key:** `k8s-admin` (stored in 1Password `rustfs` item)

### 2. pvc-plumber Service
- Lightweight Go service that checks whether backups exist in S3
- Endpoint: `http://pvc-plumber.volsync-system.svc.cluster.local/exists/{namespace}/{pvc-name}`
- Returns: `{"exists": true}` or `{"exists": false}`
- Deployed at sync wave 2 (before Kyverno)
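
As a rough illustration of how a caller might use this endpoint, the sketch below builds the URL and interprets the response body. The function names `pvc_plumber_url` and `backup_exists` are hypothetical; only the URL shape and the JSON body come from the description above, and the matching assumes the service emits either a compact or a single-space encoding of the `exists` field.

```bash
# Hypothetical helpers (not part of pvc-plumber); only the URL shape and the
# JSON body are taken from the component description.

# Print the in-cluster existence-check URL for a namespace/PVC pair.
pvc_plumber_url() {
  printf 'http://pvc-plumber.volsync-system.svc.cluster.local/exists/%s/%s\n' "$1" "$2"
}

# Succeed (exit 0) when a response body reports an existing backup.
# Assumption: the field is encoded as "exists":true or "exists": true.
backup_exists() {
  case "$1" in
    *'"exists":true'* | *'"exists": true'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

From a pod inside the cluster, the two could be combined as `backup_exists "$(curl -fsS "$(pvc_plumber_url my-app my-data)")"`.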

### 3. Kyverno ClusterPolicy
- Triggers on PVCs with label `backup: hourly` or `backup: daily`
- Calls pvc-plumber to check for existing backups
- Generates:
  - ExternalSecret (per-PVC S3 credentials)
  - ReplicationSource (backup schedule)
  - ReplicationDestination (restore capability)
- If backup exists: mutates PVC with `dataSourceRef` for auto-restore

### 4. VolSync
- Performs actual backup/restore operations using Restic
- Uses Longhorn snapshots for consistent backups
- Stores data in S3 with Restic encryption

## Sync Wave Order

| Wave | Component | Purpose |
|------|-----------|---------|
| 0 | 1Password Connect, External Secrets | Secret management foundation |
| 1 | Longhorn, VolSync, Snapshot Controller | Storage foundation |
| 2 | pvc-plumber | Backup existence checker |
| 4 | Kyverno | Policy engine (calls pvc-plumber) |
| 6 | My Apps | Application workloads with PVCs |

## How to Enable Backup for a PVC

Add a backup label to your PVC:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
  namespace: my-app
  labels:
    backup: "hourly"   # Backups every hour
    # OR (set only one `backup` label per PVC; duplicate keys are invalid YAML):
    # backup: "daily"  # Backups at 2am daily
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
```

## Backup Schedules

| Label | Schedule | Retention |
|-------|----------|-----------|
| `backup: hourly` | Every hour (`0 * * * *`) | 24 hourly, 7 daily, 4 weekly, 2 monthly |
| `backup: daily` | 2am daily (`0 2 * * *`) | 24 hourly, 7 daily, 4 weekly, 2 monthly |
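
The label-to-schedule mapping in this table can be expressed as a small lookup. This is only a sketch: `backup_cron_for_label` is a hypothetical name, and the real mapping lives in the Kyverno policy, which is not shown in this commit.

```bash
# Hypothetical helper: print the cron schedule for a `backup` label value.
# The two schedules are copied from the table; any other value is rejected.
backup_cron_for_label() {
  case "$1" in
    hourly) echo '0 * * * *' ;;   # every hour, on the hour
    daily)  echo '0 2 * * *' ;;   # 02:00 every day
    *)      return 1 ;;           # unlabelled or unknown value: no schedule
  esac
}
```

For example, `backup_cron_for_label daily` prints `0 2 * * *`, while an unrecognised value prints nothing and returns a non-zero status.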

## Scenario Behavior

### Fresh Cluster (No Backups)
1. PVC created with backup label
2. Kyverno calls pvc-plumber → no backup found
3. PVC created normally (empty)
4. Backup schedule begins

### Disaster Recovery (Backups Exist)
1. PVC created with backup label
2. Kyverno calls pvc-plumber → backup found
3. Kyverno adds `dataSourceRef` to PVC
4. VolSync VolumePopulator restores data
5. PVC bound with restored data

### App Re-deployment
Same as disaster recovery: existing backups are restored automatically.
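
The branching shared by these scenarios can be sketched as one decision on the pvc-plumber response. `pvc_provision_mode` is a hypothetical name, and the handling of an unrecognised response (report `unknown` and fail) is an assumption; this document does not say what the policy does when the check itself fails.

```bash
# Hypothetical helper: given a pvc-plumber response body, print how a
# labelled PVC would be provisioned according to the scenarios above.
pvc_provision_mode() {
  case "$1" in
    *'"exists":true'* | *'"exists": true'*)
      echo 'restore' ;;   # dataSourceRef is added; VolSync populates the volume
    *'"exists":false'* | *'"exists": false'*)
      echo 'empty' ;;     # PVC is created normally; backups start on schedule
    *)
      echo 'unknown'      # assumption: treat anything else as an error
      return 1 ;;
  esac
}
```

Failing on an unrecognised response, rather than defaulting to `empty`, is a deliberate choice in this sketch: it avoids treating a failed check as proof that no backup exists.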

## 1Password Configuration

The `rustfs` item in 1Password must contain:

| Field | Example Value | Purpose |
|-------|--------------|---------|
| `k8s-admin-access-key` | `k8s-admin` | S3 access key ID |
| `k8s-admin-secret-key` | (secret) | S3 secret access key |
| `restic_password` | (password) | Restic encryption key |
| `restic_repository` | `s3:http://192.168.10.133:30292/volsync-backup/` | Base S3 path |
| `endpoint` | `http://192.168.10.133:30292` | S3 endpoint (for pvc-plumber) |
| `bucket` | `volsync-backup` | S3 bucket (for pvc-plumber) |

## S3 Bucket Structure

```
volsync-backup/
├── {namespace}/
│   └── {pvc-name}/
│       ├── config       # Restic repository config
│       ├── data/        # Deduplicated backup data
│       ├── index/       # Restic index files
│       ├── keys/        # Encryption keys
│       ├── locks/       # Lock files
│       └── snapshots/   # Snapshot metadata
```
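
Given this layout, a per-PVC Restic repository path can presumably be derived by appending `{namespace}/{pvc-name}` to the base `restic_repository` value. The sketch below shows that derivation; `restic_repo_for_pvc` is a hypothetical name, and the assumption that the policy composes the path exactly this way is inferred from the layout rather than stated in this document.

```bash
# Hypothetical helper: join the base repository path with namespace and PVC
# name, following the bucket layout shown above.
# Usage: restic_repo_for_pvc <base> <namespace> <pvc-name>
restic_repo_for_pvc() {
  # ${1%/} drops one trailing slash so the result has no doubled separator.
  printf '%s/%s/%s\n' "${1%/}" "$2" "$3"
}
```

With the example base path from the 1Password table, `restic_repo_for_pvc 's3:http://192.168.10.133:30292/volsync-backup/' my-app my-data` prints `s3:http://192.168.10.133:30292/volsync-backup/my-app/my-data`.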

## Troubleshooting

### PVC Stuck in Pending
1. Check if ReplicationDestination exists: `kubectl get replicationdestination -n <namespace>`
2. Check pvc-plumber logs: `kubectl logs -n volsync-system -l app.kubernetes.io/name=pvc-plumber`
3. Check VolSync mover pod: `kubectl get pods -n <namespace> | grep volsync`

### Backup Not Running
1. Check ReplicationSource: `kubectl get replicationsource -n <namespace>`
2. Check secret exists: `kubectl get secret -n <namespace> | grep volsync`
3. Check ExternalSecret status: `kubectl get externalsecret -n <namespace>`

### Test pvc-plumber
```bash
# Terminal 1: forward the service port (this command stays in the foreground)
kubectl port-forward -n volsync-system svc/pvc-plumber 8080:80
# Terminal 2: query the existence check for namespace "karakeep", PVC "data-pvc"
curl http://localhost:8080/exists/karakeep/data-pvc
# Expected: {"exists":true} or {"exists":false}
```

## Excluded Namespaces

The following namespaces are excluded from automatic backup:
- `kube-system`
- `volsync-system`
- `kyverno`
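
When scripting around the policy (for example, to audit which labelled PVCs will never be backed up), the exclusion list can be mirrored in a small check. `is_backup_excluded_namespace` is a hypothetical helper; the authoritative list is whatever the Kyverno policy excludes.

```bash
# Hypothetical helper: succeed (exit 0) if the namespace is on the exclusion
# list documented above, fail otherwise.
is_backup_excluded_namespace() {
  case "$1" in
    kube-system | volsync-system | kyverno) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `is_backup_excluded_namespace kyverno && echo "skipped"` prints `skipped`, while the same check for `my-app` prints nothing.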

## Files

| File | Purpose |
|------|---------|
| `infrastructure/controllers/pvc-plumber/` | Backup existence checker service |
| `infrastructure/controllers/kyverno/policies/volsync-pvc-backup-restore.yaml` | Kyverno policy |
| `infrastructure/storage/volsync/` | VolSync Helm chart + VolumeSnapshotClass |
| `infrastructure/controllers/argocd/apps/pvc-plumber-app.yaml` | ArgoCD Application |

infrastructure/controllers/argocd/apps/infrastructure-appset.yaml

Lines changed: 0 additions & 1 deletion
@@ -20,7 +20,6 @@ spec:
  - path: infrastructure/controllers/nvidia-gpu-operator
  - path: infrastructure/controllers/postgres-operator
  - path: infrastructure/controllers/reloader
- - path: infrastructure/controllers/pvc-plumber
  - path: infrastructure/controllers/kyverno
  - path: infrastructure/networking/cloudflared
  - path: infrastructure/networking/coredns

infrastructure/controllers/argocd/apps/kustomization.yaml

Lines changed: 2 additions & 1 deletion
@@ -10,7 +10,8 @@ resources:
  - longhorn-app.yaml # Wave 1 - Storage foundation
  - snapshot-controller-app.yaml # Wave 1 - VolumeSnapshot controller + CRDs
  - volsync-app.yaml # Wave 1 - PVC backup and replication
+ - pvc-plumber-app.yaml # Wave 2 - Backup existence checker (before Kyverno)
  # ApplicationSets for automatic discovery
- - infrastructure-appset.yaml # Wave 2
+ - infrastructure-appset.yaml # Wave 4
  - monitoring-appset.yaml # Wave 3
  - my-apps-appset.yaml # Wave 4

infrastructure/controllers/argocd/apps/pvc-plumber-app.yaml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
---
# pvc-plumber must deploy BEFORE Kyverno (wave 4) so it's ready
# when Kyverno policies try to call it for backup existence checks
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: pvc-plumber
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "2"
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: infrastructure
  revisionHistoryLimit: 3
  source:
    repoURL: https://github.com/mitchross/talos-argocd-proxmox.git
    targetRevision: main
    path: infrastructure/controllers/pvc-plumber
  destination:
    server: https://kubernetes.default.svc
    namespace: volsync-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
