| 1 | +# Zero-Touch PVC Backup and Restore |
| 2 | + |
| 3 | +This document describes the automated backup and restore system for Kubernetes PersistentVolumeClaims (PVCs). |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The system automatically backs up PVCs to S3-compatible storage (RustFS/MinIO) and restores them during disaster recovery or when an app is redeployed. It follows a "look-before-you-leap" pattern: when a labeled PVC is created, it first checks whether a backup exists and restores only if one is found. |
| 8 | + |
| 9 | +## Architecture |
| 10 | + |
| 11 | +``` |
| 12 | +┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐ |
| 13 | +│    1Password    │────▶│ External Secrets│────▶│     Secrets     │ |
| 14 | +│    (rustfs)     │     │    Operator     │     │    (per-PVC)    │ |
| 15 | +└─────────────────┘     └─────────────────┘     └─────────────────┘ |
| 16 | +                                                       │ |
| 17 | +┌─────────────────┐     ┌─────────────────┐            │ |
| 18 | +│   pvc-plumber   │◀────│     Kyverno     │◀───────────┘ |
| 19 | +│ (backup check)  │     │  ClusterPolicy  │ |
| 20 | +└────────┬────────┘     └────────┬────────┘ |
| 21 | +         │                       │ |
| 22 | +         ▼                       ▼ |
| 23 | +┌─────────────────┐     ┌─────────────────┐ |
| 24 | +│    RustFS S3    │     │     VolSync     │ |
| 25 | +│ volsync-backup  │◀────│ ReplicationSrc  │ |
| 26 | +└─────────────────┘     │ ReplicationDst  │ |
| 27 | +                        └─────────────────┘ |
| 28 | +``` |
| 29 | + |
| 30 | +## Components |
| 31 | + |
| 32 | +### 1. RustFS S3 Storage |
| 33 | +- **Endpoint:** `http://192.168.10.133:30292` |
| 34 | +- **Bucket:** `volsync-backup` |
| 35 | +- **Access Key:** `k8s-admin` (stored in 1Password `rustfs` item) |
| 36 | + |
| 37 | +### 2. pvc-plumber Service |
| 38 | +- Lightweight Go service that checks if backups exist in S3 |
| 39 | +- Endpoint: `http://pvc-plumber.volsync-system.svc.cluster.local/exists/{namespace}/{pvc-name}` |
| 40 | +- Returns: `{"exists": true}` or `{"exists": false}` |
| 41 | +- Deployed at sync wave 2 (before Kyverno) |
| 42 | + |
| 43 | +### 3. Kyverno ClusterPolicy |
| 44 | +- Triggers on PVCs with label `backup: hourly` or `backup: daily` |
| 45 | +- Calls pvc-plumber to check for existing backups |
| 46 | +- Generates: |
| 47 | +  - ExternalSecret (per-PVC S3 credentials) |
| 48 | +  - ReplicationSource (backup schedule) |
| 49 | +  - ReplicationDestination (restore capability) |
| 50 | +- If backup exists: mutates PVC with `dataSourceRef` for auto-restore |
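The lookup-then-mutate behavior described above could be sketched roughly as follows. This is not the repository's actual policy: the policy name, rule name, context variable `backupExists`, and the `-restore` suffix on the ReplicationDestination name are all hypothetical, and the `apiCall` field layout should be checked against the Kyverno version installed in the cluster. Only the pvc-plumber URL, the `backup` label values, and the excluded namespaces are taken from this document.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: volsync-pvc-restore-sketch          # hypothetical name
spec:
  rules:
    - name: add-datasource-when-backup-exists   # hypothetical name
      match:
        any:
          - resources:
              kinds:
                - PersistentVolumeClaim
              operations:
                - CREATE
              selector:
                matchExpressions:
                  - key: backup
                    operator: In
                    values: ["hourly", "daily"]
      exclude:
        any:
          - resources:
              namespaces: ["kube-system", "volsync-system", "kyverno"]
      context:
        # Ask pvc-plumber whether a backup exists for this namespace/PVC.
        - name: backupExists
          apiCall:
            method: GET
            service:
              url: "http://pvc-plumber.volsync-system.svc.cluster.local/exists/{{ request.object.metadata.namespace }}/{{ request.object.metadata.name }}"
            jmesPath: "exists"
      preconditions:
        all:
          # Only mutate when pvc-plumber reported {"exists": true}.
          - key: "{{ backupExists }}"
            operator: Equals
            value: true
      mutate:
        patchStrategicMerge:
          spec:
            dataSourceRef:
              apiGroup: volsync.backube
              kind: ReplicationDestination
              name: "{{ request.object.metadata.name }}-restore"   # assumed naming scheme
```

The sketch covers only the existence check and the `dataSourceRef` mutation; the rules that generate the ExternalSecret, ReplicationSource, and ReplicationDestination are omitted.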
| 51 | + |
| 52 | +### 4. VolSync |
| 53 | +- Performs actual backup/restore operations using Restic |
| 54 | +- Uses Longhorn snapshots for consistent backups |
| 55 | +- Stores data in S3 with Restic encryption |
| 56 | + |
| 57 | +## Sync Wave Order |
| 58 | + |
| 59 | +| Wave | Component | Purpose | |
| 60 | +|------|-----------|---------| |
| 61 | +| 0 | 1Password Connect, External Secrets | Secret management foundation | |
| 62 | +| 1 | Longhorn, VolSync, Snapshot Controller | Storage foundation | |
| 63 | +| 2 | pvc-plumber | Backup existence checker | |
| 64 | +| 4 | Kyverno | Policy engine (calls pvc-plumber) | |
| 65 | +| 6 | My Apps | Application workloads with PVCs | |
| 66 | + |
| 67 | +## How to Enable Backup for a PVC |
| 68 | + |
| 69 | +Add a backup label to your PVC: |
| 70 | + |
| 71 | +```yaml |
| 72 | +apiVersion: v1 |
| 73 | +kind: PersistentVolumeClaim |
| 74 | +metadata: |
| 75 | +  name: my-data |
| 76 | +  namespace: my-app |
| 77 | +  labels: |
| 78 | +    backup: "hourly" # Backups every hour |
| 79 | +    # OR (set only one `backup` label; duplicate keys are invalid YAML) |
| 80 | +    # backup: "daily" # Backups at 2am daily |
| 81 | +spec: |
| 82 | +  accessModes: |
| 83 | +    - ReadWriteOnce |
| 84 | +  storageClassName: longhorn |
| 85 | +  resources: |
| 86 | +    requests: |
| 87 | +      storage: 10Gi |
| 88 | +``` |
| 89 | + |
| 90 | +## Backup Schedules |
| 91 | + |
| 92 | +| Label | Schedule | Retention | |
| 93 | +|-------|----------|-----------| |
| 94 | +| `backup: hourly` | Every hour (`0 * * * *`) | 24 hourly, 7 daily, 4 weekly, 2 monthly | |
| 95 | +| `backup: daily` | 2am daily (`0 2 * * *`) | 24 hourly, 7 daily, 4 weekly, 2 monthly | |
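To make the table concrete, here is a minimal sketch of what a VolSync ReplicationSource for the `backup: hourly` row might look like. The schedule and retention values come from the table; the resource name `my-data-backup`, the secret name `volsync-my-data`, and the choice of `copyMethod: Snapshot` (inferred from the Longhorn snapshot note above) are assumptions, not the generated resource itself.

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: my-data-backup            # hypothetical name
  namespace: my-app
spec:
  sourcePVC: my-data
  trigger:
    schedule: "0 * * * *"         # hourly; "0 2 * * *" for backup: daily
  restic:
    repository: volsync-my-data   # assumed name of the per-PVC Secret
    copyMethod: Snapshot          # back up from a snapshot rather than the live volume
    storageClassName: longhorn
    retain:
      hourly: 24
      daily: 7
      weekly: 4
      monthly: 2
```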
| 96 | + |
| 97 | +## Scenario Behavior |
| 98 | + |
| 99 | +### Fresh Cluster (No Backups) |
| 100 | +1. PVC created with backup label |
| 101 | +2. Kyverno calls pvc-plumber → no backup found |
| 102 | +3. PVC created normally (empty) |
| 103 | +4. Backup schedule begins |
| 104 | + |
| 105 | +### Disaster Recovery (Backups Exist) |
| 106 | +1. PVC created with backup label |
| 107 | +2. Kyverno calls pvc-plumber → backup found |
| 108 | +3. Kyverno adds `dataSourceRef` to PVC |
| 109 | +4. VolSync VolumePopulator restores data |
| 110 | +5. PVC bound with restored data |
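For illustration, the end state of steps 3 and 4 might look like the pair of manifests below: the PVC after mutation, and the ReplicationDestination it points to. The names `my-data-restore` and `volsync-my-data` are hypothetical, and the `restore-once` manual trigger is one common way to set up a restore target for VolSync's volume populator, not necessarily what this policy generates.

```yaml
# PVC as admitted after Kyverno adds dataSourceRef
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
  namespace: my-app
  labels:
    backup: "hourly"
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
  dataSourceRef:
    apiGroup: volsync.backube
    kind: ReplicationDestination
    name: my-data-restore           # hypothetical name
---
# Restore target that the volume populator reads from
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: my-data-restore             # hypothetical name
  namespace: my-app
spec:
  trigger:
    manual: restore-once
  restic:
    repository: volsync-my-data     # assumed name of the per-PVC Secret
    copyMethod: Snapshot
    capacity: 10Gi
    accessModes:
      - ReadWriteOnce
    storageClassName: longhorn
```

While the restore is in progress the PVC stays in `Pending`, which is why the troubleshooting steps for a stuck PVC start with the ReplicationDestination.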
| 111 | + |
| 112 | +### App Re-deployment |
| 113 | +Same as disaster recovery: existing backups are restored automatically. |
| 114 | + |
| 115 | +## 1Password Configuration |
| 116 | + |
| 117 | +The `rustfs` item in 1Password must contain: |
| 118 | + |
| 119 | +| Field | Example Value | Purpose | |
| 120 | +|-------|--------------|---------| |
| 121 | +| `k8s-admin-access-key` | `k8s-admin` | S3 access key ID | |
| 122 | +| `k8s-admin-secret-key` | (secret) | S3 secret access key | |
| 123 | +| `restic_password` | (password) | Restic encryption key | |
| 124 | +| `restic_repository` | `s3:http://192.168.10.133:30292/volsync-backup/` | Base S3 path | |
| 125 | +| `endpoint` | `http://192.168.10.133:30292` | S3 endpoint (for pvc-plumber) | |
| 126 | +| `bucket` | `volsync-backup` | S3 bucket (for pvc-plumber) | |
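As a rough sketch of how these fields could be turned into a per-PVC Secret, the ExternalSecret below maps the `rustfs` item onto the environment variable names that Restic expects. The store name `onepassword`, the Secret name `volsync-my-data`, the `v1beta1` API version, and the short `secretKey` aliases are assumptions; the 1Password field names and the `{namespace}/{pvc-name}` path suffix come from this document.

```yaml
apiVersion: external-secrets.io/v1beta1   # assumed; newer operator releases also serve v1
kind: ExternalSecret
metadata:
  name: volsync-my-data                   # hypothetical name
  namespace: my-app
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword                     # hypothetical store name
  target:
    name: volsync-my-data
    template:
      data:
        # Base path from 1Password plus {namespace}/{pvc-name}
        RESTIC_REPOSITORY: "{{ .restic_repository }}my-app/my-data"
        RESTIC_PASSWORD: "{{ .restic_password }}"
        AWS_ACCESS_KEY_ID: "{{ .access_key }}"
        AWS_SECRET_ACCESS_KEY: "{{ .secret_key }}"
  data:
    - secretKey: restic_repository
      remoteRef:
        key: rustfs
        property: restic_repository
    - secretKey: restic_password
      remoteRef:
        key: rustfs
        property: restic_password
    - secretKey: access_key               # alias without hyphens so it can be used in the template
      remoteRef:
        key: rustfs
        property: k8s-admin-access-key
    - secretKey: secret_key
      remoteRef:
        key: rustfs
        property: k8s-admin-secret-key
```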
| 127 | + |
| 128 | +## S3 Bucket Structure |
| 129 | + |
| 130 | +``` |
| 131 | +volsync-backup/ |
| 132 | +├── {namespace}/ |
| 133 | +│   └── {pvc-name}/ |
| 134 | +│       ├── config       # Restic repository config |
| 135 | +│       ├── data/        # Deduplicated backup data |
| 136 | +│       ├── index/       # Restic index files |
| 137 | +│       ├── keys/        # Encryption keys |
| 138 | +│       ├── locks/       # Lock files |
| 139 | +│       └── snapshots/   # Snapshot metadata |
| 140 | +``` |
| 141 | + |
| 142 | +## Troubleshooting |
| 143 | + |
| 144 | +### PVC Stuck in Pending |
| 145 | +1. Check if ReplicationDestination exists: `kubectl get replicationdestination -n <namespace>` |
| 146 | +2. Check pvc-plumber logs: `kubectl logs -n volsync-system -l app.kubernetes.io/name=pvc-plumber` |
| 147 | +3. Check VolSync mover pod: `kubectl get pods -n <namespace> | grep volsync` |
| 148 | + |
| 149 | +### Backup Not Running |
| 150 | +1. Check ReplicationSource: `kubectl get replicationsource -n <namespace>` |
| 151 | +2. Check secret exists: `kubectl get secret -n <namespace> | grep volsync` |
| 152 | +3. Check ExternalSecret status: `kubectl get externalsecret -n <namespace>` |
| 153 | + |
| 154 | +### Test pvc-plumber |
| 155 | +```bash |
| 156 | +kubectl port-forward -n volsync-system svc/pvc-plumber 8080:80  # leave running; run curl from a second terminal |
| 157 | +curl http://localhost:8080/exists/karakeep/data-pvc  # path is /exists/{namespace}/{pvc-name} |
| 158 | +# Expected: {"exists":true} or {"exists":false} |
| 159 | +``` |
| 160 | + |
| 161 | +## Excluded Namespaces |
| 162 | + |
| 163 | +The following namespaces are excluded from automatic backup: |
| 164 | +- `kube-system` |
| 165 | +- `volsync-system` |
| 166 | +- `kyverno` |
| 167 | + |
| 168 | +## Files |
| 169 | + |
| 170 | +| File | Purpose | |
| 171 | +|------|---------| |
| 172 | +| `infrastructure/controllers/pvc-plumber/` | Backup existence checker service | |
| 173 | +| `infrastructure/controllers/kyverno/policies/volsync-pvc-backup-restore.yaml` | Kyverno policy | |
| 174 | +| `infrastructure/storage/volsync/` | VolSync Helm chart + VolumeSnapshotClass | |
| 175 | +| `infrastructure/controllers/argocd/apps/pvc-plumber-app.yaml` | ArgoCD Application | |