Commit 02f112c

committed
updates
1 parent 8483247 commit 02f112c

4 files changed

Lines changed: 51 additions & 21 deletions

CLAUDE.md

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -311,7 +311,7 @@ spec:
311311

312312
**When to use backup labels**:
313313
- User-generated content (photos, documents, uploads)
314-
- Database volumes (Postgres, Redis, etc.)
314+
- Non-CNPG database volumes (Redis, SQLite, etc.)
315315
- Configuration that's hard to recreate
316316
- AI model caches (large downloads)
317317

@@ -324,7 +324,7 @@ spec:
324324

325325
### Application with Database (CNPG CloudNativePG)
326326

327-
Databases use **CloudNativePG** with Barman backups to RustFS S3 — a separate backup path from the PVC/VolSync system.
327+
Databases use **CloudNativePG** with Barman backups to RustFS S3 — a **separate backup path** from the PVC/VolSync system. PVC backups use NFS + Kopia (shared repository with cross-PVC deduplication). Database backups use S3 + Barman (SQL-aware `pg_basebackup` + WAL archiving for point-in-time recovery). Each tool uses its native backup mechanism — see [backup-restore.md](docs/backup-restore.md#why-two-backup-systems-nfs-for-pvcs-s3-for-databases) for the full rationale.
328328

329329
```yaml
330330
# infrastructure/database/cloudnative-pg/<app>/cluster.yaml

docs/backup-restore.md

Lines changed: 30 additions & 2 deletions
@@ -182,14 +182,42 @@ Kyverno generates a secret per-PVC with:
182182
/mnt/BigTank/k8s/volsync-kopia-nfs/
183183
├── kopia.repository # Kopia repository config
184184
├── kopia.blobcfg # Blob storage config
185-
├── p/ # Pack files (deduplicated data)
185+
├── p/ # Pack files (ALL deduplicated data from ALL PVCs)
186186
├── q/ # Index blobs
187-
├── n/ # Manifest blobs
187+
├── n/ # Manifest blobs (snapshots tagged by namespace/pvc-name)
188188
└── x/ # Session blobs
189189
```
190190
191191
All PVC backups share the same Kopia repository, with snapshots tagged by namespace/pvc-name.
192192
193+
### Cross-PVC Deduplication
194+
195+
This shared repository design is a deliberate choice. Kopia uses **content-defined chunking** — files are split into variable-size chunks based on content boundaries, and each chunk is stored by its hash. If the same chunk exists anywhere in the repository (from any PVC, any namespace), it's stored only once.
196+
197+
**What this means in practice:**
198+
- Delete and recreate an app → new PVC backs up → Kopia finds all chunks already exist → near-instant backup, almost zero new storage
199+
- Multiple apps with similar files (configs, timezone data, shared libraries) → one copy
200+
- Incremental backups only store changed chunks, not changed files
201+
- Storage grows by unique data, not by number of PVCs
202+
203+
**Why not S3 + Restic?** VolSync also supports Restic to S3, but each PVC gets its own separate Restic repository — zero cross-PVC deduplication. Delete and recreate an app = full backup from scratch. More storage, more bandwidth, slower.
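
Reviewer note: the deduplication behaviour described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Kopia's actual splitter or on-disk format: the rolling-sum boundary test, the `WINDOW` and `MASK` constants, and the `Repository` class are all hypothetical names chosen for illustration. It only shows why a shared, content-addressed store makes a repeat backup of identical data cost no new storage.

```python
import hashlib

WINDOW = 16   # bytes covered by the rolling sum (illustrative value)
MASK = 0x3F   # a boundary fires when the low 6 bits of the sum are all set


def chunk(data: bytes, min_size: int = 32, max_size: int = 256) -> list:
    """Split data into variable-size chunks at content-defined boundaries."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= WINDOW:
            # Drop the byte that just slid out of the window.
            rolling -= data[i - WINDOW]
        size = i - start + 1
        at_boundary = size >= min_size and (rolling & MASK) == MASK
        if at_boundary or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class Repository:
    """Toy shared repository: chunks keyed by hash, manifests keyed by tag."""

    def __init__(self):
        self.packs = {}      # sha256 hex digest -> chunk bytes
        self.manifests = {}  # "namespace/pvc-name" -> ordered list of digests

    def backup(self, tag: str, data: bytes) -> int:
        """Store data under tag and return the number of newly stored bytes."""
        new_bytes = 0
        digests = []
        for piece in chunk(data):
            digest = hashlib.sha256(piece).hexdigest()
            if digest not in self.packs:
                self.packs[digest] = piece
                new_bytes += len(piece)
            digests.append(digest)
        self.manifests[tag] = digests
        return new_bytes

    def restore(self, tag: str) -> bytes:
        """Reassemble the data for tag from the stored chunks."""
        return b"".join(self.packs[d] for d in self.manifests[tag])


repo = Repository()
# Deterministic pseudo-random payload standing in for a PVC's contents.
app_data = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(64))
first = repo.backup("media/app-data", app_data)
second = repo.backup("media/app-data-recreated", app_data)
print(first, second)  # second is 0: every chunk already exists in the repo
```

In this model, a per-PVC repository (the Restic layout described above) corresponds to creating a fresh `Repository()` for each tag, in which case the second backup would store every chunk again.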
204+
205+
## Why Two Backup Systems (NFS for PVCs, S3 for Databases)
206+
207+
**PVC backups → NFS + Kopia** because:
208+
- VolSync's Kopia mover needs filesystem access for content-defined chunking and dedup
209+
- Direct NFS gives 10Gbps to TrueNAS with no HTTP overhead
210+
- No per-namespace S3 credentials — Kyverno just injects the NFS mount
211+
- One shared repository = cross-PVC deduplication (see above)
212+
213+
**Database backups → S3 + Barman** because:
214+
- CNPG's built-in backup only supports Barman, and Barman speaks S3 (not NFS)
215+
- Barman does SQL-aware backups (`pg_basebackup` + continuous WAL archiving) for point-in-time recovery
216+
- Filesystem-level snapshots of running Postgres can be inconsistent without the WAL stream
217+
- CNPG has no native NFS backup option
218+
219+
Each tool uses its native backup mechanism. Forcing either into the other's model would mean worse backups.
220+
193221
## Manual Restore
194222
195223
To manually trigger a restore:

omni/cluster-template/cluster-template.yaml

Lines changed: 18 additions & 13 deletions
@@ -5,9 +5,9 @@ name: talos-prod-cluster
55
labels:
66
cluster-id: "1"
77
kubernetes:
8-
version: v1.34.1
8+
version: v1.35.0
99
talos:
10-
version: v1.11.6
10+
version: v1.12.4
1111
patches:
1212
- name: disable-default-cni
1313
inline:
@@ -19,10 +19,10 @@ patches:
1919
disabled: true
2020
- name: dns-resolver
2121
inline:
22-
machine:
23-
network:
24-
nameservers:
25-
- 192.168.10.1
22+
apiVersion: v1alpha1
23+
kind: ResolverConfig
24+
nameservers:
25+
- address: 192.168.10.1
2626
- name: node-performance
2727
inline:
2828
machine:
@@ -41,6 +41,11 @@ kind: ControlPlane
4141
machineClass:
4242
name: proxmox-control-plane
4343
size: 3
44+
systemExtensions:
45+
- siderolabs/iscsi-tools
46+
- siderolabs/nfs-utils
47+
- siderolabs/qemu-guest-agent
48+
- siderolabs/util-linux-tools
4449
patches:
4550
- name: control-plane-performance
4651
inline:
@@ -64,11 +69,6 @@ patches:
6469
scheduler:
6570
extraArgs:
6671
kube-api-qps: "100"
67-
systemExtensions:
68-
- siderolabs/iscsi-tools
69-
- siderolabs/nfsd
70-
- siderolabs/qemu-guest-agent
71-
- siderolabs/util-linux-tools
7272
---
7373
kind: Workers
7474
name: workers
@@ -77,7 +77,7 @@ machineClass:
7777
size: 3
7878
systemExtensions:
7979
- siderolabs/iscsi-tools
80-
- siderolabs/nfsd
80+
- siderolabs/nfs-utils
8181
- siderolabs/qemu-guest-agent
8282
- siderolabs/util-linux-tools
8383
patches:
@@ -106,7 +106,7 @@ machineClass:
106106
size: 1
107107
systemExtensions:
108108
- siderolabs/iscsi-tools
109-
- siderolabs/nfsd
109+
- siderolabs/nfs-utils
110110
- siderolabs/qemu-guest-agent
111111
- siderolabs/util-linux-tools
112112
- siderolabs/nonfree-kmod-nvidia-production
@@ -118,6 +118,11 @@ patches:
118118
nodeLabels:
119119
gpu-worker: "true"
120120
nvidia.com/gpu: "true"
121+
- name: gpu-network-dhcp
122+
inline:
123+
apiVersion: v1alpha1
124+
kind: DHCPv4Config
125+
name: ens18
121126
- file: patches/gpu-worker.yaml
122127
- name: longhorn-storage
123128
inline:

omni/cluster-template/patches/gpu-worker.yaml

Lines changed: 1 addition & 4 deletions
@@ -1,13 +1,10 @@
11
# GPU Worker Talos Configuration
22
# Network topology:
33
# ens18 (net0/vmbr0) - Main LAN 192.168.10.x (DHCP) - 10G via switch to TrueNAS
4+
# DHCP config moved to inline DHCPv4Config in cluster-template.yaml
45
#
56
# Storage traffic now goes through 10G switch (192.168.10.133)
67
machine:
7-
network:
8-
interfaces:
9-
- interface: ens18
10-
dhcp: true
118
kubelet:
129
nodeIP:
1310
validSubnets:
