Hi @elemental-lf, to cut to the chase, I'm looking for help while I write some tooling. I'm hoping to put together something robust and useful enough that other folks can take advantage as well, and I want to make sure I'm on the right track before I invest too much time.
Background and Motivation
I have an existing (homelab) Kubernetes cluster that is ultimately backed by Ceph, managed by Rook. The storage is attached to the same execution nodes. Right now, I use Velero to schedule and manage my backups - this is a tool largely aimed at making Kubernetes resources restorable to the same or other clusters, and is not especially interested in backups, per se -- it can be made to create VolumeSnapshots (and VolumeSnapshotContents) using CSI, but doesn't muck with exporting those snapshots.
I think that approach makes a lot of sense for managed Kubernetes instances, or instances backed by a robust storage medium with its own replication capabilities. Indeed, it would be great if I had a couple dozen hosts to handle replication at the Ceph level.
As far as off-medium (and off-site) backups go, it's a lot more cost-effective to run a standalone Minio pod backed by an external drive, and call that "archival object storage." (Especially with something like rclone to Backblaze B2 for off-site replication.)
Benji strikes me as well-organized and well-regarded, and as operating at the perfect layer to fill in the gap for me:
- very intelligent RBD backups
- S3 as an archival layer
Where the Existing Tooling Clashes
As far as I'm able to tell, a typical scheduled Velero backup grabs copies of most Kubernetes resources (as returned by the API), and serializes them to S3. PVCs are special: Velero optionally injects a command to the attached pod (such as fsfreeze), asks CSI to create a VolumeSnapshot, waits for it to complete (and for the VolumeSnapshotContent to be available), then injects another pod command, and then serializes the VolumeSnapshot and VolumeSnapshotContent objects to archive.
Of course, the VolumeSnapshotContent contains a reference to the saved data on the underlying storage layer (in this case, RBD), but that data is not replicated to archive.
Benji is very well positioned to take that backup model and shore up the final piece: replicating the underlying RBD snapshot to archival storage.
The clash is that the existing scripts seem tuned for Benji as the primary backup provider, responsible for freezing the filesystems, taking the snapshots, and managing their lifecycles.
Proposition (i.e. please help me do this)
I think that there's no fundamental disagreement here, just a need for a for-purpose script. Ideally, it'd be one general and robust enough to be included in Benji's standard distribution, with associated documentation to help out anyone who may be following the same path that I am.
So: I want to write a script along the lines of the existing backup_pvc.py, which crawls through existing VolumeSnapshot objects, and ensures that any corresponding VolumeSnapshotContent (or, at least the ones on RBD) are replicated to archival storage.
I think this should be as simple as:
- list every VolumeSnapshot
- for each VolumeSnapshot, find every corresponding VolumeSnapshotContent (the current one appears to be linked in the status field, but former ones may continue to exist, I think)
- for each VolumeSnapshotContent, look up the RBD volume and snapshot it corresponds to; if the volume has previously been backed up, perform a differential backup against the previous snapshot; otherwise, do a full backup
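To make the steps above concrete, here is a minimal planning sketch. It works on plain dictionaries shaped like the objects the Kubernetes API returns, so it runs without a cluster. The driver name, the `BackupPlan` type, both function names, and the `last_backed_up` mapping are assumptions made for illustration; none of them are part of Benji or Velero, and the actual RBD backup call is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the usual driver name for Rook-managed RBD; adjust for your cluster.
RBD_DRIVER = "rook-ceph.rbd.csi.ceph.com"


@dataclass(frozen=True)
class BackupPlan:
    """Hypothetical record of one backup to perform."""
    content_name: str             # VolumeSnapshotContent name
    volume_handle: str            # CSI handle of the source volume
    snapshot_handle: str          # CSI handle of the snapshot
    base_snapshot: Optional[str]  # previous snapshot handle, None means full backup


def contents_for(snapshot, contents):
    """Return every VolumeSnapshotContent that points back at this VolumeSnapshot."""
    meta = snapshot["metadata"]
    matches = []
    for content in contents:
        ref = content["spec"].get("volumeSnapshotRef", {})
        if ref.get("name") == meta["name"] and ref.get("namespace") == meta["namespace"]:
            matches.append(content)
    return matches


def plan_backups(snapshots, contents, last_backed_up):
    """Decide, per VolumeSnapshotContent, between a full and a differential backup.

    last_backed_up maps a volume handle to the snapshot handle of its most
    recent archived version (hypothetical bookkeeping kept by the caller).
    """
    plans = []
    for snapshot in snapshots:
        for content in contents_for(snapshot, contents):
            if content["spec"].get("driver") != RBD_DRIVER:
                continue  # not on RBD
            status = content.get("status") or {}
            if not status.get("readyToUse"):
                continue  # snapshot is still being cut
            volume = content["spec"].get("source", {}).get("volumeHandle")
            handle = status.get("snapshotHandle")
            if not volume or not handle:
                continue  # e.g. pre-provisioned content with no source volume
            plans.append(BackupPlan(
                content_name=content["metadata"]["name"],
                volume_handle=volume,
                snapshot_handle=handle,
                base_snapshot=last_backed_up.get(volume),
            ))
    return plans
```

This sketch does not update `last_backed_up` while it runs, so two new snapshots of the same volume would both be planned against the same base; a real script would probably process them in creation order and advance the base after each successful backup.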
Where I expect to find issues:
- Deciding when old versions can be cleaned.

  With Velero (or some other backup tool with a similar approach) as the primary tool, users should be able to expect that (1) any backup in Velero which is not yet expired has not yet been cleaned from archival storage, and that (2) any backup version which has expired in Velero can be expected to expire from archival storage as well, at least eventually.

  Velero's idea of when something should expire is not exposed in the VolumeSnapshot. Even if it were, I'd have reservations about trusting it, because backup versions can be "frozen" in Velero, to prevent deletion, and we'd need to have that context propagate out to the VolumeSnapshot, too.

  Given that, I think the approach is to not set an expiration time for Benji. That handles the (1) case. In order to clean things up, though, we would need an additional hook to discover which snapshots have already been expired by Velero on the live storage (by seeing which RBD snapshots have been deleted), and then immediately mark the corresponding Benji versions as expired. I don't know whether this is possible.
- Ensuring that this tool runs frequently enough to prevent failing to archive versions.

  I don't think there's anything to be done here but to guide users to schedule the benji-backup-existing-snapshots script to run at least as frequently as Velero creates backups. I'd appreciate feedback on whether you think that's the right approach, too, though.

- Making this script general enough to be of any use.

  As it stands, I have only my home environment to test in. (Necessity is, of course, the mother of invention.) I would guess that my use-case (Rook Ceph backed by hostPath OSDs, connected to Velero using CSI) is not unique, but there's undoubtedly going to be variation that I wouldn't expect.

  I think I can mitigate some of that by piggy-backing on the existing scripting tools and helpers, but I'd like your perspective on whether this is something other folks could conceive of reusing, and what factors to be aware of in writing this script for broader consumption.
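For the clean-up hook in the first point, the core of it could be a plain set difference between the snapshots that still exist on live storage and the snapshots each archived version was taken from. The sketch below shows only that comparison. The function name, the `version_handles` mapping (archived version identifier to source snapshot handle), and the `allow_empty_live` flag are all hypothetical; how the handle would be recorded against a Benji version, and how a version would then be marked as expired, are open questions and not shown.

```python
def versions_to_expire(live_handles, version_handles, allow_empty_live=False):
    """Return archived version IDs whose source snapshot no longer exists.

    live_handles: snapshot handles currently present on live storage.
    version_handles: maps an archived version ID to the snapshot handle it
        was taken from (hypothetical bookkeeping, not a Benji API).
    """
    live = set(live_handles)
    # Guard against a failed or empty listing wiping out every archived version.
    if not live and version_handles and not allow_empty_live:
        raise ValueError("no live snapshots found; refusing to expire everything")
    return sorted(
        version_id
        for version_id, handle in version_handles.items()
        if handle not in live
    )
```

For example, `versions_to_expire(["snap-2"], {"V1": "snap-1", "V2": "snap-2"})` returns `["V1"]`. The guard is a deliberate design choice: an empty listing is more likely to be an API error than a sign that Velero has expired every backup, so the caller has to opt in to that case.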
Thanks for reading!
Sasha