Create a ZFS dataset for each volume when running on ZFS #775
Description
Tell us about your request
Create a ZFS dataset for each volume. This makes volume snapshots possible, with the added benefit of being able to customize ZFS options (like compression) per volume.
Which service(s) is this request for?
Docker Engine
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
It lets you take volume snapshots that can be restored easily and instantly when something goes terribly wrong, for example during a risky update or migration.
It's also a great way to get better backups. You can either snapshot a live container's storage (probably fine, but you may still run into issues on restore, because the application's data may not be in a consistent state at that instant), or stop the container, take a snapshot (instantaneous), start it again and back up the snapshot. The second approach produces consistent backups with minimal downtime. I'm already using live snapshots to back up a backup server, and it works great.
Are you currently working around the issue?
I'm not the first person to think of this: I found a volume plugin and improved it so that it is easy to use with Docker Compose (https://github.com/icefo/docker-zfs-plugin). However, since ZFS is a kernel-level driver, you have to resort to a smelly workaround to make it work within the V2 volume plugin architecture. It's more of a proof of concept (for example, it logs events to a hard-coded path for debugging), but it works.
The smelly workaround
In short, volume plugins run as containers and must mount volumes in a specific folder inside the container that is shared with the host. That sounds great in theory, but ZFS is a kernel-level driver, so the mountpoints are relative to the host, not the container (which breaks the encapsulation). The workaround the original plugin author found is:
- Define this path as the shared folder: /var/lib/docker/plugins/pluginHash/propagated-mount/
- Prepend ../../../../../.. (one level up for each of the six components of that path) to every path the plugin returns to Docker, so that it resolves back to the real host root.
This lets the plugin mount ZFS datasets anywhere on the host; I defined a folder under /mnt for that.
This is very brittle and a potential security vulnerability that Docker may fix in the future. I don't think it's serious, since volume plugins seem to have CAP_SYS_ADMIN anyway, but apologies for not reporting it through the proper channels. I posted this on the Docker forums and Reddit a few days ago, so it's already public.
Additional context
If this feature interests the Docker maintainers, I would be happy to implement it, though I would need some guidance on the right approach. Should I extend the local driver to use datasets when the underlying filesystem supports them (ZFS, btrfs, bcachefs, ...?), or should I create a new driver?
From a quick look at the local driver, it seems I would only need to modify the create and remove functions. I also need to check what the live restore function actually does.
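If the local driver were extended rather than replaced, it would first need to find out whether the volumes directory sits on a filesystem with native datasets. One possible approach, sketched here under assumptions, is to compare the `f_type` magic number from `statfs(2)` on Linux. The values used are `ZFS_SUPER_MAGIC` (0x2fc12fc1) and `BTRFS_SUPER_MAGIC` (0x9123683e). bcachefs is left out of this sketch. The helper names are hypothetical, and nothing here reflects how Docker's local driver is actually structured.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// Filesystem magic numbers reported in statfs(2)'s f_type field on Linux.
const (
	zfsMagic   uint32 = 0x2fc12fc1 // ZFS_SUPER_MAGIC
	btrfsMagic uint32 = 0x9123683e // BTRFS_SUPER_MAGIC
)

// datasetBackend (hypothetical) maps a magic number to the native mechanism
// a volume driver could use, or "" to fall back to plain directories.
func datasetBackend(magic uint32) string {
	switch magic {
	case zfsMagic:
		return "zfs" // one dataset per volume
	case btrfsMagic:
		return "btrfs" // one subvolume per volume
	default:
		return ""
	}
}

// detectBackend runs statfs on the volumes root (Linux-oriented sketch).
func detectBackend(root string) (string, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(root, &st); err != nil {
		return "", err
	}
	// The Go type of st.Type varies by architecture; magics are 32-bit.
	return datasetBackend(uint32(st.Type)), nil
}

func main() {
	root := "/var/lib/docker/volumes" // default location, may differ per setup
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	backend, err := detectBackend(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if backend == "" {
		fmt.Println("plain directories")
	} else {
		fmt.Println(backend)
	}
}
```

With a check like this, create and remove could branch on the result: run `zfs create`/`zfs destroy` (or the btrfs subvolume equivalents) when a backend is found, and keep today's plain-directory behavior otherwise.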