-
Notifications
You must be signed in to change notification settings - Fork 10
many: add support for unpriviledged building (HMS-5955) #189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Small question before I take a look, but how does this interact with the cross-build support via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super cool capability wise.
I do have my concerns that we are growing towards pretty diverging paths in how we execute osbuild
and that we're (at least partially) duplicating isolation mechanisms (e.g. we're running bubblewrap
inside a virtual machine inside a container.
We should probably plan on how we want to reconcile those eventually?
err := util.RunCmdSync( | ||
"supermin", | ||
"--build", prepareDir, | ||
// XXX: what is the right size? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it's thinly provisioned (right?) it doesn't matter too much to have it be large.
However, the real fix is to use virtiofs for the rootfs I think, which would obviate much of the role of supermin.
We will need qemu-user-static inside the supermin machine and then it should work there, I will look into adding a proper integration test for this too. |
5347934
to
fd78dff
Compare
I'm not reviewing the code, but I have comments. Knowing just what it says in the intro I built this and threw it into my F42 VM and ran into a few things:
Another observation is that this seems to be making this project more complex :) I'm not sure that making the tool setup the environment to run the tool in really belongs in the same codebase. Adding dependencies on podman, supermin, kvm setup all seem to be things that are separate from the core job of the project. |
Thanks for giving it a whirl and for the excellent feedback!
Yeah, I think @thozza also noticed that ibcli is too liberal about the osbuild version it needs, I will look into this.
It is designed to support both containers and running on a regular system (or a VM with nested KVM). If it runs on a regular system it will use supermin when run with "uid != 0". But it sounds like the README (and commit message) needs ot make these points much cleaner.
Thanks! My bad, I think this happens when "virtiofsd" is not installed, I will fix the detection of this and make the error cleaner.
That is definitely a valid concern! Its a trade-off (its not that much code) but definitely another layer (pun intended) of complexity. Especially when it runs inside the container it adds two levels of nesting and all the dependencies supermin needs (including a kernel!). We are exploring this because people are hesitant to run privileged containers and the barrier for /dev/kvm seems to be lower. |
Hm, so I was running it as a normal user in the VM and I had to add the experimental flag. Without it I get: If I install With all that it did run successfully! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments added. Also, it's spelled 'unprivileged', note the lack of a d
😉
I'm growing a bit concerned (much like @bcl) about the amount of ways we're growing to wrap osbuild
.
Can we (perhaps after this PR) take a step back and see how we actually want to do this? I envision that the different 'strategies' of running osbuild
belong as wrapper(s/-scripts) installed on the system. Each with their particular dependencies that they need.
We can ship them as subpackages, if we wish and image-builder
(or whichever part of our stack we use to execute osbuild
) can query for available wrappers and then pick based on some heuristic.
We can then install osbuild-via-supermin
in our container (which depends on all the supermin
stuff), or osbuild-via-qemu-user-static
, etc.
var osGetuid = os.Getuid | ||
|
||
func imageBuilderDefaultCacheDir() (string, error) { | ||
if osGetuid() == 0 { | ||
return "/var/cache/image-builder/store", nil | ||
} | ||
xdgCacheHome := os.Getenv("XDG_CACHE_HOME") | ||
if xdgCacheHome == "" { | ||
homeDir, err := os.UserHomeDir() | ||
if err != nil { | ||
return "", err | ||
} | ||
xdgCacheHome = filepath.Join(homeDir, ".cache") | ||
} | ||
return filepath.Join(xdgCacheHome, "image-builder/store"), nil | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split this into a separate PR? I like it but I think we'll have some more discussion in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - I'm curious about your thoughts here, what extra discussion do you have in min? This change in itself is only useful if in combination with supermin, otherwise there will never be a user cache with actual files (as only manifest can run as non-root without supermin)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only disk image types require privileges, there are a few that don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is interesting. We do require bind mount in osbuild (and maybe more for bwrap) so a user container with --privileged might be enough for non-disk types (like contianer). I have not tested this though.
# rsync/cp the caches so that our build is faster | ||
echo "Populate /store from host" | ||
mkdir /store | ||
rsync -a --exclude=./tmp /host/store/ /store |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will easily make someone run out of disk space right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, one way to avoid this would be something like this:
- create empty store inside VM
- use virtiofsd to only map in store/sources
This way we won't be able to share anything but the cached files/containers but that might be good enough and avoids the data duplication.
osbuild \ | ||
--export %s \ | ||
--output-directory /output \ | ||
--cache /store \ | ||
/output/manifest.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely happy since we've been doing a bit of effort to centralize how we format calling osbuild
. But this adds a way of calling it that exists entirely outside of that approach. Don't really know a better way either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we need a new "osbuild/exec" or something module that is just passed to the various places to have only a single implementation of executing osbuild. Its on my list for a while, maybe now is a good time to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't need to be in this PR, as long as we track (somewhere) that we'll unify it :)
c492bd4
to
b68b33d
Compare
This commit detects if a container is rootful and privileged and only sets up the selinux/osbuild workarounds when that is the case. With that change we can run the container as an unprivileged user and use "supermin" inside to do the building while still supporting the existing workflow (i.e. direct building inside the container if we are privileged and rootful).
This is preparing to extract the run part of the progress package into its own code. Currently there is an unfortunate mix of concerns here, i.e. `progress` should be generic and does not need to know details about osbuild and it really should be its own package. Until that is done this split already helps (a bit).
This commit detects if there are enough priviledges to run osbuild and if not errors. This prepares for the `supermin` build mode which will be used in the followup commits when not enough priviledges are detected. It currently checks if it can do a mknod for the /dev/sda1 block devices. Do a functional check here as this is the most reliable way to see if we have enough privileges. Checking for CAP_SYS_ADMIN is not good enough as that is available in a privileged user container but has not enough privs to create device nodes or mount filesystems. Being able to check for mknod is sufficient, alternatively we could try to mount ext4 but that is much more involved as we would need to do the loop device dance and call mkfs.* which may not be available (or we would just try syscall.Mount("/any/block-device", "/valid/mnt", 0, "") and return false if we get EPERM there. Thanks for Ondrej for pointing this out!
This adds support to build images without root by using `supermin` and a virtual machine. With `supermin` we can create a minimal VM with osbuild and all supported pacakge very quickly and then map /store and /output in. This works on a regular system with a uid != 0 and /dev/kvm access and inside a container that has access to /dev/kvm. There are a few wrinkles: - /dev/kvm must be available for the VM or things are really slow - /store cannot be mounted from the host and mounted directly because there will be selinux label/uid map issues between VM and host - progress reporting is currently not supported (but adding it should be conceptually not very hard, see todo in the code).
This commit moves the handling of a user owned cachedir into the right places, i.e. when run as user use check XDG_CACHE_HOME: ~/.cache/image-builder/store vs /var/cache/image-builder/store when run as root.
This commit adds support to skip the priviledges checks when preparing for running osbuild. This is needed so that we can still run our (fake) integration tests.
Add code to wait for virtiofsd sockets to appear.
This commit adds a stamp file if the build is successful. This is needed because it is non-trivial to exit qemu non-zero [0] if osbuild fails, so instead we just generate a build-done "stamp" file in output and if that is missing we know something went wrong and show the log. It has to be a random stamp to avoid having to deal with pre-existing build-stamp files. [0] It is non-trivial because any failure inside the VM will make the kernel panic as pid1 dies. In this case we need to stop the VM and shutdown so that the VM does not hang forever. But shutdown via non-zero exit code needs very osbsure ISA debug ports that we could poke from inside the VM. Given that we are only interested in two cases: 1. build successful 2. build failed and if the build failed we need to report a log file just using the stamp seems the most expedient option.
Some tests require running as non-root to test new modes like supermin. So enable that as part of the integration tests.
Ensure that we did a disk sync before we write the "success" stamp file and print a warning if the shutdown takes a long time. Magic-sysreq is very robust so this should never happen but lets be robust. Thanks to Simon for the suggestion.
For the record, a patch was merged into |
This PR adds support for building images without root or a privileged container using the "supermin" binary. It will require access to /dev/kvm though to make the builds reasonably fast.
This is ready for review now, it will need followups to support supermin building in aarch64, s390x and ppc64le, Colin/Johnathan and frieds in COSA did excellent work here so this is hopefully mostly busywork (testing it will be tricky though).
Another followup will be needed to add proper functional tests for supermin, I have a WIP branch in https://github.com/osbuild/image-builder-cli/compare/main...mvo5:supermin-cleanup-test?expand=1 for this but this PR is already quite large so I'm hesitant to make it even larger).
Kudos to cgwalters and jlebon for the excellent help and examples in osbuild/bootc-image-builder#98
/jira-epic COMPOSER-2487
JIRA: HMS-5955