Skip to content

many: add support for unpriviledged building (HMS-5955) #189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

mvo5
Copy link
Collaborator

@mvo5 mvo5 commented Apr 7, 2025

This PR adds support for building images without root or a privileged container using the "supermin" binary. It will require access to /dev/kvm though to make the builds reasonably fast.

This is ready for review now, it will need followups to support supermin building in aarch64, s390x and ppc64le, Colin/Johnathan and frieds in COSA did excellent work here so this is hopefully mostly busywork (testing it will be tricky though).

Another followup will be needed to add proper functional tests for supermin, I have a WIP branch in https://github.com/osbuild/image-builder-cli/compare/main...mvo5:supermin-cleanup-test?expand=1 for this but this PR is already quite large so I'm hesitant to make it even larger).

Kudos to cgwalters and jlebon for the excellent help and examples in osbuild/bootc-image-builder#98

/jira-epic COMPOSER-2487

JIRA: HMS-5955

@supakeen
Copy link
Member

supakeen commented Apr 7, 2025

Small question before I take a look, but how does this interact with the cross-build support via qemu-user-static?

Copy link
Member

@supakeen supakeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super cool capability wise.

I do have my concerns that we are growing towards pretty diverging paths in how we execute osbuild and that we're (at least partially) duplicating isolation mechanisms (e.g. we're running bubblewrap inside a virtual machine inside a container.

We should probably plan on how we want to reconcile those eventually?

err := util.RunCmdSync(
"supermin",
"--build", prepareDir,
// XXX: what is the right size?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it's thinly provisioned (right?) it doesn't matter too much to have it be large.

However, the real fix is to use virtiofs for the rootfs I think, which would obviate much of the role of supermin.

@mvo5
Copy link
Collaborator Author

mvo5 commented Apr 7, 2025

Small question before I take a look, but how does this interact with the cross-build support via qemu-user-static?

We will need qemu-user-static inside the supermin machine and then it should work there, I will look into adding a proper integration test for this too.

@mvo5 mvo5 force-pushed the supermin-cleanup1 branch 2 times, most recently from 5347934 to fd78dff Compare April 8, 2025 16:35
@mvo5 mvo5 marked this pull request as ready for review April 8, 2025 16:36
@mvo5 mvo5 force-pushed the supermin-cleanup1 branch from fd78dff to ae7f485 Compare April 9, 2025 07:41
@schutzbot schutzbot changed the title many: add support for unpriviledged building many: add support for unpriviledged building (HMS-5955) Apr 10, 2025
@schutzbot schutzbot added the 🌟 best practice This pull request follows our best practices! label Apr 10, 2025
@bcl
Copy link
Contributor

bcl commented Apr 10, 2025

I'm not reviewing the code, but I have comments. Knowing just what it says in the intro I built this and threw it into my F42 VM and ran into a few things:

  • I had osbuild 143 installed and running it resulted in a message - DNF error occurred: InternalError: dnf-json output was empty, this was solved by upgrading to osbuild 146. So it looks like it needs a check for the minimal supported osbuild version
  • The title says 'unpriviledged building' but skimming the commits it looks like it really means 'inside a container' so I think this needs some documentation added to the README to make it clear how to run it.
  • I tried running it with IMAGE_BUILDER_EXPERIMENTAL=supermin in the F42 vm (with /dev/kvm) and it did a few things, then failed with `error: expected socket did not appear: files missing after 10s: [/var/tmp/supermin1452852129/vfsd_output.sock /var/tmp/supermin1452852129/vfsd_store.sock]

Another observation is that this seems to be making this project more complex :) I'm not sure that making the tool setup the environment to run the tool in really belongs in the same codebase. Adding dependencies on podman, supermin, kvm setup all seem to be things that are separate from the core job of the project.

@mvo5
Copy link
Collaborator Author

mvo5 commented Apr 10, 2025

I'm not reviewing the code, but I have comments. Knowing just what it says in the intro I built this and threw it into my F42 VM and ran into a few things:

Thanks for giving it a whirl and for the excellent feedback!

* I had osbuild 143 installed and running it resulted in a message - `DNF error occurred: InternalError: dnf-json output was empty`, this was solved by upgrading to osbuild 146. So it looks like it needs a check for the minimal supported osbuild version

Yeah, I think @thozza also noticed that ibcli is too liberal about the osbuild version it needs, I will look into this.

* The title says 'unpriviledged building' but skimming the commits it looks like it really means 'inside a container' so I think this needs some documentation added to the README to make it clear how to run it.

It is designed to support both containers and running on a regular system (or a VM with nested KVM). If it runs on a regular system it will use supermin when run with "uid != 0". But it sounds like the README (and commit message) needs ot make these points much cleaner.

* I tried running it with `IMAGE_BUILDER_EXPERIMENTAL=supermin` in the F42 vm (with /dev/kvm) and it did a few things, then failed with `error: expected socket did not appear: files missing after 10s: [/var/tmp/supermin1452852129/vfsd_output.sock /var/tmp/supermin1452852129/vfsd_store.sock]

Thanks! My bad, I think this happens when "virtiofsd" is not installed, I will fix the detection of this and make the error cleaner.
[edit: this should be fixed now and produce a clean error if virtiofsd is missing]

Another observation is that this seems to be making this project more complex :) I'm not sure that making the tool setup the environment to run the tool in really belongs in the same codebase. Adding dependencies on podman, supermin, kvm setup all seem to be things that are separate from the core job of the project.

That is definitely a valid concern! Its a trade-off (its not that much code) but definitely another layer (pun intended) of complexity. Especially when it runs inside the container it adds two levels of nesting and all the dependencies supermin needs (including a kernel!). We are exploring this because people are hesitant to run privileged containers and the barrier for /dev/kvm seems to be lower.

@bcl
Copy link
Contributor

bcl commented Apr 10, 2025

If it runs on a regular system it will use supermin when run with "uid != 0".

Hm, so I was running it as a normal user in the VM and I had to add the experimental flag. Without it I get:
error: not enough priviledges: must be root with CAP_SYS_ADMIN

If I install virtiofsd and qemu-kvm then supermin will run (still using the experimental flag). So it would make things easier to debug if it checked for required features/packages early on.

With all that it did run successfully!

Copy link
Member

@supakeen supakeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments added. Also, it's spelled 'unprivileged', note the lack of a d 😉


I'm growing a bit concerned (much like @bcl) about the amount of ways we're growing to wrap osbuild.

Can we (perhaps after this PR) take a step back and see how we actually want to do this? I envision that the different 'strategies' of running osbuild belong as wrapper(s/-scripts) installed on the system. Each with their particular dependencies that they need.

We can ship them as subpackages, if we wish and image-builder (or whichever part of our stack we use to execute osbuild) can query for available wrappers and then pick based on some heuristic.

We can then install osbuild-via-supermin in our container (which depends on all the supermin stuff), or osbuild-via-qemu-user-static, etc.

Comment on lines +372 to +426
var osGetuid = os.Getuid

func imageBuilderDefaultCacheDir() (string, error) {
if osGetuid() == 0 {
return "/var/cache/image-builder/store", nil
}
xdgCacheHome := os.Getenv("XDG_CACHE_HOME")
if xdgCacheHome == "" {
homeDir, err := os.UserHomeDir()
if err != nil {
return "", err
}
xdgCacheHome = filepath.Join(homeDir, ".cache")
}
return filepath.Join(xdgCacheHome, "image-builder/store"), nil
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we split this into a separate PR? I like it but I think we'll have some more discussion in this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - I'm curious about your thoughts here, what extra discussion do you have in min? This change in itself is only useful if in combination with supermin, otherwise there will never be a user cache with actual files (as only manifest can run as non-root without supermin)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only disk image types require privileges, there are a few that don't.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is interesting. We do require bind mount in osbuild (and maybe more for bwrap) so a user container with --privileged might be enough for non-disk types (like contianer). I have not tested this though.

# rsync/cp the caches so that our build is faster
echo "Populate /store from host"
mkdir /store
rsync -a --exclude=./tmp /host/store/ /store
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will easily make someone run out of disk space right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, one way to avoid this would be something like this:

  • create empty store inside VM
  • use virtiofsd to only map in store/sources
    This way we won't be able to share anything but the cached files/containers but that might be good enough and avoids the data duplication.

Comment on lines +80 to +84
osbuild \
--export %s \
--output-directory /output \
--cache /store \
/output/manifest.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely happy since we've been doing a bit of effort to centralize how we format calling osbuild. But this adds a way of calling it that exists entirely outside of that approach. Don't really know a better way either.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we need a new "osbuild/exec" or something module that is just passed to the various places to have only a single implementation of executing osbuild. Its on my list for a while, maybe now is a good time to do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't need to be in this PR, as long as we track (somewhere) that we'll unify it :)

@mvo5 mvo5 force-pushed the supermin-cleanup1 branch 2 times, most recently from c492bd4 to b68b33d Compare April 11, 2025 11:20
mvo5 added 10 commits April 23, 2025 08:51
This commit detects if a container is rootful and privileged and
only sets up the selinux/osbuild workarounds when that is the case.
With that change we can run the container as an unprivileged user
and use "supermin" inside to do the building while still supporting
the existing workflow (i.e. direct building inside the container
if we are privileged and rootful).
This is preparing to extract the run part of the progress package
into its own code. Currently there is an unfortunate mix of
concerns here, i.e. `progress` should be generic and does not
need to know details about osbuild and it really should be
its own package. Until that is done this split already helps
(a bit).
This commit detects if there are enough priviledges to run osbuild
and if not errors. This prepares for the `supermin` build mode
which will be used in the followup commits when not enough
priviledges are detected.

It currently checks if it can do a mknod for the /dev/sda1 block
devices.

Do a functional check here as this is the most reliable way
to see if we have enough privileges. Checking for CAP_SYS_ADMIN
is not good enough as that is available in a privileged user
container but has not enough privs to create device nodes or
mount filesystems.

Being able to check for mknod is sufficient, alternatively
we could try to mount ext4 but that is much more involved
as we would need to do the loop device dance and call
mkfs.* which may not be available (or we would just try
syscall.Mount("/any/block-device", "/valid/mnt", 0, "")
and return false if we get EPERM  there.

Thanks for Ondrej for pointing this out!
This adds support to build images without root by using `supermin`
and a virtual machine. With `supermin` we can create a minimal
VM with osbuild and all supported pacakge very quickly and then
map /store and /output in.

This works on a regular system with a uid != 0 and /dev/kvm access
and inside a container that has access to /dev/kvm.

There are a few wrinkles:
- /dev/kvm must be available for the VM or things are really slow
- /store cannot be mounted from the host and mounted directly because
  there will be selinux label/uid map issues between VM and host
- progress reporting is currently not supported (but adding it
  should be conceptually not very hard, see todo in the code).
This commit moves the handling of a user owned cachedir into
the right places, i.e. when run as user use check XDG_CACHE_HOME:
~/.cache/image-builder/store
vs
/var/cache/image-builder/store
when run as root.
This commit adds support to skip the priviledges checks when
preparing for running osbuild. This is needed so that we can
still run our (fake) integration tests.
Add code to wait for virtiofsd sockets to appear.
This commit adds a stamp file if the build is successful. This is
needed because it is non-trivial to exit qemu non-zero [0] if osbuild
fails, so instead we just generate a build-done "stamp" file in
output and if that is missing we know something went wrong and show
the log. It has to be a random stamp to avoid having to deal
with pre-existing build-stamp files.

[0] It is non-trivial because any failure inside the VM will make
the kernel panic as pid1 dies. In this case we need to stop the
VM and shutdown so that the VM does not hang forever. But shutdown
via non-zero exit code needs very osbsure ISA debug ports that
we could poke from inside the VM. Given that we are only interested
in two cases:
1. build successful
2. build failed
and if the build failed we need to report a log file just using
the stamp seems the most expedient option.
Some tests require running as non-root to test new modes like
supermin. So enable that as part of the integration tests.
Ensure that we did a disk sync before we write the "success"
stamp file and print a warning if the shutdown takes a long
time. Magic-sysreq is very robust so this should never
happen but lets be robust.

Thanks to Simon for the suggestion.
@mvo5 mvo5 force-pushed the supermin-cleanup1 branch from b68b33d to d6cdfb6 Compare April 23, 2025 07:21
@lzap
Copy link
Contributor

lzap commented May 13, 2025

For the record, a patch was merged into main that drops logrus please remove all logrus statements from this PR and replace them with olog.Printf. Sorry for the inconvenience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🌟 best practice This pull request follows our best practices!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants