Skip to content

storage/pkg/chroot: fix IsRootless() init#908

Open
Luap99 wants to merge 3 commits into
podman-container-tools:mainfrom
Luap99:storage-chroot
Open

storage/pkg/chroot: fix IsRootless() init#908
Luap99 wants to merge 3 commits into
podman-container-tools:mainfrom
Luap99:storage-chroot

Conversation

@Luap99

@Luap99 Luap99 commented Jun 10, 2026

Copy link
Copy Markdown
Member

Buildah 1.44 has a new regression that was caused by me removing a bunch
of init() code out of buildah. So that the binary no longer has to do so
much extra work on each reexec.

That has however exposed a storage bug in our chrootarchive logic. The
unshare.IsRootless() calls needs access to /proc and that is not there
once we call chroot(). So the only way for IsRootless() to work is if we
execute before. Then all callers later will get the cached result which
no longer needs any /proc access.

The exact error here is because while InUserNS is part of the TarOptions
struct and passed down from the top level we end up calling
GetOverlayXattrName("opaque") to get the xattr name and that does not
have access to TarOptions and just calls IsRootless() again.
And because that failed it assumed we run as root and used the wrong
xattr which then during writing caused a EPERM and thus failed the tar
extraction.

Now we could try to fix the code path and pass the variable down the
stack but that seems a lot work work so I opted to just fix this by
calling IsRootless() before we chroot to cache the right value.

Small buildah reproducer:

tmp=$(mktemp -d)
unshare --map-root-user --map-auto --mount -- bin/buildah \
--root $tmp/root --runroot $tmp/runroot pull \
quay.io/devfile/python:slim@sha256:54924a2ee4a2ef17028ae076ce38e59b3f4054353a5c9f9318dfaee60377532c

I guess any image that uses an opaque directory should work.

Fixes: podman-container-tools/buildah#6903

Getting an operation not permitted without any context is not that
helpful, add at least the syscall and path to the error.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@github-actions github-actions Bot added the storage Related to "storage" package label Jun 10, 2026
@Luap99

Luap99 commented Jun 10, 2026

Copy link
Copy Markdown
Member Author

cc @nalind @giuseppe @mtrmac

@giuseppe giuseppe left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,

another idea is to use proc, _ := unix.Fsopen("erofs", unix.FSOPEN_CLOEXEC) to get a fd to procfs and then use unix.Openat(proc, "...") to read the file under proc without exposing it, but I am not sure how it behaves inside a chroot

@Luap99

Luap99 commented Jun 10, 2026

Copy link
Copy Markdown
Member Author

proc, _ := unix.Fsopen("erofs", unix.FSOPEN_CLOEXEC)

Did you mean unix.Fsopen("proc", unix.FSOPEN_CLOEXEC)? Otherwise I do not follow how erofs would get us a usable procfs.

I am not familiar with the fsopen API so I am not sure if it has any problems? Possibly a bit more expensive to create a new fs context in the kernel than just doing the lookup on the existing mounted /proc? Since we cache the result I guess it does not really matter even if it would be.

@giuseppe

Copy link
Copy Markdown
Contributor

Yes sorry, I have copied that line from the existing code but didn't replace the file system name

@TomSweeneyRedHat

Copy link
Copy Markdown
Contributor

LGTM
@nalind and @mtrmac PTAL

@giuseppe

Copy link
Copy Markdown
Contributor

I am not familiar with the fsopen API so I am not sure if it has any problems? Possibly a bit more expensive to create a new fs context in the kernel than just doing the lookup on the existing mounted /proc? Since we cache the result I guess it does not really matter even if it would be.

my suggestion was to use that as a fallback when /proc is not available so we don't require the "lookup before chroot" check, but we need first to check if it works from a chroot environment (I'd expect so, but there might be checks in the kernel that we are not aware of). In any case, it can be a follow-up, the current version solves the reported problem


// NOTE: this must happen before the chroot as IsRootless needs access to /proc and
// because IsRootless caches the result in memory this will make callers later work.
_ = unshare.IsRootless()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar in some ways to containers/storage#1228, and moving this function call closer to the changes that were made there might make it easier for someone in the future.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, however tar() does not call chroot() it calls realChroot() directly and thus is not reached there? Was that intentional?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's from containers/storage#357, and as both are entry points for re-exec subprocesses, it's not clear to me why one uses chroot() and the other uses realChroot().

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok then my proposal would be to switch it over to chroot() so we use the some logic for all of them

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

realChroot seems to come from 8da818d , with

This fixes one issue with Docker running under a grsec kernel, which denies chmod and mknod under chroot.

and that specific reason does not apply to archive creation.


I agree that consolidating the pre-chroot activities into a single place is valuable. As for whether it is worth using the rather more complex pivot_root approach vs. just a dumb chroot, I don’t feel strongly.

Either way the tests are failing in a pretty scary way.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so now I am worried, it took me way to long to understand but...

chroot() the main syscall applies the the whole process.

the pivot_root() call applies to the full mount namespace, except that the mount namespace is a per thread attribute and so we need to to use LockOSThread() which we do already.

The real problem happens because spawning go routines inside a locked thread apparently means use whatever and not inherit the parent thread, and whatever happens to be a thread with the normal mountns does escaping to the real rootfs, scary indeed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow what a maintenance nightmare. That needs loud documentation across the call stack…

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes one issue with Docker running under a grsec kernel, which
denies chmod and mknod under chroot.

Do we actually care about that? IMO the sensible thing is to drop the pivot_root implementation

Alternatively we can use the unshare flags on the cmd struct in the parent to request a new mount namespace there and that should apply to the full process than if we do not create our own namespace in the child

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grsecurity still seems to exist and be maintained, so dropping that entirely is not ideal (but it might well be acceptable, I have no idea. @mheon ?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well grsecurity is not upstream linux, we target upstream linux IMO. I have little interested in maintaining kernel compatibility for third party patch sets.

Luap99 added 2 commits June 10, 2026 17:53
Buildah 1.44 has a new regression that was caused by me removing a bunch
of init() code out of buildah. So that the binary no longer has to do so
much extra work on each reexec.

That has however exposed a storage bug in our chrootarchive logic. The
unshare.IsRootless() calls needs access to /proc and that is not there
once we call chroot(). So the only way for IsRootless() to work is if we
execute before. Then all callers later will get the cached result which
no longer needs any /proc access.

The exact error here is because while InUserNS is part of the TarOptions
struct and passed down from the top level we end up calling
GetOverlayXattrName("opaque") to get the xattr name and that does not
have access to TarOptions and just calls IsRootless() again.
And because that failed it assumed we run as root and used the wrong
xattr which then during writing caused a EPERM and thus failed the tar
extraction.

Now we could try to fix the code path and pass the variable down the
stack but that seems a lot work work so I opted to just fix this by
calling IsRootless() before we chroot to cache the right value.

Small buildah reproducer:
tmp=$(mktemp -d)
unshare --map-root-user --map-auto --mount -- bin/buildah \
--root $tmp/root --runroot $tmp/runroot pull \
quay.io/devfile/python:slim@sha256:54924a2ee4a2ef17028ae076ce38e59b3f4054353a5c9f9318dfaee60377532c

I guess any image that uses an opaque directory should work.

Fixes: podman-container-tools/buildah#6903

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
It does more setup we should have for all chroot calls.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

storage Related to "storage" package

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Buildah 1.44.0 fails to pull images with opaque whiteouts when running in a privileged user namespace

5 participants