storage/pkg/chroot: fix IsRootless() init#908
Conversation
Getting an operation not permitted without any context is not that helpful, add at least the syscall and path to the error. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
giuseppe
left a comment
There was a problem hiding this comment.
LGTM,
another idea is to use proc, _ := unix.Fsopen("erofs", unix.FSOPEN_CLOEXEC) to get a fd to procfs and then use unix.Openat(proc, "...") to read the file under proc without exposing it, but I am not sure how it behaves inside a chroot
Did you mean I am not familiar with the fsopen API so I am not sure if it has any problems? Possibly a bit more expensive to create a new fs context in the kernel than just doing the lookup on the existing mounted /proc? Since we cache the result I guess it does not really matter even if it would be. |
|
Yes sorry, I have copied that line from the existing code but didn't replace the file system name |
my suggestion was to use that as a fallback when |
|
|
||
| // NOTE: this must happen before the chroot as IsRootless needs access to /proc and | ||
| // because IsRootless caches the result in memory this will make callers later work. | ||
| _ = unshare.IsRootless() |
There was a problem hiding this comment.
This is similar in some ways to containers/storage#1228, and moving this function call closer to the changes that were made there might make it easier for someone in the future.
There was a problem hiding this comment.
That makes sense, however tar() does not call chroot() it calls realChroot() directly and thus is not reached there? Was that intentional?
There was a problem hiding this comment.
That's from containers/storage#357, and as both are entry points for re-exec subprocesses, it's not clear to me why one uses chroot() and the other uses realChroot().
There was a problem hiding this comment.
ok then my proposal would be to switch it over to chroot() so we use the some logic for all of them
There was a problem hiding this comment.
realChroot seems to come from 8da818d , with
This fixes one issue with Docker running under a grsec kernel, which denies
chmodandmknodunderchroot.
and that specific reason does not apply to archive creation.
I agree that consolidating the pre-chroot activities into a single place is valuable. As for whether it is worth using the rather more complex pivot_root approach vs. just a dumb chroot, I don’t feel strongly.
Either way the tests are failing in a pretty scary way.
There was a problem hiding this comment.
Ok so now I am worried, it took me way to long to understand but...
chroot() the main syscall applies the the whole process.
the pivot_root() call applies to the full mount namespace, except that the mount namespace is a per thread attribute and so we need to to use LockOSThread() which we do already.
The real problem happens because spawning go routines inside a locked thread apparently means use whatever and not inherit the parent thread, and whatever happens to be a thread with the normal mountns does escaping to the real rootfs, scary indeed
There was a problem hiding this comment.
Wow what a maintenance nightmare. That needs loud documentation across the call stack…
There was a problem hiding this comment.
This fixes one issue with Docker running under a grsec kernel, which
denies chmod and mknod under chroot.
Do we actually care about that? IMO the sensible thing is to drop the pivot_root implementation
Alternatively we can use the unshare flags on the cmd struct in the parent to request a new mount namespace there and that should apply to the full process than if we do not create our own namespace in the child
There was a problem hiding this comment.
grsecurity still seems to exist and be maintained, so dropping that entirely is not ideal (but it might well be acceptable, I have no idea. @mheon ?)
There was a problem hiding this comment.
Well grsecurity is not upstream linux, we target upstream linux IMO. I have little interested in maintaining kernel compatibility for third party patch sets.
Buildah 1.44 has a new regression that was caused by me removing a bunch
of init() code out of buildah. So that the binary no longer has to do so
much extra work on each reexec.
That has however exposed a storage bug in our chrootarchive logic. The
unshare.IsRootless() calls needs access to /proc and that is not there
once we call chroot(). So the only way for IsRootless() to work is if we
execute before. Then all callers later will get the cached result which
no longer needs any /proc access.
The exact error here is because while InUserNS is part of the TarOptions
struct and passed down from the top level we end up calling
GetOverlayXattrName("opaque") to get the xattr name and that does not
have access to TarOptions and just calls IsRootless() again.
And because that failed it assumed we run as root and used the wrong
xattr which then during writing caused a EPERM and thus failed the tar
extraction.
Now we could try to fix the code path and pass the variable down the
stack but that seems a lot work work so I opted to just fix this by
calling IsRootless() before we chroot to cache the right value.
Small buildah reproducer:
tmp=$(mktemp -d)
unshare --map-root-user --map-auto --mount -- bin/buildah \
--root $tmp/root --runroot $tmp/runroot pull \
quay.io/devfile/python:slim@sha256:54924a2ee4a2ef17028ae076ce38e59b3f4054353a5c9f9318dfaee60377532c
I guess any image that uses an opaque directory should work.
Fixes: podman-container-tools/buildah#6903
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
It does more setup we should have for all chroot calls. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Buildah 1.44 has a new regression that was caused by me removing a bunch
of init() code out of buildah. So that the binary no longer has to do so
much extra work on each reexec.
That has however exposed a storage bug in our chrootarchive logic. The
unshare.IsRootless() calls needs access to /proc and that is not there
once we call chroot(). So the only way for IsRootless() to work is if we
execute before. Then all callers later will get the cached result which
no longer needs any /proc access.
The exact error here is because while InUserNS is part of the TarOptions
struct and passed down from the top level we end up calling
GetOverlayXattrName("opaque") to get the xattr name and that does not
have access to TarOptions and just calls IsRootless() again.
And because that failed it assumed we run as root and used the wrong
xattr which then during writing caused a EPERM and thus failed the tar
extraction.
Now we could try to fix the code path and pass the variable down the
stack but that seems a lot work work so I opted to just fix this by
calling IsRootless() before we chroot to cache the right value.
Small buildah reproducer:
I guess any image that uses an opaque directory should work.
Fixes: podman-container-tools/buildah#6903