fix: nspawn_host_dev - restore /dev/pts and /dev/shm after devtmpfs remount#1749
fix: nspawn_host_dev - restore /dev/pts and /dev/shm after devtmpfs remount#1749nikromen wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces the nspawn_host_dev configuration option, allowing Mock to remount /dev as devtmpfs within systemd-nspawn containers to support image-building tools like kiwi and osbuild. Review feedback indicates that the current implementation of remounting /dev will mask essential submounts such as /dev/pts and /dev/shm, which must be manually restored to maintain PTY and shared memory functionality. Furthermore, an inconsistency was noted where the mount command is attempted in bootstrap chroots even when the necessary privileges are not granted, suggesting a need for better synchronization between the configuration and the execution logic.
| @@ -0,0 +1,17 @@ | |||
| #!/bin/sh | |||
|
|
|||
| . ${TESTDIR}/functions | |||
| @@ -0,0 +1,17 @@ | |||
| #!/bin/sh | |||
|
|
|||
| . ${TESTDIR}/functions | |||
|
/gemini review |
|
@praiskup @Conan-Kudo This follows the same approach as osbuild/image-builder Does this approach work for you? |
There was a problem hiding this comment.
Code Review
This pull request introduces the nspawn_host_dev configuration option, which allows systemd-nspawn containers to access dynamically created loop partition devices by remounting /dev as devtmpfs. While the changes include documentation and integration tests, the review identified several critical issues: the remount logic will fail in non-privileged builds because it attempts to run mount commands after dropping root privileges; the check for the cap_sys_admin capability is fragile and may miss valid configurations; and an early return in the loop device setup could skip necessary host-side device creation.
| cmd = ['/bin/sh', '-c', | ||
| 'mount -t devtmpfs devtmpfs /dev' | ||
| ' && mount -t devpts devpts /dev/pts -o gid=5,mode=620' | ||
| ' && mount -t tmpfs tmpfs /dev/shm -o mode=1777;' | ||
| ' exec "$@"', | ||
| '--'] + cmd |
There was a problem hiding this comment.
The mount commands will fail when running a non-privileged build (the default for Mock). When systemd-nspawn is called with the -u option (lines 762-764), it switches to the specified user before executing the command. A non-root user cannot perform mount operations, even if the container has CAP_SYS_ADMIN, because the capability is not in their effective set. Since most Mock builds run as the mockbuild user, this feature will fail. To fix this, the remount must happen as root before dropping privileges. Alternatively, consider using systemd-nspawn --bind=/dev which performs the bind mount as root before switching the user.
There was a problem hiding this comment.
Good catch on the user context. For --shell and --chroot the
command runs as root, so the mount works. For rpmbuild it runs as
chrootuser , but kiwi/osbuild builds use --chroot) not rpmbuild, so the
primary use case is covered. Worth noting in docs though, so I should document this perhaps.
Ad the suggested alternatives: moving the remount before the user switch
would require hooking into nspawn's own startup sequence which would add significant complexity.
And --bind=/dev exposes all host devices and masks /dev/pts and /dev/shm
the same way, so it doesn't actually help.
| '--capability=cap_sys_admin', | ||
| '--property=DevicePolicy=auto', | ||
| ]) | ||
| return |
There was a problem hiding this comment.
By returning early here, the code skips the pre-creation of loop devices on the host (lines 877-883). While modern kernels create loop devices on demand, Mock explicitly manages dev_loop_count to ensure a specific number of devices are available. If nspawn_host_dev is enabled on a system that doesn't automatically create enough loop devices, this might lead to failures. Consider ensuring the host-side device creation still runs even when nspawn_host_dev is enabled.
There was a problem hiding this comment.
With nspawn_host_dev the whole point is that /dev is devtmpfs, which shows all
host devices including loop nodes created on demand via /dev/loop-control.
I think pre-creating them is unnecessary.
| if isinstance(cmd, str): | ||
| cmd = ['/bin/sh', '-c', cmd] | ||
|
|
||
| if USE_NSPAWN_HOST_DEV and nspawn_args and '--capability=cap_sys_admin' in nspawn_args: |
There was a problem hiding this comment.
The check for cap_sys_admin in nspawn_args is fragile because it only matches the exact string '--capability=cap_sys_admin'. If a user provides this capability in a different format (e.g., ['--capability', 'cap_sys_admin'] or as part of a comma-separated list), the remount logic will be skipped. Consider using a more robust check, such as any('cap_sys_admin' in arg for arg in nspawn_args if arg.startswith('--capability')).
There was a problem hiding this comment.
The capability string is generated by our own code in _setup_nspawn_loop_devices(), not user input as it will always be exactly --capability=cap_sys_admin.
There was a problem hiding this comment.
if we check for USE_NSPAWN_HOST_DEV, the option must be there in nspawn_args, or? no need to check for it twice?
|
This approach looks OK at a first read. I don't know when I'll have time to try this out but hopefully next week. Very cool if we can make this work 🙂 |
|
Anyone can verify if this fixes the problem by testing the reproducer in the ticket this PR claims to fix. |
Failed for me: |
| 'mount -t devtmpfs devtmpfs /dev' | ||
| ' && mount -t devpts devpts /dev/pts -o gid=5,mode=620' | ||
| ' && mount -t tmpfs tmpfs /dev/shm -o mode=1777;' | ||
| ' exec "$@"', |
There was a problem hiding this comment.
Hmm, this shell -> exec & set of mounts seems pretty dangerous. How do we umount?
There was a problem hiding this comment.
I don't think this is the way to go (we should be using some builtin mechanism for handling mountpoints 🤷, some hints anyway
- && should be used instead of
;beforeexec - please use
set -x - perhaps
set -einstead of&&
…emount systemd-nspawn sets up /dev as tmpfs, so dynamically created device nodes (e.g. /dev/loop0p1) never appear inside the container. This is a known kernel limitation. Block devices are not virtualized for containers (systemd/systemd#21987, Poettering on systemd-devel: https://lists.freedesktop.org/archives/systemd-devel/2017-August/039453.html). The nspawn_host_dev option works around this by remounting /dev as devtmpfs inside the container. Since devtmpfs is a single kernel-wide instance, all device nodes (including those created dynamically by) become visible. The osbuild/image-builder uses the same approach (see osbuild/image-builder-cli pkg/setup/setup.go and rpm-software-management#1554 discussion). When nspawn_host_dev is enabled: - CAP_SYS_ADMIN is granted and DevicePolicy set to auto. - /dev is remounted as devtmpfs before each command, then /dev/pts and /dev/shm are restored (the remount masks these submounts). - The remount is skipped in bootstrap chroots where CAP_SYS_ADMIN is not granted. Fixes: rpm-software-management#1554
|
@Conan-Kudo The reproducer in this issue runs a full kiwi image build, which can fail for many unrelated reasons (e.g. praiskup hit an SELinux setfiles error) since the reproducer is really broad and can fail with different causes. Could you provide a minimal reproducer that only tests the specific failure -- loop partition device visibility so I can verify against it? |
I could, but it would be disingenuous, because this bug needs to fix that use case. It isn't useful if we can't build a Fedora image on a Fedora host with it. If you cannot build an image, then this cannot be declared fixed.
Are you building on a Fedora host? Is |
systemd-nspawn sets up /dev as tmpfs, so dynamically created device
nodes (e.g. /dev/loop0p1) never appear inside the container.
This is a known kernel limitation. Block devices are not virtualized
for containers (systemd/systemd#21987, Poettering on systemd-devel:
https://lists.freedesktop.org/archives/systemd-devel/2017-August/039453.html).
The nspawn_host_dev option works around this by remounting /dev as
devtmpfs inside the container. Since devtmpfs is a single kernel-wide
instance, all device nodes (including those created dynamically by)
become visible. The osbuild/image-builder uses the same approach
(see osbuild/image-builder-cli pkg/setup/setup.go and #1554 discussion).
When nspawn_host_dev is enabled:
and /dev/shm are restored (the remount masks these submounts).
is not granted.
Fixes: #1554