rootfsParentMountPrivate fails with EPERM in user namespace #5241

@lentil1016

Description

When running a container in a user namespace (e.g., from an OCI spec whose linux.namespaces includes an entry of type user),
runc create fails with:

unable to start container process: error during container init: error preparing rootfs: remount-private
dst=/run/containerd/.../rootfs, flags=MS_PRIVATE: operation not permitted

Root Cause

rootfsParentMountPrivate() in libcontainer/rootfs_linux.go calls mount("", path, "", MS_PRIVATE, "") on the rootfs
parent mount. In a user namespace, mounts inherited from a more privileged mount namespace are "locked" by the kernel,
and any propagation type change is rejected with EPERM.

Why It Is Safe to Skip

prepareRoot() has already called mount("", "/", "", MS_SLAVE|MS_REC, "") before rootfsParentMountPrivate().
MS_SLAVE is sufficient:

  • pivot_root() succeeds (parent mount is not shared)
  • Mount events do not propagate from the container to the parent namespace
  • MS_PRIVATE is defense-in-depth on top of MS_SLAVE, redundant in this context
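To check which propagation type a mount actually has at each step, the optional fields in /proc/self/mountinfo can be inspected: shared:N marks a shared mount, master:N a slave mount, and no tag a private one. The following is a minimal Go sketch of that check. propagationOf is a hypothetical helper written for this issue, not something that exists in runc.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// propagationOf classifies one /proc/self/mountinfo line by its optional
// fields, i.e. the tokens between the mount options and the "-" separator.
// Hypothetical helper for illustration; it is not part of runc.
func propagationOf(line string) (mountPoint, propagation string, ok bool) {
	fields := strings.Fields(line)
	// A valid line has 6 leading fields, the "-" separator, and
	// 3 trailing fields (fstype, source, super options).
	if len(fields) < 10 {
		return "", "", false
	}
	var shared, slave, unbindable, sawSeparator bool
	for _, f := range fields[6:] {
		if f == "-" {
			sawSeparator = true
			break
		}
		switch {
		case strings.HasPrefix(f, "shared:"):
			shared = true
		case strings.HasPrefix(f, "master:"):
			slave = true
		case f == "unbindable":
			unbindable = true
		}
	}
	if !sawSeparator {
		return "", "", false
	}
	switch {
	case shared && slave:
		propagation = "shared,slave"
	case shared:
		propagation = "shared"
	case slave:
		propagation = "slave"
	case unbindable:
		propagation = "unbindable"
	default:
		propagation = "private"
	}
	return fields[4], propagation, true
}

func main() {
	f, err := os.Open("/proc/self/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read mountinfo:", err)
		return
	}
	defer f.Close()

	// Print every mount point together with its propagation type.
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if mp, prop, ok := propagationOf(sc.Text()); ok {
			fmt.Printf("%-40s %s\n", mp, prop)
		}
	}
}
```

Run inside the container's mount namespace after the MS_SLAVE|MS_REC remount, the rootfs parent mount would be expected to show up as slave rather than shared, which is the condition the argument above relies on.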

Proposed Fix

Skip EPERM in rootfsParentMountPrivate() when running inside a user namespace:

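// Tolerate EPERM only inside a user namespace; errors.Is also matches the errno if it has been wrapped.
if errors.Is(err, unix.EPERM) && userns.RunningInUserNS() {
    return nil
}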

Steps to reproduce the issue

Requires: root, kernel 4.18+, docker, jq, and runc built from the main branch.

# 1. Create an OCI bundle with busybox rootfs
mkdir -p /tmp/userns-test/rootfs
cd /tmp/userns-test
docker export $(docker create busybox) | tar -C rootfs -xf -
runc spec

# 2. Add user namespace and UID/GID mappings to config.json
jq '.linux.namespaces += [{"type": "user"}]
    | .linux.uidMappings = [{"containerID": 0, "hostID": 100000, "size": 65536}]
    | .linux.gidMappings = [{"containerID": 0, "hostID": 100000, "size": 65536}]
    | .linux.rootfsPropagation = "rslave"
    | .linux.devices = []
    | .process.args = ["id"]
    | .process.terminal = false' config.json > config.json.tmp && mv config.json.tmp config.json

# 3. Remap rootfs ownership to mapped UID range
chown -R 100000:100000 rootfs/

# 4. Make the parent mount shared (simulates containerd environment)
mount --make-rshared /

# 5. Run the container; this fails with EPERM
runc run test-userns

Expected: container runs and prints uid=0(root) gid=0(root)

Describe the results you received and expected

runc create failed: unable to start container process: error during container init:
  error preparing rootfs: remount-private dst=/run/containerd/.../rootfs,
  flags=MS_PRIVATE: operation not permitted

What version of runc are you using?

runc: v1.5.0

Host OS information

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Host kernel information

Kernel: 5.15 (also reproducible on 4.18+)
