/proc/[pid]/ns/user is not usable with setns #13314

@anthops

Description

I have been trying to get rootless podman working inside gVisor containers running on K3s. Unfortunately, I'm running into errors that I believe are due to the way gVisor provides /proc/[pid]/ns/user. I am aware that rootful podman works, but I need to support rootless usage.

Currently, gVisor implements /proc/[pid]/ns/user via newFakeNamespaceSymlink, rendering as:

user:[<node>]

Unlike the other namespace entries (e.g. mnt), the user namespace entry is not usable as a target for setns: calling setns on it returns EINVAL.
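To compare how the mnt and user entries render for a given process, the link targets can be read and split into their two parts. This is a minimal sketch, assuming the `type:[value]` shape shown above; `parse_ns_link` and `show_ns_links` are hypothetical helper names for illustration, not part of gVisor or util-linux.

```shell
# Split a namespace link target such as "user:[4026531837]" into "type value".
# Returns non-zero if the target does not have the assumed "type:[value]" shape.
parse_ns_link() (
  target=$1
  case $target in
    ?*:\[?*\]) ;;
    *) exit 1 ;;
  esac
  type=${target%%:*}      # text before the first colon
  value=${target#*:\[}    # drop everything up to and including ":["
  value=${value%\]}       # drop the trailing "]"
  printf '%s %s\n' "$type" "$value"
)

# Print the parsed link target for the mnt and user entries of a PID.
show_ns_links() (
  pid=$1
  for ns in mnt user; do
    parse_ns_link "$(readlink "/proc/$pid/ns/$ns")" || echo "$ns: unexpected link format"
  done
)
```

Running `show_ns_links "$pid"` against the same process inside and outside gVisor gives a quick view of whether the two entries look alike, though a well-formed link target alone does not show whether setns will accept it.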

The failing setns is a problem when running rootless podman inside a gVisor container, since podman relies on /proc/<pid>/ns/user being a valid, joinable user namespace FD during rootless lifecycle management. In particular, it calls setns on the user namespace of its pause process, whose PID is recorded in $XDG_RUNTIME_DIR/libpod/tmp/pause.pid: https://github.com/containers/podman/blob/5b263b5f5b48004a87caac44e67349a8266d9ef4/pkg/rootless/rootless_linux.c#L781

This manifests as the first rootless podman command running successfully and the second failing:

~ $ podman ps
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
~ $ podman ps
cannot set user namespace

If $XDG_RUNTIME_DIR/libpod/tmp/pause.pid is removed, the next podman <command> works, but the file has to be removed again before each subsequent command.
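The pause-process step can also be checked without going through podman. The sketch below reads the recorded PID and tries to join that process's user namespace with nsenter. It assumes pause.pid contains just the PID as decimal text; `check_pause_userns` is a hypothetical helper name, not something podman provides.

```shell
# Read the pause PID recorded by rootless podman and try to join its user namespace.
# Exit status: 0 = joinable, 1 = not joinable, 2 = no readable pause.pid.
check_pause_userns() (
  pid_file="${XDG_RUNTIME_DIR:?XDG_RUNTIME_DIR is not set}/libpod/tmp/pause.pid"
  if [ ! -r "$pid_file" ]; then
    echo "no pause.pid found"
    exit 2
  fi
  pid=$(cat "$pid_file")   # assumption: the file holds only the PID
  if nsenter -t "$pid" -U true 2>/dev/null; then
    echo "pause pid $pid: user namespace joinable"
  else
    echo "pause pid $pid: user namespace not joinable"
    exit 1
  fi
)
```

This should be run from a shell that is not already inside the pause process's user namespace, since setns does not permit re-entering the caller's current user namespace and would fail on a stock Linux kernel as well.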

Apologies if I'm off the mark with my diagnosis here. I haven't had to delve this deep into container runtimes before :P

Steps to reproduce

You can reproduce this without podman by entering a gVisor container and running:

unshare -mr bash -c 'mount -t tmpfs none /mnt && touch /mnt/test && sleep 300' &
pid=$!
nsenter -t $pid -m true
nsenter -t $pid -U true

The first nsenter command will run successfully, whereas the second will fail with:

nsenter: reassociate to namespace 'ns/user' failed: Invalid argument
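For repeated testing, the two nsenter calls can be wrapped so that the result for each namespace is printed side by side. This is only a convenience sketch around the commands above; `probe_ns` is a hypothetical helper name, and it covers just the mnt and user entries used in this report.

```shell
# Try to join the mnt and user namespaces of PID with nsenter and report each outcome.
# Exits non-zero if any of the attempts failed.
probe_ns() (
  pid=$1
  status=0
  for entry in mnt:-m user:-U; do
    name=${entry%%:*}   # namespace name, e.g. "user"
    flag=${entry#*:}    # matching nsenter flag, e.g. "-U"
    if nsenter -t "$pid" "$flag" true 2>/dev/null; then
      echo "$name: ok"
    else
      echo "$name: FAILED"
      status=1
    fi
  done
  exit "$status"
)
```

With the background `unshare` process from the steps above still running, `probe_ns "$pid"` would be expected to report the user entry as failed inside gVisor.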

runsc version

runsc version release-20260520.0-42-gd3f3610be742-dirty
spec: 1.2.1

docker version (if using docker)

uname

6.12.0-211.16.1.el10_2.x86_64

(Yes, SELinux is disabled in containerd for the nodes with the gVisor runtime.)

kubectl (if using Kubernetes)

Client Version: v1.36.0+k3s1
Server Version: v1.36.0+k3s1

Using embedded containerd with:

[plugins.'io.containerd.runtime.v1.linux']
  shim_debug = true

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'runsc']
  runtime_type = "io.containerd.runsc.v1"
  snapshotter = "erofs"

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'runsc'.options]
  TypeUrl = "io.containerd.runsc.v1.options"
  ConfigPath = "/var/lib/rancher/k3s/agent/etc/containerd/runsc_options.toml"
  SystemdCgroup = true

and runsc_options.toml:

log_path = "/var/log/runsc/%ID%/shim.log"
log_level = "info"
[runsc_config]
  platform = "systrap"
  systemd-cgroup = "true"

The podman version is 5.6.0 with buildah 1.41.8

repo state (if built from source)

Using the following nightly build:

runsc debug logs (if available)

Labels

type: bug (Something isn't working)
