Description
I have been attempting to get rootless podman working inside gVisor containers running on K3s. Unfortunately, I'm running into errors which I believe are due to the way gVisor provides /proc/[pid]/ns/user. I am aware that rootful podman works, but I need to support rootless usage.
Currently, gVisor implements /proc/[pid]/ns/user via newFakeNamespaceSymlink, rendering as:
Unlike the other namespace entries, e.g. mnt, the user namespace entry is not usable as a target for setns; the call returns EINVAL.
This is a problem when running rootless podman inside a gVisor container, since podman relies on /proc/<pid>/ns/user being a valid, joinable user namespace FD during rootless lifecycle management. In particular, it attempts to call setns on the user namespace of its pause process, whose pid is stored in $XDG_RUNTIME_DIR/libpod/tmp/pause.pid: https://github.com/containers/podman/blob/5b263b5f5b48004a87caac44e67349a8266d9ef4/pkg/rootless/rootless_linux.c#L781
This manifests as the first rootless podman command succeeding and the second failing:
~ $ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
~ $ podman ps
cannot set user namespace
If $XDG_RUNTIME_DIR/libpod/tmp/pause.pid is removed, the next podman <command> works, but the file then has to be removed again before each subsequent command.
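For anyone else hitting this, the manual step above can be scripted as a stopgap. This is only a sketch: `pause_pid_file` and `podman_fresh` are hypothetical helper names, not part of podman, and it assumes XDG_RUNTIME_DIR is set. It automates the removal described above and does nothing about the underlying setns failure.

```shell
# Hypothetical helper (not part of podman): print the pause pid file path.
# Aborts with an error if XDG_RUNTIME_DIR is unset or empty.
pause_pid_file() {
    printf '%s/libpod/tmp/pause.pid\n' "${XDG_RUNTIME_DIR:?XDG_RUNTIME_DIR is not set}"
}

# Hypothetical helper: remove the pause pid file, then run podman
# with whatever arguments were passed in.
podman_fresh() {
    rm -f -- "$(pause_pid_file)"
    podman "$@"
}
```

With these defined, `podman_fresh ps` would stand in for `podman ps`.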
Apologies if I'm off the mark with my diagnosis here. I haven't had to delve this deep into container runtimes before :P
Steps to reproduce
You can reproduce this without podman by entering a gVisor container and running:
unshare -mr bash -c 'mount -t tmpfs none /mnt && touch /mnt/test && sleep 300' &
pid=$!
nsenter -t $pid -m true
nsenter -t $pid -U true
The first nsenter command will run successfully, whereas the second will fail with:
nsenter: reassociate to namespace 'ns/user' failed: Invalid argument
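To make the comparison easier to repeat, the two nsenter calls can be wrapped in a small loop that also prints what each ns entry links to. This is a sketch only: `ns_flag` and `probe_ns` are hypothetical helper names, and it assumes util-linux `nsenter` and `readlink` are available in the container.

```shell
# Hypothetical helper: map a /proc/<pid>/ns entry name to its nsenter flag.
ns_flag() {
    case "$1" in
        mnt)  printf '%s\n' '-m' ;;
        user) printf '%s\n' '-U' ;;
        *)    return 1 ;;
    esac
}

# Hypothetical helper: for the mnt and user entries of the given pid,
# print the symlink target and whether nsenter was able to join it.
probe_ns() {
    target=$1
    for ns in mnt user; do
        link=$(readlink "/proc/$target/ns/$ns")
        if nsenter -t "$target" "$(ns_flag "$ns")" true 2>/dev/null; then
            printf '%s (%s): joinable\n' "$ns" "$link"
        else
            printf '%s (%s): setns failed\n' "$ns" "$link"
        fi
    done
}
```

Running `probe_ns "$pid"` after starting the background `unshare` process reports the result for both entries in one go.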
runsc version
runsc version release-20260520.0-42-gd3f3610be742-dirty
spec: 1.2.1
docker version (if using docker)
uname
6.12.0-211.16.1.el10_2.x86_64
(Yes SELinux is disabled in containerd for the nodes with gVisor runtime)
kubectl (if using Kubernetes)
Client Version: v1.36.0+k3s1
Server Version: v1.36.0+k3s1
Using embedded containerd with:
[plugins.'io.containerd.runtime.v1.linux']
shim_debug = true
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'runsc']
runtime_type = "io.containerd.runsc.v1"
snapshotter = "erofs"
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'runsc'.options]
TypeUrl = "io.containerd.runsc.v1.options"
ConfigPath = "/var/lib/rancher/k3s/agent/etc/containerd/runsc_options.toml"
SystemdCgroup = true
and runsc_options.toml:
log_path = "/var/log/runsc/%ID%/shim.log"
log_level = "info"
[runsc_config]
platform = "systrap"
systemd-cgroup = "true"
The podman version is 5.6.0 with buildah 1.41.8
repo state (if built from source)
Using the following nightly build:
runsc debug logs (if available)