
Issues with podman (unable to setup user: setgroups: invalid argument) #368

Open
@vsoch

Description


Hi @AkihiroSuda - I'm trying to get the setup working on an actual HPC cluster, building the podman container as follows (the podman compose build doesn't work):

podman build --userns-uid-map=0:0:1 --userns-uid-map=1:1:1999 --userns-uid-map=65534:2000:2 -f ./Dockerfile -t usernetes_node .
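For reference, each `--userns-uid-map` triple is `container_id:host_id:count` (if I'm reading the podman docs right), so the command above maps 1 + 1999 + 2 = 2002 container UIDs. A quick sketch to total the counts (the `maps` variable just mirrors the flags above):

```shell
# Sum the "count" field of each container_id:host_id:count triple.
maps="0:0:1 1:1:1999 65534:2000:2"
total=0
for m in $maps; do
  count=${m##*:}          # third field of the triple
  total=$((total + count))
done
echo "total mapped IDs: $total"   # 2002
```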

That brings up the container OK, but then when I run kubeadm-init it times out, and this is what I see in the logs:

Mar 04 16:05:15 u7s-corona173 kubelet[1183]: E0304 16:05:15.675877    1183 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to setup user: setgroups: invalid argument: unknown" pod="kube-system/etcd-u7s-corona173"
Mar 04 16:05:15 u7s-corona173 kubelet[1183]: E0304 16:05:15.675911    1183 kuberuntime_manager.go:1166] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to setup user: setgroups: invalid argument: unknown" pod="kube-system/etcd-u7s-corona173"
Mar 04 16:05:15 u7s-corona173 kubelet[1183]: E0304 16:05:15.675971    1183 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-u7s-corona173_kube-system(5890a635964013b0836c119ab878b4ac)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"etcd-u7s-corona173_kube-system(5890a635964013b0836c119ab878b4ac)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to setup user: setgroups: invalid argument: unknown\"" pod="kube-system/etcd-u7s-corona173" podUID="5890a635964013b0836c119ab878b4ac"
Mar 04 16:05:15 u7s-corona173 kubelet[1183]: E0304 16:05:15.971510    1183 event.go:368] "Unable to write event (may retry after sleeping)" err="Post \"https://u7s-corona173:6443/api/v1/namespaces/default/events\": dial tcp 10.100.171.100:6443: connect: connection refused" event="&Event{ObjectMeta:{u7s-corona173.1829a50dbdf86d87  default    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Node,Namespace:,Name:u7s-corona173,UID:u7s-corona173,APIVersion:,ResourceVersion:,FieldPath:,},Reason:Starting,Message:Starting kubelet.,Source:EventSource{Component:kubelet,Host:u7s-corona173,},FirstTimestamp:2025-03-04 16:03:29.395740039 +0000 UTC m=+0.316362060,LastTimestamp:2025-03-04 16:03:29.395740039 +0000 UTC m=+0.316362060,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:kubelet,ReportingInstance:u7s-corona173,}"
Mar 04 16:05:15 u7s-corona173 kubelet[1183]: E0304 16:05:15.971611    1183 event.go:307] "Unable to write event (retry limit exceeded!)" event="&Event{ObjectMeta:{u7s-corona173.1829a50dbdf86d87  default    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Node,Namespace:,Name:u7s-corona173,UID:u7s-corona173,APIVersion:,ResourceVersion:,FieldPath:,},Reason:Starting,Message:Starting kubelet.,Source:EventSource{Component:kubelet,Host:u7s-corona173,},FirstTimestamp:2025-03-04 16:03:29.395740039 +0000 UTC m=+0.316362060,LastTimestamp:2025-03-04 16:03:29.395740039 +0000 UTC m=+0.316362060,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:kubelet,ReportingInstance:u7s-corona173,}"
Mar 04 16:05:15 u7s-corona173 kubelet[1183]: E0304 16:05:15.971944    1183 event.go:368] "Unable to write event (may retry after sleeping)" err="Post \"https://u7s-corona173:6443/api/v1/namespaces/default/events\": dial tcp 10.100.171.100:6443: connect: connection refused" event="&Event{ObjectMeta:{u7s-corona173.1829a50dbe1ad4d4  default    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Node,Namespace:,Name:u7s-corona173,UID:u7s-corona173,APIVersion:,ResourceVersion:,FieldPath:,},Reason:InvalidDiskCapacity,Message:invalid capacity 0 on image filesystem,Source:EventSource{Component:kubelet,Host:u7s-corona173,},FirstTimestamp:2025-03-04 16:03:29.397994708 +0000 UTC m=+0.318616729,LastTimestamp:2025-03-04 16:03:29.397994708 +0000 UTC m=+0.318616729,Count:1,Type:Warning,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:kubelet,ReportingInstance:u7s-corona173,}"

Also note that the kernel is slightly old, so I ignore preflight errors:

Linux u7s-corona173 4.18.0-553.34.1.1toss.t4.x86_64 #1 SMP Mon Jan 13 14:19:40 PST 2025 x86_64 GNU/Linux

I'm trying to distinguish an error with UID mapping from an issue that isn't resolvable because of the kernel version. I tried removing the Kubernetes part and just doing a basic pull with crictl, and got more insight into the "unknown" error:


"lchown /var/lib/containerd/tmpmounts/containerd-mount192687706/home: invalid argument (Hint: try increasing the number of subordinate IDs in /etc/subuid and /etc/subgid): unknown"
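If I understand the hint, the subordinate range in /etc/subuid and /etc/subgid has to cover every UID/GID that appears in the image layers (65536 IDs seems to be the conventional minimum, since images commonly contain files owned by nobody, UID 65534). A minimal check, with a hypothetical subuid entry standing in for a real line from /etc/subuid:

```shell
# Sketch: does a subuid entry (user:start:count) grant enough IDs?
entry="vsoch:100000:65536"   # hypothetical /etc/subuid line
count=${entry##*:}
required=65536               # assumed minimum to cover nobody (65534)
if [ "$count" -ge "$required" ]; then
  echo "enough subordinate IDs"
else
  echo "increase the count in /etc/subuid and /etc/subgid"
fi
```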

Do you have any insights or suggestions? Sorry for asking for help so much - I feel a bit alone working on this. I also tested nerdctl, but it seems to offer less control over UID mappings: I can only use --user, I then hit trouble with the higher IDs, and on a cluster with a ton of users it's non-trivial to get more IDs allocated to me. Thank you!


Metadata

Assignees: no one assigned
Labels: question (Further information is requested)
Type: no type
Projects: no projects
Milestone: no milestone
Relationships: none yet
Development: no branches or pull requests