deployment vcluster KO in Kubernetes with noexec for emptyDir #1717

Open
@antoinetran

Description

What happened?

In an environment where every emptyDir is backed by a host partition mounted with noexec, vcluster create fails with:

12:07:17 warn Pod my-vcluster-795748b48b-gzbvb: Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/binaries/vcluster": permission denied: unknown (Failed)

After editing the pod to debug it with strace:

/ # /binaries/vcluster
sh: /binaries/vcluster: Permission denied
/ # strace /binaries/vcluster
execve("/binaries/vcluster", ["/binaries/vcluster"], [/* 27 vars */]) = -1 EACCES (Permission denied)
writev(2, [{iov_base="strace: exec: Permission denied", iov_len=31}, {iov_base="\n", iov_len=1}], 2strace: exec: Permission denied
) = 32
writev(2, [{iov_base="", iov_len=0}, {iov_base=NULL, iov_len=0}], 2) = 0
getpid()                                = 18
exit_group(1)                           = ?
+++ exited with 1 +++

If the binary is copied to /tmp, vcluster runs.

The mount command gives:

# for /tmp (which lives on the root overlay filesystem)
mount | grep "on / "
overlay on / type overlay (rw,seclabel,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/26481/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/26480/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/26558/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/26558/work)

# for /binaries
mount | grep "on /binaries "
/dev/sda7 on /binaries type ext4 (rw,seclabel,nosuid,nodev,noexec,relatime,stripe=64)

This shows noexec on /binaries, but not on the root filesystem that holds /tmp.
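The same check can be scripted. The following is a minimal sketch, not part of vcluster: `mount_has_noexec` is a hypothetical helper name, and it assumes the `mount` output format shown above (`DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)`) with no spaces in the mount point.

```shell
#!/bin/sh
# mount_has_noexec MOUNTPOINT   (hypothetical helper, for illustration only)
# Reads `mount`-style lines on stdin and exits 0 only when the entry for
# MOUNTPOINT lists noexec among its options; exits 1 otherwise.
mount_has_noexec() {
    awk -v mp="$1" '
        $2 == "on" && $3 == mp {
            opts = $NF                  # last field, e.g. "(rw,noexec,relatime)"
            gsub(/[()]/, "", opts)      # drop the surrounding parentheses
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == "noexec") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Inside the pod it could be used as `mount | mount_has_noexec /binaries && echo "noexec is set"`.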

What did you expect to happen?

vcluster create succeeds.

How can we reproduce it (as minimally and precisely as possible)?

Deploy a Kubernetes cluster and configure it so that every emptyDir is backed by a partition mounted with noexec. Then deploy vcluster.
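To confirm that a given cluster reproduces the condition, a small probe can be run from a shell inside any pod that mounts an emptyDir. This is only a sketch: `can_exec_in` is a hypothetical helper, and it uses a tiny shell script rather than the vcluster binary, on the assumption that noexec rejects executing any file on the mount.

```shell
#!/bin/sh
# can_exec_in DIR   (hypothetical helper, for illustration only)
# Writes a tiny script into DIR, marks it executable and tries to run it.
# Prints "exec-ok" (returns 0) if it ran, "exec-denied" (returns 1) if not,
# and "write-failed" (returns 2) if DIR is not writable.
can_exec_in() {
    probe="$1/.exec-probe.$$"
    printf '#!/bin/sh\nexit 0\n' > "$probe" || { echo "write-failed"; return 2; }
    chmod +x "$probe"
    if "$probe" 2>/dev/null; then
        echo "exec-ok"; rc=0
    else
        echo "exec-denied"; rc=1
    fi
    rm -f "$probe"                      # leave the directory as it was found
    return "$rc"
}
```

In the environment described here, `can_exec_in /binaries` would be expected to report exec-denied while `can_exec_in /tmp` reports exec-ok.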

Anything else we need to know?

So far, this behavior seems specific to the Kubernetes environment I am deploying into; in general, emptyDir volumes do not appear to be mounted noexec. However, judging by kubernetes/kubernetes#48912, things seem to be moving toward stricter security, with emptyDir mounted noexec (by default or as an option).

From my understanding of the code (see https://github.com/loft-sh/vcluster/blob/v0.20.0-beta.1/chart/templates/_init-containers.tpl), the initContainers inject the vcluster binary only so that it can perform a cp (because cp is not present in the Kubernetes images) and copy the kube-controller-manager and kube-apiserver binaries into the vcluster image. This requires the emptyDir to be mounted with exec.

Host cluster Kubernetes version

$ kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.26.4

Host cluster Kubernetes distribution

kubespray

vcluster version

$ vcluster --version
vcluster version 0.20.0-beta.1

vcluster Kubernetes distribution (k3s (default), k8s, k0s)

k8s

OS and Arch

OS:  Linux
Arch: amd64
