Current behavior
We noticed that after switching our cf-deployment to the new Noble stemcell (cgroups v2), containers created through garden-runc-release's default containerd mode no longer appear to expose the container's cgroups through /sys/fs/cgroup, as they did on the Jammy stemcell (cgroups v1). The per-container cgroups still exist on the host, and /proc/${PID}/cgroup inside the container points at the host-side path.
However, the JVM started by the java-buildpack, for instance, inspects the cgroups to determine its maximum memory and CPU. On Noble it now falls back to the diego-cell host's total memory/CPU values, because the limits assigned to the instance through cgroups are not exposed inside the container.
Using a simple RuntimeInfo class to show the difference with the java-buildpack (4.77.0), on an 8-core/64 GB machine:
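The RuntimeInfo class itself is not included in the report; a minimal sketch that produces output in this shape could look like the following (the human-readable byte formatting and the cast to the com.sun.management variant of the MX bean are assumptions reconstructed from the printed fields):

```java
import java.lang.management.ManagementFactory;
import java.util.Locale;

public class RuntimeInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("availableProcessors: " + rt.availableProcessors());
        System.out.println("freeMemory: " + human(rt.freeMemory()));
        System.out.println("maxMemory: " + human(rt.maxMemory()));
        System.out.println("totalMemory: " + human(rt.totalMemory()));

        System.out.println("Checking OperatingSystemMXBean");
        // com.sun.management.OperatingSystemMXBean extends the java.lang.management
        // interface with physical-memory, swap, and CPU-load accessors; on HotSpot
        // these consult the cgroup limits when the JVM detects a container.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        System.out.println("OperatingSystemMXBean.getAvailableProcessors: " + os.getAvailableProcessors());
        System.out.println("OperatingSystemMXBean.getTotalPhysicalMemorySize: " + human(os.getTotalPhysicalMemorySize()));
        System.out.println("OperatingSystemMXBean.getFreePhysicalMemorySize: " + human(os.getFreePhysicalMemorySize()));
        System.out.println("OperatingSystemMXBean.getTotalSwapSpaceSize: " + human(os.getTotalSwapSpaceSize()));
        System.out.println("OperatingSystemMXBean.getFreeSwapSpaceSize: " + human(os.getFreeSwapSpaceSize()));
        System.out.printf(Locale.ROOT, "OperatingSystemMXBean.getSystemCpuLoad: %f%n", os.getSystemCpuLoad());
    }

    // Format a byte count as "512 B" / "1.5 KB" / "1.0 MB" to match the output below.
    static String human(long bytes) {
        if (bytes < 1024) return bytes + " B";
        String[] units = {"KB", "MB", "GB", "TB"};
        double v = bytes;
        int i = -1;
        while (v >= 1024 && i < units.length - 1) {
            v /= 1024;
            i++;
        }
        return String.format(Locale.ROOT, "%.1f %s", v, units[i]);
    }
}
```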
Jammy
$ mount|grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755,uid=4294967294,gid=4294967294,inode64)
cgroup on /sys/fs/cgroup/systemd type cgroup (ro,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (ro,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/rdma type cgroup (ro,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (ro,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/misc type cgroup (ro,nosuid,nodev,noexec,relatime,misc)
cgroup on /sys/fs/cgroup/perf_event type cgroup (ro,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (ro,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/pids type cgroup (ro,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (ro,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/devices type cgroup (ro,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/memory type cgroup (ro,nosuid,nodev,noexec,relatime,memory)
$ ls /sys/fs/cgroup/
blkio cpu cpuacct cpu,cpuacct cpuset devices freezer hugetlb memory misc net_cls net_cls,net_prio net_prio perf_event pids rdma systemd
$ /home/vcap/app/.java-buildpack/open_jdk_jre/bin/java RuntimeInfo
availableProcessors: 8
freeMemory: 15.9 MB
maxMemory: 255.3 MB
totalMemory: 17.4 MB
Checking OperatingSystemMXBean
OperatingSystemMXBean.getAvailableProcessors: 8
OperatingSystemMXBean.getTotalPhysicalMemorySize: 1.0 GB
OperatingSystemMXBean.getFreePhysicalMemorySize: 652.0 MB
OperatingSystemMXBean.getTotalSwapSpaceSize: 0 B
OperatingSystemMXBean.getFreeSwapSpaceSize: 0 B
OperatingSystemMXBean.getSystemCpuLoad: 0.000000
Noble
$ mount|grep cgroup
$ ls /sys/fs/cgroup/
$ /home/vcap/app/.java-buildpack/open_jdk_jre/bin/java RuntimeInfo
availableProcessors: 8
freeMemory: 1014.2 MB
maxMemory: 15.7 GB
totalMemory: 1.0 GB
Checking OperatingSystemMXBean
OperatingSystemMXBean.getAvailableProcessors: 8
OperatingSystemMXBean.getTotalPhysicalMemorySize: 62.8 GB
OperatingSystemMXBean.getFreePhysicalMemorySize: 10.0 GB
OperatingSystemMXBean.getTotalSwapSpaceSize: 62.8 GB
OperatingSystemMXBean.getFreeSwapSpaceSize: 62.3 GB
OperatingSystemMXBean.getSystemCpuLoad: 0.000000
Regarding CPU matching the host: "Note: In versions 18.0.2+, 17.0.5+ and 11.0.17+, OpenJDK will no longer take CPU shares settings into account for its calculation of available CPU cores. See JDK-8281181 for details."
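To inspect what the JVM's container detection actually picked up, the os+container unified-logging tag (available since JDK 11) can be enabled; a quick check, guarded so it degrades gracefully when no JVM is on the PATH:

```shell
# Print the JVM's container-detection trace (memory limit, CPU quota/shares,
# cgroup version detected), then exit. Requires JDK 11 or newer.
if command -v java >/dev/null 2>&1; then
    java -Xlog:os+container=trace -version 2>&1 | head -n 40
else
    echo "java not found on PATH"
fi
```

On a container where /sys/fs/cgroup is not populated, this trace is where the fallback to host values should become visible.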
The containerd docs seem to recommend using the systemd cgroup driver (SystemdCgroup = true), but manually adding this to containerd.toml and restarting garden doesn't appear to change the behaviour.
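For reference, the setting tried was along these lines; note the section path shown here is the one from the containerd CRI documentation and is an assumption — the table layout in garden's containerd.toml may differ:

```toml
# Hypothetical placement, following the containerd docs' runc runtime example.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```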
First reported through Slack.
Desired behavior
Expose the cgroup pseudo-filesystem through /sys/fs/cgroup inside the container itself, perhaps optionally/configurably.
Affected Version
1.75.0