
fix(cli): configure v8 isolate with cgroups-constrained memory limit #29078


Merged

Conversation


@magurotuna magurotuna commented Apr 28, 2025

This change configures V8 isolates to respect memory limits imposed by cgroups on Linux.

It adds support for detecting both cgroups v1 and v2 memory limits, enabling Deno to properly adapt to containerized environments with memory constraints. When cgroups information is unavailable or not applicable, it falls back to using the system's total memory as before.

Closes #29077
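The limit files differ between the two cgroup versions. A minimal sketch of the mapping (the enum and helper names here are illustrative, not necessarily the PR's exact identifiers):

```rust
use std::path::PathBuf;

// Hypothetical sketch of the v1/v2 split described above.
enum CgroupVersion<'a> {
    V1 { relpath: &'a str },
    V2 { relpath: &'a str },
}

fn memory_limit_file(v: &CgroupVersion) -> PathBuf {
    match v {
        // v1: the memory controller has its own hierarchy.
        CgroupVersion::V1 { relpath } => PathBuf::from("/sys/fs/cgroup/memory")
            .join(relpath)
            .join("memory.limit_in_bytes"),
        // v2: one unified hierarchy; the limit lives in `memory.max`.
        CgroupVersion::V2 { relpath } => PathBuf::from("/sys/fs/cgroup")
            .join(relpath)
            .join("memory.max"),
    }
}

fn main() {
    let v1 = CgroupVersion::V1 { relpath: "mygroup" };
    println!("{}", memory_limit_file(&v1).display());
    // /sys/fs/cgroup/memory/mygroup/memory.limit_in_bytes
}
```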

Test

For testing, I created an Ubuntu VM with 1 GiB of memory. Within this VM, I set up a cgroup with a 512 MiB memory limit, then ran the following script to see what heap size limit the V8 isolate was given.

import * as v8 from "node:v8";

console.log(v8.getHeapStatistics());

Ubuntu 20.04

In this version of Ubuntu, cgroup hybrid mode is enabled by default.

$ cat /proc/self/cgroup
12:rdma:/
11:blkio:/user.slice
10:devices:/user.slice
9:cpu,cpuacct:/user.slice
8:pids:/user.slice/user-1000.slice/session-3.scope
7:memory:/user.slice/user-1000.slice/session-3.scope
6:perf_event:/
5:freezer:/
4:net_cls,net_prio:/
3:hugetlb:/
2:cpuset:/
1:name=systemd:/user.slice/user-1000.slice/session-3.scope
0::/user.slice/user-1000.slice/session-3.scope
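The `7:memory:` line above is the one that matters in hybrid mode: memory is still managed by the v1 hierarchy even though a `0::` v2 entry is present. A minimal sketch of that decision (illustrative only; the PR's real parser also extracts the relative paths):

```rust
// Returns true when /proc/self/cgroup shows the memory controller on the
// v1 hierarchy, which takes precedence over a bare `0::` v2 entry in
// hybrid mode.
fn memory_managed_by_v1(proc_self_cgroup: &str) -> bool {
    proc_self_cgroup.lines().any(|line| {
        // Line format: "<hierarchy_id>:<controllers>:<path>".
        line.splitn(3, ':')
            .nth(1)
            .is_some_and(|controllers| controllers.split(',').any(|c| c == "memory"))
    })
}

fn main() {
    let hybrid = "7:memory:/user.slice\n0::/user.slice";
    assert!(memory_managed_by_v1(hybrid));
    let v2_only = "0::/user.slice";
    assert!(!memory_managed_by_v1(v2_only));
    println!("ok");
}
```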

Create a new cgroup with a 512 MiB memory limit and run the above script in it:

$ sudo cgcreate -g memory:/mygroup
$ sudo cgset -r memory.limit_in_bytes=$((512 * 1024 * 1024)) mygroup
$ sudo cgexec -g memory:mygroup ./deno run main.mjs
{
  total_heap_size: 7745536,
  total_heap_size_executable: 0,
  total_physical_size: 7090176,
  total_available_size: 266348216,
  used_heap_size: 6276752,
  heap_size_limit: 271581184,
  malloced_memory: 303200,
  peak_malloced_memory: 140456,
  does_zap_garbage: 0,
  number_of_native_contexts: 1,
  number_of_detached_contexts: 0,
  total_global_handles_size: 24576,
  used_global_handles_size: 22432,
  external_memory: 3232012
}

This indicates that the isolate was informed of the cgroup-constrained memory limit (512 MiB) and hence got a ~270 MB heap limit.
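The figure handed to V8 is presumably the smaller of the cgroup limit and total system memory, with the pre-existing behavior as the fallback. A hedged sketch of that selection (the function name is mine, not Deno's):

```rust
// Hypothetical sketch of the selection logic: prefer the cgroup limit when
// one exists, clamped to total system memory, otherwise fall back to total
// system memory as before.
fn effective_memory(cgroup_limit: Option<u64>, total_system_memory: u64) -> u64 {
    cgroup_limit
        .map(|limit| limit.min(total_system_memory))
        .unwrap_or(total_system_memory)
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    // 512 MiB cgroup limit inside a 1 GiB VM, as in the test above.
    assert_eq!(effective_memory(Some(512 * MIB), 1024 * MIB), 512 * MIB);
    // No cgroup limit: fall back to total memory.
    assert_eq!(effective_memory(None, 1024 * MIB), 1024 * MIB);
    println!("ok");
}
```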

Ubuntu 22.04

In this version of Ubuntu, cgroup v2 is used by default.

$ cat /proc/self/cgroup
0::/user.slice/user-1000.slice/session-3.scope

Run the above script using systemd-run:

$ sudo systemd-run --property=MemoryMax=512M --pty bash -c '/home/ubuntu/deno run /home/ubuntu/main.mjs'
{
  total_heap_size: 7745536,
  total_heap_size_executable: 0,
  total_physical_size: 7090176,
  total_available_size: 266348184,
  used_heap_size: 6276784,
  heap_size_limit: 271581184,
  malloced_memory: 303200,
  peak_malloced_memory: 140456,
  does_zap_garbage: 0,
  number_of_native_contexts: 1,
  number_of_detached_contexts: 0,
  total_global_handles_size: 24576,
  used_global_handles_size: 22432,
  external_memory: 3232012
}

Again, the isolate correctly got a ~270 MB heap limit.
Note that it would have had a bigger heap limit if the entire system memory, i.e. 1 GiB, had been passed to V8. In fact, if we run the same script outside the cgroup, it displays a larger heap_size_limit:

$ ./deno run main.mjs
{
  total_heap_size: 7745536,
  total_heap_size_executable: 0,
  total_physical_size: 7090176,
  total_available_size: 546580152,
  used_heap_size: 6276752,
  heap_size_limit: 551813120,
  malloced_memory: 303200,
  peak_malloced_memory: 140456,
  does_zap_garbage: 0,
  number_of_native_contexts: 1,
  number_of_detached_contexts: 0,
  total_global_handles_size: 24576,
  used_global_handles_size: 22432,
  external_memory: 3232012
}

@magurotuna magurotuna force-pushed the tell-v8-cgroup-constrained-memory-limit branch from 8ed9d12 to 895d941 on April 29, 2025 07:48
@magurotuna magurotuna marked this pull request as ready for review April 30, 2025 08:54
@magurotuna magurotuna requested a review from Copilot April 30, 2025 08:54

@Copilot Copilot AI left a comment


Pull Request Overview

This PR configures V8 isolates to honor memory limits imposed by cgroups on Linux by detecting cgroup v1 and v2 configurations. Key changes include:

  • Adding a new Linux-specific logic branch in create_isolate_create_params to use cgroup memory limits.
  • Introducing a helper module with functions to parse /proc/self/cgroup and determine the correct memory limit.
  • Including tests to validate the cgroup parsing function for both v1 and v2 (and hybrid mode).

magurotuna and others added 3 commits April 30, 2025 23:37
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Yusuke Tanaka <[email protected]>

@devsnek devsnek left a comment


Generally seems good. I'm curious, though, whether there's a crate for parsing the cgroups instead of implementing it manually.


@littledivy littledivy left a comment


Is there any impact on startup perf? Can this be feature-flagged at compile time or runtime instead?

@bartlomieju

Is there any impact on startup perf? Can this be feature-flagged at compile time or runtime instead?

That's a good point; it won't be used in 99% of cases. Maybe enable it only via an env var.

@magurotuna

I am curious though if there's some crate for parsing the cgroups instead of manually implementing it.

Looks like this crate https://docs.rs/cgroups-rs/0.3.4/cgroups_rs/index.html provides what we want. I'll replace the manual implementation with it.

maybe enable it only via env var

OK, let's do that

@pat-s

pat-s commented May 12, 2025

Will this fix be a possible solution for quarto-dev/quarto-cli#4197?

@magurotuna

@pat-s If the linked issue is about the heap size limit (like a heap out-of-memory runtime error) in container environments such as Docker or Kubernetes, then yes, this patch can remedy it.

@magurotuna

magurotuna commented May 13, 2025

I started working on replacing the manual cgroup file parsing with existing libraries (procfs + cgroups-rs). While reading the docs, it occurred to me that parsing with these crates would introduce unnecessary overhead: their code doesn't really seem to be written with performance in mind, and they do more work than we need.

So here's a quick benchmark, where we have:

  1. manual_impl.rs which extracts the manual parse implementation from this PR
  2. cgroups_rs.rs which does effectively the same thing using procfs + cgroups-rs

and compared the performance with hyperfine.

manual_impl.rs

fn get_memory_limit() -> Option<u64> {
    let Ok(self_cgroup) = std::fs::read_to_string("/proc/self/cgroup") else {
        return None;
    };

    match parse_self_cgroup(&self_cgroup) {
        CgroupVersion::V1 { cgroup_relpath } => {
            let limit_path = std::path::Path::new("/sys/fs/cgroup/memory")
                .join(cgroup_relpath)
                .join("memory.limit_in_bytes");
            std::fs::read_to_string(limit_path)
                .ok()
                .and_then(|s| s.trim().parse::<u64>().ok())
        }
        CgroupVersion::V2 { cgroup_relpath } => {
            let limit_path = std::path::Path::new("/sys/fs/cgroup")
                .join(cgroup_relpath)
                .join("memory.max");
            std::fs::read_to_string(limit_path)
                .ok()
                .and_then(|s| s.trim().parse::<u64>().ok())
        }
        CgroupVersion::None => None,
    }
}

enum CgroupVersion<'a> {
    V1 { cgroup_relpath: &'a str },
    V2 { cgroup_relpath: &'a str },
    None,
}

fn parse_self_cgroup(self_cgroup_content: &str) -> CgroupVersion<'_> {
    // Initialize the cgroup version as None. This will be updated based on the parsed lines.
    let mut cgroup_version = CgroupVersion::None;

    // Iterate through each line in the cgroup content. Each line represents a cgroup entry.
    for line in self_cgroup_content.lines() {
        // Split the line into parts using ":" as the delimiter. The format is typically:
        // "<hierarchy_id>:<subsystems>:<cgroup_path>"
        let split = line.split(":").collect::<Vec<_>>();

        match &split[..] {
            // If the line specifies "memory" as the subsystem, it indicates cgroup v1 is used
            // for memory management. Extract the relative path and update the cgroup version.
            [_, "memory", cgroup_v1_relpath] => {
                cgroup_version = CgroupVersion::V1 {
                    cgroup_relpath: cgroup_v1_relpath
                        .strip_prefix("/")
                        .unwrap_or(cgroup_v1_relpath),
                };
                // Break early since v1 explicitly manages memory, and no further checks are needed.
                break;
            }
            // If the line starts with "0::", it indicates cgroup v2 is used. However, in hybrid
            // mode, memory might still be managed by v1. Continue checking other lines to confirm.
            ["0", "", cgroup_v2_relpath] => {
                cgroup_version = CgroupVersion::V2 {
                    cgroup_relpath: cgroup_v2_relpath
                        .strip_prefix("/")
                        .unwrap_or(cgroup_v2_relpath),
                };
            }
            _ => {}
        }
    }

    cgroup_version
}

fn main() {
    for _ in 0..10_000 {
        let limit = get_memory_limit();
        println!("Memory limit: {:?}", limit);
    }
}
cgroups_rs.rs

use cgroups_rs::{hierarchies, MaxValue};
use cgroups_rs::{memory::MemController, Cgroup};
use procfs::process::Process;

fn get_memory_limit() -> Option<MaxValue> {
    let myself = Process::myself().unwrap();
    let cgroups_entries = myself.cgroups().unwrap().0;

    let mut maybe_v2_limit = None;

    for entry in cgroups_entries {
        // V2
        if entry.controllers.is_empty() {
            let relpath = entry.pathname.trim_start_matches('/');
            let hier_v2 = hierarchies::V2::new();
            let cg = Cgroup::load(Box::new(hier_v2), relpath);
            let mem_controller = cg.controller_of::<MemController>().unwrap();
            maybe_v2_limit = mem_controller.get_mem().unwrap().max;
        } else if entry.controllers.contains(&"memory".to_string()) {
            let relpath = entry.pathname.trim_start_matches('/');
            let hier_v1 = hierarchies::V1::new();
            let cg = Cgroup::load(Box::new(hier_v1), relpath);
            let mem_controller = cg.controller_of::<MemController>().unwrap();
            return mem_controller.get_mem().unwrap().max;
        }
    }

    maybe_v2_limit
}

fn main() {
    for _ in 0..10_000 {
        let limit = get_memory_limit();
        println!("Memory limit: {:?}", limit);
    }
}

Result

➜ hyperfine --warmup 3 './target/release/manual_impl > /dev/null' './target/release/cgroups_rs > /dev/null'
Benchmark 1: ./target/release/manual_impl > /dev/null
  Time (mean ± σ):      54.2 ms ±   1.8 ms    [User: 8.3 ms, System: 44.2 ms]
  Range (min … max):    51.7 ms …  62.2 ms    53 runs

Benchmark 2: ./target/release/cgroups_rs > /dev/null
  Time (mean ± σ):      1.040 s ±  0.014 s    [User: 0.851 s, System: 0.189 s]
  Range (min … max):    1.017 s …  1.063 s    10 runs

Summary
  ./target/release/manual_impl > /dev/null ran
   19.19 ± 0.70 times faster than ./target/release/cgroups_rs > /dev/null

So, as I had guessed, parsing with these crates brings an unacceptable overhead (~19x slower here). Shall we keep the manual implementation as is?

@bartlomieju

@magurotuna that seems reasonable, though I would still suggest parsing cgroups only if an env var is present (e.g. DENO_USE_CGROUPS). Do you think that's possible?

@magurotuna

@bartlomieju Yes, for sure. I am working on that part now.

@magurotuna

DENO_USE_CGROUPS was added in d5ffc7f
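A sketch of what such a gate can look like (the gating shape and helper name are assumptions; only the DENO_USE_CGROUPS name comes from this thread, and `get_memory_limit` here is a stand-in for the parser benchmarked above):

```rust
use std::env;

// Stand-in for the manual cgroup parser shown earlier in this thread.
fn get_memory_limit() -> Option<u64> {
    Some(512 * 1024 * 1024)
}

// Hypothetical gate: skip /proc parsing entirely unless the user opts in
// by setting DENO_USE_CGROUPS. The exact wiring in the PR may differ.
fn cgroup_limit_if_enabled() -> Option<u64> {
    if env::var_os("DENO_USE_CGROUPS").is_none() {
        return None;
    }
    get_memory_limit()
}

fn main() {
    env::remove_var("DENO_USE_CGROUPS");
    assert_eq!(cgroup_limit_if_enabled(), None);
    env::set_var("DENO_USE_CGROUPS", "1");
    assert_eq!(cgroup_limit_if_enabled(), Some(512 * 1024 * 1024));
    println!("ok");
}
```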

@magurotuna magurotuna merged commit 9b2b1c4 into denoland:main May 13, 2025
18 checks passed
@magurotuna magurotuna deleted the tell-v8-cgroup-constrained-memory-limit branch May 13, 2025 15:08
@felipecrs

This is super useful in Docker, Kubernetes, etc. I think it's a bummer to have it behind an environment variable.

Couldn't some quicker lookup be done to decide whether cgroups should be consulted, like checking for the presence of some file?
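A cheap pre-check along these lines might look like the following (paths and naming are illustrative, not Deno's code; the root is passed in so the check is testable):

```rust
use std::path::Path;

// Hypothetical quick check in the spirit of the suggestion above: decide
// whether cgroup parsing is worth attempting by probing well-known entries
// under the cgroup mount point.
fn cgroup_memory_interface_present(cgroup_root: &Path) -> bool {
    // A v2 unified hierarchy exposes a cgroup.controllers file at the root...
    cgroup_root.join("cgroup.controllers").exists()
        // ...while v1 mounts the memory controller as its own directory.
        || cgroup_root.join("memory").is_dir()
}

fn main() {
    let present = cgroup_memory_interface_present(Path::new("/sys/fs/cgroup"));
    println!("cgroup memory interface present: {present}");
}
```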


Successfully merging this pull request may close these issues.

cgroups memory limit is not taken into account when configuring V8 heap
6 participants