
[Feature]: Memory-aware worker creation to prevent OOM #1160

@fi3ework


What problem does this feature solve?

Currently, the worker count is purely CPU-based (numCpus - 1 in run mode, numCpus / 2 in watch mode) and does not take memory into account. In memory-constrained environments (CI containers, low-RAM machines), the combined memory of all workers can exceed system capacity and cause an OOM.
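For reference, the CPU-only sizing rule can be sketched as follows. Memory never enters the computation. The function name `cpuBasedMaxWorkers`, rounding down in watch mode, and the minimum of one worker are assumptions for illustration, not the pool's actual code.

```typescript
import * as os from "node:os";

type Mode = "run" | "watch";

// Hypothetical helper mirroring the CPU-only rule: numCpus - 1 in run mode,
// numCpus / 2 in watch mode. Flooring and the lower bound of 1 are assumptions.
export function cpuBasedMaxWorkers(mode: Mode, numCpus: number = os.cpus().length): number {
  const raw = mode === "run" ? numCpus - 1 : Math.floor(numCpus / 2);
  // Note: nothing here looks at total or free memory.
  return Math.max(1, raw);
}
```

On an 8-core CI container with only a few GB of RAM, this still yields 7 workers in run mode, which is exactly the situation the issue describes.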

There are two OOM scenarios:

  1. System level: the total RSS of all worker processes exceeds physical memory, and the OS kills the process.
  2. Main-process heap: concurrent in-flight IPC data from workers pushes the main process's V8 heap over its limit.

Ref #1009.

Prerequisite: Replace tinypool with a custom pool that supports lazy worker creation (spawning workers on demand rather than eagerly forking all of them at startup).

What does the proposed API look like?

No new user-facing API. This is an internal change to the pool's behavior: it requires no configuration, and nothing changes on machines with sufficient memory.

Core idea: maxWorkers stays unchanged (CPU-based). Before spawning each new worker, the pool checks available memory. If there is enough, it spawns the worker; otherwise, it queues the task and waits for an existing worker to become idle.
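A minimal sketch of lazy, gated worker creation, assuming workers can be modeled as simple busy/idle slots and tasks as async functions. `LazyPool`, `canSpawn`, and `SimWorker` are hypothetical names; in the proposal, `canSpawn` would be the memory check, and real workers would be child processes rather than objects.

```typescript
// Hypothetical task type: an async unit of work such as running one test file.
type Task = () => Promise<void>;

interface SimWorker {
  busy: boolean;
}

class LazyPool {
  private readonly workers: SimWorker[] = [];
  private readonly queue: Task[] = [];

  constructor(
    private readonly maxWorkers: number, // CPU-based cap, unchanged
    private readonly canSpawn: () => boolean, // memory gate (assumed callback)
  ) {}

  get workerCount(): number {
    return this.workers.length;
  }

  get queuedCount(): number {
    return this.queue.length;
  }

  submit(task: Task): void {
    this.queue.push(task);
    this.dispatch();
  }

  private dispatch(): void {
    while (this.queue.length > 0) {
      let worker = this.workers.find((w) => !w.busy);
      if (worker === undefined) {
        // No idle worker: create one only if under the CPU cap AND the gate allows it.
        // Otherwise leave the task queued until a worker becomes idle.
        if (this.workers.length >= this.maxWorkers || !this.canSpawn()) return;
        worker = { busy: false };
        this.workers.push(worker);
      }
      this.run(worker, this.queue.shift()!);
    }
  }

  private run(worker: SimWorker, task: Task): void {
    worker.busy = true;
    const release = (): void => {
      worker.busy = false;
      this.dispatch(); // hand the next queued task to the now-idle worker
    };
    // Errors are swallowed only to keep the sketch short.
    task().then(release, release);
  }
}
```

Idle workers are always reused without consulting the gate, so the memory check only runs on the path that would increase the worker count.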

Data collection: Each worker reports process.memoryUsage().rss after completing a test file, and the main process keeps a sliding-window P90 of these reports as the estimated per-worker RSS. The main process also tracks the heapUsed delta on each result arrival as the estimated per-result heap impact. All estimates come from runtime measurements; nothing is hardcoded.
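The two estimators could look roughly like this. It is a sketch under several assumptions: the window size of 20, the nearest-rank percentile method, applying the same P90 window to heap deltas, and `JSON.parse` standing in for IPC deserialization are all illustrative choices. `SlidingWindowPercentile`, `measureHeapDelta`, and `onWorkerResult` are hypothetical names.

```typescript
// Keeps the most recent `windowSize` samples and reports a percentile of them.
class SlidingWindowPercentile {
  private readonly samples: number[] = [];

  constructor(
    private readonly windowSize: number,
    private readonly percentile = 0.9,
  ) {}

  add(value: number): void {
    this.samples.push(value);
    if (this.samples.length > this.windowSize) this.samples.shift(); // drop oldest
  }

  estimate(): number | undefined {
    if (this.samples.length === 0) return undefined;
    const sorted = [...this.samples].sort((a, b) => a - b);
    // Nearest-rank method: 1-based rank = ceil(p * N).
    const rank = Math.ceil(this.percentile * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }
}

// Hypothetical helper: how much main-process heapUsed grows while one result is handled.
function measureHeapDelta<T>(handle: () => T): { result: T; delta: number } {
  const before = process.memoryUsage().heapUsed;
  const result = handle();
  // GC may run in between and shrink the heap, so negative deltas are clamped to zero.
  const delta = Math.max(0, process.memoryUsage().heapUsed - before);
  return { result, delta };
}

const rssEstimate = new SlidingWindowPercentile(20);
const heapImpactEstimate = new SlidingWindowPercentile(20);

// Hypothetical main-process handler. `rss` is what the worker read from
// process.memoryUsage().rss after finishing a file; `payload` is the raw result.
function onWorkerResult(rss: number, payload: string): unknown {
  rssEstimate.add(rss);
  const { result, delta } = measureHeapDelta(() => JSON.parse(payload));
  heapImpactEstimate.add(delta);
  return result;
}
```

Using P90 rather than the mean keeps the estimate conservative when a few test files are much heavier than the rest, while the sliding window lets it follow changes over the course of a run.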

Dual gate before worker creation:

  • Gate 1 (system memory): OS available memory (MemAvailable on Linux, os.freemem() on macOS) exceeds the measured per-worker RSS. This naturally accounts for memory used by other programs.
  • Gate 2 (main-process heap): the remaining heap headroom (heap_size_limit from v8.getHeapStatistics() minus heapUsed) can absorb the heap impact of one more concurrent result.

A new worker is spawned only when both gates pass.
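The dual gate could be implemented along these lines. The `/proc/meminfo` parsing, `os.freemem()`, `v8.getHeapStatistics().heap_size_limit`, and `process.memoryUsage().heapUsed` are real Node.js/Linux interfaces; the function and field names (`availableSystemMemory`, `passesDualGate`, `GateInputs`) are hypothetical, and the fallback to `os.freemem()` on every non-Linux platform is an assumption.

```typescript
import { readFileSync } from "node:fs";
import * as os from "node:os";
import * as v8 from "node:v8";

// Available system memory in bytes: MemAvailable on Linux (reported in kB),
// os.freemem() elsewhere (assumed fallback).
export function availableSystemMemory(): number {
  if (process.platform === "linux") {
    try {
      const meminfo = readFileSync("/proc/meminfo", "utf8");
      const match = /^MemAvailable:\s+(\d+)\s+kB$/m.exec(meminfo);
      if (match) return Number(match[1]) * 1024;
    } catch {
      // /proc not readable: fall through to os.freemem().
    }
  }
  return os.freemem();
}

export interface GateInputs {
  availableMemory: number; // OS-level available memory, bytes
  perWorkerRss: number; // P90 per-worker RSS estimate, bytes
  heapSizeLimit: number; // V8 heap_size_limit of the main process, bytes
  heapUsed: number; // current main-process heapUsed, bytes
  perResultHeap: number; // estimated heap impact of one in-flight result, bytes
}

export function passesDualGate(g: GateInputs): boolean {
  const gate1 = g.availableMemory > g.perWorkerRss; // room for one more worker process
  const gate2 = g.heapSizeLimit - g.heapUsed > g.perResultHeap; // room for one more result
  return gate1 && gate2;
}

// Snapshot the live values for the current process.
export function currentGateInputs(perWorkerRss: number, perResultHeap: number): GateInputs {
  return {
    availableMemory: availableSystemMemory(),
    perWorkerRss,
    heapSizeLimit: v8.getHeapStatistics().heap_size_limit,
    heapUsed: process.memoryUsage().heapUsed,
    perResultHeap,
  };
}
```

Keeping `passesDualGate` a pure function of its inputs makes the decision easy to test, while `currentGateInputs` isolates the platform-specific reads.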

Startup sequence:

  1. The first worker is always created (a single Node.js process is safe on any machine).
  2. Once it completes a test file and reports its RSS, further workers are created in a batch, each passing through the dual gate.
  3. Monitoring continues throughout the run, adjusting as the estimates update.
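The startup sequence above can be sketched as a small planning function. `planBatch` and its options are hypothetical. One design choice is an assumption beyond the issue text: the gate receives how many workers were already planned in the current batch, because OS free-memory readings will not yet reflect workers that have not finished starting, so a gate can reserve their estimated RSS up front.

```typescript
// Gate receives the number of workers already planned in this batch so it can
// reserve their estimated memory (an assumption, not part of the proposal).
type BatchGate = (alreadyPlanned: number) => boolean;

interface PlanOptions {
  currentWorkers: number;
  maxWorkers: number; // CPU-based cap
  waitingTasks: number; // tasks with no idle worker to run them
  firstRssReported: boolean; // has any worker reported RSS yet?
  gate: BatchGate;
}

// Returns how many new workers to spawn right now.
export function planBatch(o: PlanOptions): number {
  if (o.waitingTasks === 0) return 0;
  // Step 1: with no workers yet, always create exactly one.
  if (o.currentWorkers === 0) return 1;
  // No RSS measurement yet: wait for the first worker to report.
  if (!o.firstRssReported) return 0;
  // Step 2: batch creation, each additional worker checked by the gate.
  let planned = 0;
  while (
    o.currentWorkers + planned < o.maxWorkers &&
    planned < o.waitingTasks &&
    o.gate(planned)
  ) {
    planned++;
  }
  return planned;
}
```

Step 3 would correspond to calling `planBatch` again whenever a worker reports new RSS or heap measurements, so the pool can grow later in the run if memory frees up.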
