
[BUG: Breaking] #1032

@scotthuang1989

During conversion of a large/complex PDF (14568 layout blocks processed), Marker crashed after completing layout recognition, OCR error detection, and text recognition phases.
The error log shows OSError: [Errno 12] Cannot allocate memory, raised inside subprocess._execute_child → _fork_exec.
A warning about joblib failing to detect physical CPU cores due to memory allocation failure also appears.
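
The joblib warning itself names LOKY_MAX_CPU_COUNT. Below is a minimal sketch of setting it from Python before Marker (and therefore joblib) is imported. The helper name cap_loky_cpu_count and the value 96 are hypothetical, and whether this also avoids the failing subprocess call, rather than only silencing the warning, has not been verified here.

```python
import os


def cap_loky_cpu_count(n_cores: int) -> str:
    """Set LOKY_MAX_CPU_COUNT unless it is already set; return the effective value.

    Hypothetical helper: an existing user-provided value is left untouched.
    """
    return os.environ.setdefault("LOKY_MAX_CPU_COUNT", str(n_cores))


# Call this before importing marker/joblib so the variable is in place
# when joblib first asks for the CPU count. 96 is an illustrative value.
effective = cap_loky_cpu_count(96)
print(f"LOKY_MAX_CPU_COUNT={effective}")
```

Exporting the variable in the shell before launching Marker should have the same effect.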

Even though the machine has 500 GB of physical RAM, the main Marker process consumes a very large amount of memory (probably several hundred GB). When Marker later tries to spawn a subprocess via fork(), the kernel presumably has to account for a duplicate of the parent's address space. fork() is copy-on-write, so nothing is physically copied up front, but the call still fails with ENOMEM if the kernel cannot commit that much memory.
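
To check this hypothesis on a Linux host, one rough diagnostic is to compare the process's resident set size with the memory the kernel reports as available, just before the point where the subprocess is spawned. The sketch below assumes the standard /proc/self/status and /proc/meminfo layouts. The function names are hypothetical, and the result is only a heuristic, since the kernel's overcommit accounting is based on virtual memory rather than resident pages.

```python
import os


def parse_kib(text: str, key: str) -> int:
    """Return the KiB value for `key` in /proc/meminfo- or /proc/<pid>/status-style text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == key:
            return int(rest.split()[0])
    raise KeyError(key)


def fork_headroom_kib(status_text: str, meminfo_text: str) -> int:
    """MemAvailable minus this process's VmRSS, in KiB.

    A negative value means a full duplicate of the resident set would not fit.
    """
    rss = parse_kib(status_text, "VmRSS")
    available = parse_kib(meminfo_text, "MemAvailable")
    return available - rss


def report() -> None:
    """Print the headroom for the current process (Linux only)."""
    with open("/proc/self/status") as f:
        status_text = f.read()
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    print(f"fork headroom: {fork_headroom_kib(status_text, meminfo_text)} KiB")


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    report()
```

If the headroom is small or negative late in the conversion, that would be consistent with the fork() failure described above.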

📄 Input Document
[Attach the PDF file that triggered the error here]

📤 Output Trace / Stack Trace

Recognizing Layout: 100%|██████████| 14568/14568 [10:14:35<00:00, 2.53s/it]
Running OCR Error Detection: 100%|██████████| 3642/3642 [02:58<00:00, 20.43it/s]
Detecting bboxes: 100%|██████████| 27/27 [00:21<00:00, 1.26it/s]
Recognizing Text: 100%|██████████| 2188/2188 [27:22<00:00, 1.33it/s]
Recognizing Text: 100%|██████████| 153/153 [01:41<00:00, 1.51it/s]
/home/hhw/py312env/lib/python3.12/site-packages/joblib/externals/loky/backend/context.py:131: UserWarning: Could not find the number of physical cores for the following reason:
[Errno 12] Cannot allocate memory
Returning the number of logical cores instead. You can silence this warning by setting LOKY_MAX_CPU_COUNT to the number of cores you want to use.
  warnings.warn(
  File "/home/hhw/py312env/lib/python3.12/site-packages/joblib/externals/loky/backend/context.py", line 245, in _count_physical_cores
    cpu_count_physical = _count_physical_cores_linux()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hhw/py312env/lib/python3.12/site-packages/joblib/externals/loky/backend/context.py", line 278, in _count_physical_cores_linux
    cpu_info = subprocess.run(
               ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.12/subprocess.py", line 1883, in _execute_child
    self.pid = _fork_exec(
OSError: [Errno 12] Cannot allocate memory

⚙️ Environment

Marker version: 1.10.2

Surya version: [Missing – run pip show surya-ocr]

Python version: 3.12 (virtual env at /home/hhw/py312env)

PyTorch version: [Missing – run pip show torch; note this is a CPU-only system]

Transformers version: [Missing – run pip show transformers]

Operating System: Linux sprhost 4.18.0-526.el8.x86_64 (RHEL 8 / CentOS 8-like), 2x Intel Xeon Platinum 8468V (192 logical cores, 2 sockets, NUMA)

✅ Expected Behavior
Marker should successfully convert the input PDF to Markdown without crashing due to memory allocation errors.
