[BUG: Breaking]

During conversion of a large/complex PDF (14568 layout blocks processed), Marker crashed after completing layout recognition, OCR error detection, and text recognition phases.
The error log shows an OSError: [Errno 12] Cannot allocate memory raised inside subprocess._execute_child → _fork_exec.
In the captured trace, this call comes from joblib's physical-core detection (subprocess.run in loky's context.py), which reports the failure as a UserWarning and falls back to the logical core count.
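
The warning text itself suggests setting LOKY_MAX_CPU_COUNT. A minimal sketch of doing that from Python before Marker starts work is below. The helper name `cap_loky_cpu_count` and the value 96 are illustrative assumptions, not values taken from the log. As far as I understand loky, a value below the logical core count makes it skip the subprocess-based physical-core probe, but I have not verified that against this joblib version, and it would not prevent ENOMEM on other subprocess launches.

```python
import os


def cap_loky_cpu_count(n_cores: int) -> str:
    """Set LOKY_MAX_CPU_COUNT, the variable named in the joblib warning.

    Hypothetical helper for illustration; call it before joblib is used.
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    os.environ["LOKY_MAX_CPU_COUNT"] = str(n_cores)
    return os.environ["LOKY_MAX_CPU_COUNT"]


# Assumed value: pick a number below the 192 logical cores reported for this host.
cap_loky_cpu_count(96)
```

The same effect can be had by exporting the variable in the shell before launching Marker.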

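Even though the machine has 500 GB of physical RAM, the main Marker process appears to consume a very large amount of memory (probably several hundred GB; not measured). My working hypothesis is that when Marker later spawns a subprocess, the kernel cannot account for a child of a parent this large and refuses the request with ENOMEM. Note that fork() is copy-on-write, so the parent's pages are not physically copied up front; the failure more likely comes from the kernel's memory commit accounting (vm.overcommit_memory) or from memory that is genuinely exhausted at that point.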
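
To help confirm or rule out this hypothesis, the following sketch collects the relevant Linux counters from /proc. The function names (`parse_kb_fields`, `read_text`, `memory_snapshot`) are hypothetical helpers written for this report, not part of Marker. It reads the current process's own counters, so it would need to run inside the Marker process (or be adapted to read /proc/<pid>/status) to be meaningful.

```python
from pathlib import Path


def parse_kb_fields(text: str, wanted: set) -> dict:
    """Extract 'Name:   123 kB' style fields from /proc text as integers in kB."""
    out = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in wanted:
            parts = rest.split()
            if parts and parts[0].isdigit():
                out[name] = int(parts[0])
    return out


def read_text(path: str) -> str:
    """Return file contents, or an empty string if the file is unavailable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


def memory_snapshot() -> dict:
    """Gather overcommit policy, system commit counters, and this process's size."""
    policy = read_text("/proc/sys/vm/overcommit_memory").strip()
    snap = {"overcommit_memory": policy or None}
    snap.update(parse_kb_fields(
        read_text("/proc/meminfo"),
        {"MemAvailable", "CommitLimit", "Committed_AS"},
    ))
    snap.update(parse_kb_fields(
        read_text("/proc/self/status"),
        {"VmSize", "VmRSS"},
    ))
    return snap


if __name__ == "__main__":
    for key, value in memory_snapshot().items():
        print(f"{key}: {value}")
```

If Committed_AS is close to CommitLimit while overcommit_memory is 2, or VmSize is a large fraction of total RAM, that would support the commit-accounting explanation.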

📄 Input Document
[Attach the PDF file that triggered the error here]

📤 Output Trace / Stack Trace
<details> <summary>Click to expand</summary>

```text
Recognizing Layout: 100%|██████████| 14568/14568 [10:14:35<00:00,  2.53s/it]
Running OCR Error Detection: 100%|██████████| 3642/3642 [02:58<00:00, 20.43it/s]
Detecting bboxes: 100%|██████████| 27/27 [00:21<00:00,  1.26it/s]
Recognizing Text: 100%|██████████| 2188/2188 [27:22<00:00,  1.33it/s]
Recognizing Text: 100%|██████████| 153/153 [01:41<00:00,  1.51it/s]
/home/hhw/py312env/lib/python3.12/site-packages/joblib/externals/loky/backend/context.py:131: UserWarning: Could not find the number of physical cores for the following reason:
[Errno 12] Cannot allocate memory
Returning the number of logical cores instead. You can silence this warning by setting LOKY_MAX_CPU_COUNT to the number of cores you want to use.
  warnings.warn(
  File "/home/hhw/py312env/lib/python3.12/site-packages/joblib/externals/loky/backend/context.py", line 245, in _count_physical_cores
    cpu_count_physical = _count_physical_cores_linux()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hhw/py312env/lib/python3.12/site-packages/joblib/externals/loky/backend/context.py", line 278, in _count_physical_cores_linux
    cpu_info = subprocess.run(
               ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.12/subprocess.py", line 1883, in _execute_child
    self.pid = _fork_exec(
OSError: [Errno 12] Cannot allocate memory
```

</details>

⚙️ Environment
Marker version: 1.10.2

Surya version: [Missing – run pip show surya-ocr]

Python version: 3.12 (virtual env at /home/hhw/py312env)

PyTorch version: [Missing – run pip show torch; note this is a CPU-only system]

Transformers version: [Missing – run pip show transformers]

Operating System: Linux sprhost 4.18.0-526.el8.x86_64 (RHEL 8 / CentOS 8-like), 2x Intel Xeon Platinum 8468V (192 logical cores, 2 sockets, NUMA)

✅ Expected Behavior
Marker should successfully convert the input PDF to Markdown without crashing due to memory allocation errors.

[BUG: Breaking] #1032