During conversion of a large/complex PDF (14568 layout blocks processed), Marker crashed after completing layout recognition, OCR error detection, and text recognition phases.
The error log shows an OSError: [Errno 12] Cannot allocate memory happening inside subprocess._execute_child → _fork_exec.
A warning about joblib failing to detect physical CPU cores due to memory allocation failure also appears.
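Although the machine has 500 GB of physical RAM, the main Marker process appears to consume a very large share of it (probably several hundred GB). When a subprocess is later spawned via fork(), the kernel does not physically copy the parent's memory, because pages are shared copy-on-write, but it must still be able to account for a duplicate of the parent's address space. With that much memory already committed, the fork fails immediately with ENOMEM.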
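To back up the memory explanation above with numbers, something like the following could be run on the affected host while Marker is still alive. This is only a minimal sketch: the helper names `parse_kb_fields` and `fork_headroom_report` are made up for illustration, and it assumes a Linux `/proc` filesystem. It reads the kernel's overcommit policy, the system-wide commit counters, and the virtual and resident size of one process.

```python
from pathlib import Path


def parse_kb_fields(text: str) -> dict[str, int]:
    """Parse 'Key:   123 kB' lines, as found in /proc/meminfo or /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only entries whose value is an integer reported in kB.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def fork_headroom_report(pid: str = "self") -> dict[str, int]:
    """Collect figures relevant to a fork() ENOMEM (sizes in kB, policy as 0/1/2)."""
    meminfo = parse_kb_fields(Path("/proc/meminfo").read_text())
    status = parse_kb_fields(Path(f"/proc/{pid}/status").read_text())
    policy = int(Path("/proc/sys/vm/overcommit_memory").read_text().strip())
    return {
        "overcommit_memory": policy,  # 0 = heuristic, 1 = always allow, 2 = strict
        "MemAvailable": meminfo.get("MemAvailable", 0),
        "CommitLimit": meminfo.get("CommitLimit", 0),
        "Committed_AS": meminfo.get("Committed_AS", 0),
        "VmSize": status.get("VmSize", 0),  # virtual address space of the process
        "VmRSS": status.get("VmRSS", 0),    # resident memory of the process
    }


if __name__ == "__main__":
    for name, value in fork_headroom_report().items():
        print(f"{name}: {value}")
```

Passing the Marker process ID instead of the default `"self"` (for example `fork_headroom_report("12345")`, where the PID is a placeholder) would show how large the parent is at the moment the fork fails.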
📄 Input Document
[Attach the PDF file that triggered the error here]
📤 Output Trace / Stack Trace
Recognizing Layout: 100%|██████████| 14568/14568 [10:14:35<00:00, 2.53s/it]
Running OCR Error Detection: 100%|██████████| 3642/3642 [02:58<00:00, 20.43it/s]
Detecting bboxes: 100%|██████████| 27/27 [00:21<00:00, 1.26it/s]
Recognizing Text: 100%|██████████| 2188/2188 [27:22<00:00, 1.33it/s]
Recognizing Text: 100%|██████████| 153/153 [01:41<00:00, 1.51it/s]
/home/hhw/py312env/lib/python3.12/site-packages/joblib/externals/loky/backend/context.py:131: UserWarning: Could not find the number of physical cores for the following reason:
[Errno 12] Cannot allocate memory
Returning the number of logical cores instead. You can silence this warning by setting LOKY_MAX_CPU_COUNT to the number of cores you want to use.
warnings.warn(
File "/home/hhw/py312env/lib/python3.12/site-packages/joblib/externals/loky/backend/context.py", line 245, in _count_physical_cores
cpu_count_physical = _count_physical_cores_linux()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hhw/py312env/lib/python3.12/site-packages/joblib/externals/loky/backend/context.py", line 278, in _count_physical_cores_linux
cpu_info = subprocess.run(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/subprocess.py", line 548, in run
with Popen(*popenargs, **kwargs) as process:
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/subprocess.py", line 1026, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/local/lib/python3.12/subprocess.py", line 1883, in _execute_child
self.pid = _fork_exec(
OSError: [Errno 12] Cannot allocate memory
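The warning in the trace above suggests setting LOKY_MAX_CPU_COUNT. A minimal sketch of doing that from Python, before joblib or Marker is imported, is shown below. The helper name `cap_loky_cpu_count` and the value 32 are arbitrary choices for illustration. I am assuming, based only on the warning text, that this silences the warning; whether it also avoids the failing subprocess call, or any later fork in the pipeline, is untested.

```python
import os


def cap_loky_cpu_count(max_cpus: int) -> str:
    """Set LOKY_MAX_CPU_COUNT unless the caller's environment already defines it."""
    if max_cpus < 1:
        raise ValueError("max_cpus must be at least 1")
    # setdefault keeps any value the user exported in the shell.
    os.environ.setdefault("LOKY_MAX_CPU_COUNT", str(max_cpus))
    return os.environ["LOKY_MAX_CPU_COUNT"]


# Must run before the first `import joblib` (or anything that imports it),
# because the variable is read when joblib counts CPUs.
cap_loky_cpu_count(32)
```

The same effect can be had by exporting the variable in the shell before launching Marker, which avoids touching any code.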
⚙️ Environment
Marker version: 1.10.2
Surya version: [Missing – run pip show surya-ocr]
Python version: 3.12 (virtual env at /home/hhw/py312env)
PyTorch version: [Missing – run pip show torch; note this is a CPU-only system]
Transformers version: [Missing – run pip show transformers]
Operating System: Linux sprhost 4.18.0-526.el8.x86_64 (RHEL 8 / CentOS 8-like), 2x Intel Xeon Platinum 8468V (192 logical cores, 2 sockets, NUMA)
✅ Expected Behavior
Marker should successfully convert the input PDF to Markdown without crashing due to memory allocation errors.