Bug
Docling crashes with SIGABRT (exit code 134) when calling converter.convert() on Python 3.12 in Databricks Serverless (Standard v5 runtime). The crash occurs even on a trivially small PDF (0.4 MB, ~10 pages). The same code works on Python 3.10 (Standard v1 runtime) with identical configuration.
The kernel is killed immediately after the StandardPdfPipeline initializes its plugins (layout, OCR, table structure engines all register successfully), but before any page processing begins. The error is a native-level SIGABRT, not a Python exception — suggesting an assertion failure inside a C/C++ extension (docling-parse or deepsearch-glm).
Last log output before crash:
Initializing pipeline for StandardPdfPipeline with options hash 0c22bedbf11f2ec242f5408ec6e34dfb
Loading plugin 'docling_defaults'
Registered picture descriptions: ['picture_description_vlm_engine', 'vlm', 'api']
Registered ocr engines: ['auto', 'easyocr', 'ocrmac', 'rapidocr', 'tesserocr', 'tesseract']
Registered layout engines: ['layout_object_detection', 'docling_layout_default', 'docling_experimental_table_crops_layout']
Accelerator device: 'cpu'
Registered table structure engines: ['docling_tableformer', 'docling_tableformer_v2']
Then:
The Python process exited with exit code 134 (SIGABRT: Aborted).
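Because the abort happens in native code, no Python traceback is printed. Exit code 134 is the usual 128 + 6 encoding of SIGABRT. The sketch below is one possible way to collect more detail without losing the notebook kernel: it runs a snippet in a child interpreter with the standard-library faulthandler enabled, so the Python-level stack at the moment of the abort is written to stderr. The helper names run_isolated and describe_exit are hypothetical, and the demo snippet aborts on purpose; replacing it with the convert() call from the steps below is left as an assumption about how one would use it.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter with faulthandler enabled.

    A native abort then kills only the child process, and faulthandler
    dumps the Python stack of the child to its stderr before it dies.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a readable description.

    On POSIX, subprocess reports death by signal N as return code -N.
    """
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


# Demo snippet that aborts deliberately (stand-in for the failing call).
DEMO = "import os; os.abort()"

proc = run_isolated(DEMO)
print(describe_exit(proc.returncode))
print(proc.stderr)
```

To apply this to the report, DEMO would be replaced by a string containing the reproduction code, and the tail of proc.stderr would show which Python call was active when the native library aborted.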
We are currently forced to pin our Databricks job environment to Python 3.10, which Databricks has deprecated as the default. Standard v5 (Python 3.12) is now the default serverless runtime, and Standard v1 (Python 3.10) will be retired. This makes docling unusable on the standard Databricks serverless platform for PDF processing.
We'd appreciate guidance on:
- Is Python 3.12 on Linux officially supported/tested?
- Are there known incompatibilities with docling-parse native wheels on certain glibc versions?
- Is there a specific version combination that's known to work on 3.12?
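To make the glibc and version questions easier to answer, the following sketch collects the relevant facts from inside the failing runtime using only the standard library. The function names and the list of distribution names are assumptions chosen for illustration (they mirror the packages named in this report); packages that are not installed are reported as such rather than raising.

```python
import platform
import sys
from importlib import metadata

# Distribution names are assumptions based on the packages named in this report.
PACKAGES = ("docling", "docling-parse", "deepsearch-glm", "torch")


def installed_version(name: str) -> str:
    """Return the installed version of a distribution, or a marker if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=PACKAGES) -> dict:
    """Collect interpreter, libc, and package versions for a bug report."""
    # libc_ver() inspects the running interpreter binary; it may return
    # empty strings when the C library cannot be identified.
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
        "packages": {name: installed_version(name) for name in packages},
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this on both Standard v1 and Standard v5 and comparing the output would show whether the two runtimes differ in glibc or in the resolved docling-parse version, not only in the Python version.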
Steps to reproduce
- Create a Databricks Serverless notebook on Standard v5 (Python 3.12)
- Install docling:
%pip install "setuptools>=70.0" "docling>=2.73.1"
- Restart kernel:
dbutils.library.restartPython()
- Run:
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat
pdf_options = PdfPipelineOptions()
format_options = {InputFormat.PDF: PdfFormatOption(pipeline_options=pdf_options)}
converter = DocumentConverter(format_options=format_options)
# This line triggers SIGABRT, even on a small PDF
result = converter.convert("/path/to/any_small.pdf")
- Kernel crashes with exit code 134 (SIGABRT)
Notes:
- DocumentConverter initialization succeeds (0.09s); crash is during convert()
- OCR is disabled (do_ocr: False)
- Crash is not related to file size — happens on a 0.4 MB, ~10 page PDF
- Same file processes successfully on Python 3.10 (Standard v1) with identical code
- All other native packages (torch, scipy, sklearn, rdkit, lxml) load and run fine on 3.12
- Tested with docling>=2.61.2, docling>=2.73.1, and latest; all crash identically
Docling version
Tested with:
docling>=2.61.2 (resolved to 2.61.2)
docling>=2.73.1 (resolved to 2.73.1)
docling latest as of 2026-03-28
All produce identical SIGABRT on Python 3.12.
Python version
Crashes on:
Python 3.12.x (Databricks Serverless Standard v5, Linux)
Works on:
Python 3.10.x (Databricks Serverless Standard v1, Linux)