SIGABRT crash on Python 3.12 — all docling versions fail on Databricks Serverless #3201

@Aviral0

Bug

Docling crashes with SIGABRT (exit code 134) when calling converter.convert() on Python 3.12 in Databricks Serverless (Standard v5 runtime). The crash occurs even on a trivially small PDF (0.4 MB, ~10 pages). The same code works on Python 3.10 (Standard v1 runtime) with identical configuration.

The kernel is killed immediately after the StandardPdfPipeline initializes its plugins (layout, OCR, table structure engines all register successfully), but before any page processing begins. The error is a native-level SIGABRT, not a Python exception — suggesting an assertion failure inside a C/C++ extension (docling-parse or deepsearch-glm).

Last log output before crash:

Initializing pipeline for StandardPdfPipeline with options hash 0c22bedbf11f2ec242f5408ec6e34dfb
Loading plugin 'docling_defaults'
Registered picture descriptions: ['picture_description_vlm_engine', 'vlm', 'api']
Registered ocr engines: ['auto', 'easyocr', 'ocrmac', 'rapidocr', 'tesserocr', 'tesseract']
Registered layout engines: ['layout_object_detection', 'docling_layout_default', 'docling_experimental_table_crops_layout']
Accelerator device: 'cpu'
Registered table structure engines: ['docling_tableformer', 'docling_tableformer_v2']

Then:

The Python process exited with exit code 134 (SIGABRT: Aborted).
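Exit code 134 is the shell convention 128 + 6, where 6 is SIGABRT on Linux. A possible way to get more detail than the bare exit code is to run the failing call in a child interpreter with `faulthandler` enabled, so the Python-level stack at the moment of the abort is written to stderr and the notebook kernel survives. The sketch below is only an illustration; `run_isolated` and `describe_exit` are hypothetical helper names, not part of docling or Databricks.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 600.0) -> tuple[int, str]:
    """Run `code` in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative returncode -N
    means the child was terminated by signal N.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe_exit(returncode: int) -> str:
    """Map a subprocess returncode or a shell-style 128+N code to a signal name."""
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"exit {returncode}"
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"signal {signum}"
```

Passing the reproduction script from the steps below as `code` would return the faulthandler traceback in the second element, which should show the Python frame that was active when the native code aborted.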

We are currently forced to pin our Databricks job environment to Python 3.10, which Databricks has deprecated as the default. Standard v5 (Python 3.12) is now the default serverless runtime, and Standard v1 (Python 3.10) will be retired. This makes docling unusable on the standard Databricks serverless platform for PDF processing.

We'd appreciate guidance on:

  1. Is Python 3.12 on Linux officially supported/tested?
  2. Are there known incompatibilities with docling-parse native wheels on certain glibc versions?
  3. Is there a specific version combination that's known to work on 3.12?
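To make the glibc and version-combination questions easier to answer, the environment details could be collected on both runtimes with something like the sketch below. It uses only the standard library. The package list is our assumption about which distributions are relevant, and `collect_env` is a hypothetical helper name.

```python
import importlib.metadata as md
import platform
import sys


def collect_env(packages):
    """Return interpreter, libc, and installed package versions as a dict."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        # libc_ver() returns a (name, version) pair, e.g. for glibc on Linux
        "libc": " ".join(platform.libc_ver()).strip(),
    }
    for name in packages:
        try:
            info[name] = md.version(name)
        except md.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    # Assumed list of relevant distributions; adjust as needed.
    wanted = ["docling", "docling-core", "docling-parse", "docling-ibm-models", "torch"]
    for key, value in collect_env(wanted).items():
        print(f"{key}: {value}")
```

Comparing the output from Standard v1 and Standard v5 side by side would show whether the two runtimes resolve different native wheel versions or differ in libc.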

Steps to reproduce

  1. Create a Databricks Serverless notebook on Standard v5 (Python 3.12)
  2. Install docling:
    %pip install "setuptools>=70.0" "docling>=2.73.1"
  3. Restart kernel:
    dbutils.library.restartPython()
  4. Run:
    from docling.document_converter import DocumentConverter, PdfFormatOption
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.datamodel.base_models import InputFormat
    
    pdf_options = PdfPipelineOptions()
    pdf_options.do_ocr = False  # OCR is disabled in our runs (see Notes)
    format_options = {InputFormat.PDF: PdfFormatOption(pipeline_options=pdf_options)}
    converter = DocumentConverter(format_options=format_options)
    
    # This line causes SIGABRT, even on a small PDF
    result = converter.convert("/path/to/any_small.pdf")
  5. Kernel crashes with exit code 134 (SIGABRT)

Notes:

  • DocumentConverter initialization succeeds (0.09s) — crash is during convert()
  • OCR is disabled (do_ocr: False)
  • Crash is not related to file size — happens on a 0.4 MB, ~10 page PDF
  • Same file processes successfully on Python 3.10 (Standard v1) with identical code
  • All other native packages (torch, scipy, sklearn, rdkit, lxml) load and run fine on 3.12
  • Tested with docling>=2.61.2, docling>=2.73.1, and latest — all crash identically
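One way to narrow down which stage aborts would be to toggle pipeline features one at a time, each in its own child process so a crash does not take down the kernel. The following is a minimal sketch, assuming the `do_ocr` and `do_table_structure` switches on `PdfPipelineOptions` are the relevant ones. `build_child_script` and `bisect_pipeline` are hypothetical helper names, and we have not yet run this on Standard v5.

```python
import itertools
import subprocess
import sys

# Script executed in each child interpreter; braces are escaped for str.format.
CHILD_TEMPLATE = """
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

opts = PdfPipelineOptions()
opts.do_ocr = {do_ocr}
opts.do_table_structure = {do_table_structure}
converter = DocumentConverter(
    format_options={{InputFormat.PDF: PdfFormatOption(pipeline_options=opts)}}
)
converter.convert({pdf_path!r})
"""


def build_child_script(pdf_path, do_ocr, do_table_structure):
    """Fill in the template for one combination of pipeline switches."""
    return CHILD_TEMPLATE.format(
        pdf_path=pdf_path,
        do_ocr=do_ocr,
        do_table_structure=do_table_structure,
    )


def bisect_pipeline(pdf_path):
    """Run every on/off combination and record each child's return code."""
    results = {}
    for do_ocr, do_tables in itertools.product([False, True], repeat=2):
        script = build_child_script(pdf_path, do_ocr, do_tables)
        proc = subprocess.run(
            [sys.executable, "-X", "faulthandler", "-c", script],
            capture_output=True,
            text=True,
        )
        # On POSIX, -6 means the child was killed by SIGABRT.
        results[(do_ocr, do_tables)] = proc.returncode
    return results
```

If every combination returns -6, the abort most likely happens in a component that runs regardless of these switches, such as the PDF backend or the layout model.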

Docling version

Tested with:

  • docling>=2.61.2 (resolved to 2.61.2)
  • docling>=2.73.1 (resolved to 2.73.1)
  • docling latest as of 2026-03-28

All produce identical SIGABRT on Python 3.12.

Python version

Crashes on:

Python 3.12.x (Databricks Serverless Standard v5, Linux)

Works on:

Python 3.10.x (Databricks Serverless Standard v1, Linux)

Labels: bug