NullPointerException in ListUtils.isContainsHeading during hybrid mode postProcess #220

### Bug

When using hybrid mode (`--hybrid docling-fast`, `--hybrid-mode auto`), the Java CLI crashes with a `NullPointerException` during the post-processing stage. The docling-fast backend processes its pages successfully (returns 200 OK), but the Java process then fails in `HybridDocumentProcessor.postProcess()` and produces no JSON output.

The crash occurs in the veraPDF list-processing code, in the same area as #134 but in a different method (`ListUtils.isContainsHeading` rather than `ListLabelsUtils.haveDifferentSuffixChars`). The defensive try-catch added in #134 does not cover this code path.

Stack trace:

```
SEVERE: Exception during processing file /tmp/tmp0i17_px2/chunk.pdf: null
java.lang.NullPointerException
    at org.verapdf.wcag.algorithms.semanticalgorithms.utils.ListUtils.isContainsHeading(ListUtils.java:211)
    at org.verapdf.wcag.algorithms.semanticalgorithms.utils.ListUtils.checkChildrenListInterval(ListUtils.java:157)
    at org.verapdf.wcag.algorithms.semanticalgorithms.utils.ListUtils.getChildrenListIntervals(ListUtils.java:130)
    at org.opendataloader.pdf.processors.ListProcessor.processListsFromTextNodes(ListProcessor.java:350)
    at org.opendataloader.pdf.processors.HybridDocumentProcessor.postProcess(HybridDocumentProcessor.java:422)
    at org.opendataloader.pdf.processors.HybridDocumentProcessor.processDocument(HybridDocumentProcessor.java:166)
    at org.opendataloader.pdf.processors.HybridDocumentProcessor.processDocument(HybridDocumentProcessor.java:78)
    at org.opendataloader.pdf.processors.DocumentProcessor.processFile(DocumentProcessor.java:73)
    at org.opendataloader.pdf.api.OpenDataLoaderPDF.processFile(OpenDataLoaderPDF.java:32)
    at org.opendataloader.pdf.cli.CLIMain.processFile(CLIMain.java:113)
    at org.opendataloader.pdf.cli.CLIMain.processPath(CLIMain.java:92)
    at org.opendataloader.pdf.cli.CLIMain.main(CLIMain.java:64)
```

The Java process logs `SEVERE` but exits with code 0, so the calling Python code receives no error; it only discovers the problem when the JSON output file is missing. The crash reproduces on every document tested (50-page technical manuals with tables and lists). For these documents, triage routes ~24 pages to Java and ~26 to docling-fast.
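Because the failure is visible only in the log and through the missing output file, a caller currently has to check both itself. The following is a minimal sketch, assuming the CLI's log output (where the `SEVERE:` line appears) has been captured as text and that a successful run leaves at least one `*.json` file directly in the output directory; the exact output filename is not stated in this report. `detect_silent_failure` is a hypothetical helper, not part of `opendataloader_pdf`.

```python
import re
from pathlib import Path
from typing import List

# Matches java.util.logging message lines such as
# "SEVERE: Exception during processing file /tmp/.../chunk.pdf: null"
SEVERE_LINE = re.compile(r"^SEVERE: (.*)$", re.MULTILINE)


def detect_silent_failure(log_text: str, output_dir: str) -> List[str]:
    """Return reasons to treat a run as failed, even if it exited with code 0.

    Hypothetical helper: assumes log_text holds the captured CLI log and
    that a successful run writes at least one *.json file into output_dir.
    """
    # Every SEVERE line in the log counts as a failure reason.
    reasons = [f"logged SEVERE: {msg}" for msg in SEVERE_LINE.findall(log_text)]
    # A missing JSON file is the symptom described in this issue.
    if not any(Path(output_dir).glob("*.json")):
        reasons.append(f"no JSON output found in {output_dir}")
    return reasons
```

For example, pass it `result.stderr` from `subprocess.run([...], capture_output=True, text=True)` and treat the run as failed when `result.returncode != 0` or the returned list is non-empty. If only the Python `convert()` call is used, passing an empty log string still performs the output-file check.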

...

### Steps to reproduce

1. Start the hybrid backend:

   ```shell
   opendataloader-pdf-hybrid --port 5002
   ```

2. Convert a PDF with hybrid mode:

   ```shell
   opendataloader-pdf --hybrid docling-fast --hybrid-url http://localhost:5002 --hybrid-timeout 60000 --hybrid-fallback --table-method cluster -f json -o output/ input.pdf
   ```

   Or via Python:

   ```python
   import opendataloader_pdf
   opendataloader_pdf.convert(
       input_path="document.pdf",
       output_dir="output/",
       format="json",
       hybrid="docling-fast",
       hybrid_url="http://localhost:5002",
       hybrid_timeout="60000",
       hybrid_fallback=True,
       table_method="cluster",
   )
   ```

3. The docling-fast backend processes its pages successfully (200 OK in the logs).
4. Java crashes in `postProcess` with a `NullPointerException`, and no JSON output is produced.
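When reproducing, it can help to confirm that the backend from step 1 is accepting connections before running step 2, so the crash in step 4 is not confused with a backend that never started. This is a minimal standard-library sketch; it only checks plain TCP reachability on the port (5002 above) and makes no assumption about the backend's HTTP endpoints. `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect only proves something is listening,
            # not that the hybrid backend is healthy.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Refused or timed out: give up once the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)
```

For example, `wait_for_port("localhost", 5002)` returning `False` means step 2 would be testing the `--hybrid-fallback` path rather than the hybrid post-processing crash.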

...

### Version

1.10.1 (pip install opendataloader-pdf[hybrid])

...

### Java version

OpenJDK 11 (default-jre-headless, from pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime base image, Ubuntu 22.04)

...
