feat: add ConversionStatus.TIMEOUT to differentiate from page failures by joaquinhuigomez · Pull Request #3211 · docling-project/docling

joaquinhuigomez · 2026-03-29T18:38:34Z

Add a dedicated ConversionStatus.TIMEOUT status so downstream consumers can distinguish between partial results caused by document_timeout being reached versus individual page conversion failures (which remain PARTIAL_SUCCESS). Both pipelines (threaded StandardPdfPipeline and legacy PaginatedPipeline) now emit TIMEOUT when the document timeout is exceeded. The DocumentConverter and DocumentExtractor treat TIMEOUT as non-fatal, consistent with existing PARTIAL_SUCCESS handling.

Fixes #3205
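To illustrate what the new status would let a downstream consumer do, here is a minimal sketch. The enum below is a stand-in, not docling's actual `ConversionStatus`: it lists only the statuses named in this thread, its string values are assumptions, and the `triage` function and its return labels are hypothetical.

```python
from enum import Enum


class ConversionStatus(str, Enum):
    """Stand-in enum for illustration; TIMEOUT is the value this PR proposes."""

    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    FAILURE = "failure"
    TIMEOUT = "timeout"  # assumed string value, not confirmed in this thread


def triage(status: ConversionStatus) -> str:
    """Map a conversion status to a hypothetical follow-up action."""
    if status is ConversionStatus.SUCCESS:
        return "accept"
    if status is ConversionStatus.TIMEOUT:
        # Partial result because document_timeout was reached.
        return "retry_with_longer_timeout"
    if status is ConversionStatus.PARTIAL_SUCCESS:
        # Partial result because individual pages failed to convert.
        return "inspect_failed_pages"
    return "reject"
```

Without a separate `TIMEOUT` value, the second and third branches would be indistinguishable to the caller.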

github-actions · 2026-03-29T18:38:45Z

✅ DCO Check Passed

Thanks @joaquinhuigomez, all your commits are properly signed off. 🎉

mergify · 2026-03-29T18:39:09Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
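The title rule above can be checked locally before opening a PR. This is a small sketch that assumes the `~=` operator behaves like a regular-expression search; `is_conventional` is a hypothetical helper name.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None
```

The title of this PR, `feat: add ConversionStatus.TIMEOUT to differentiate from page failures`, satisfies the pattern, as does a scoped breaking-change prefix such as `fix(pipeline)!:`.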

dosubot · 2026-03-29T18:40:00Z

Related Documentation

1 document(s) may need updating based on files changed in this PR:

Docling

What are the detailed pipeline options and processing behaviors for PDF, DOCX, PPTX, and XLSX files in the Python SDK?

View Suggested Changes

@@ -1,6 +1,7 @@
 ### PDF
 - **Pipeline/Backend**: `StandardPdfPipeline` + `DoclingParseDocumentBackend` (default: `docling_parse`)
 - **Key Options**:
+    - `document_timeout` (default: None): Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with `TIMEOUT` status. If None, no timeout is enforced. The `TIMEOUT` status is a dedicated status value that allows downstream consumers to distinguish between partial results caused by `document_timeout` being exceeded versus individual page conversion failures (which remain `PARTIAL_SUCCESS`). Both `StandardPdfPipeline` and `PaginatedPipeline` emit `TIMEOUT` when the document timeout is exceeded, and `DocumentConverter` treats `TIMEOUT` as non-fatal (similar to `PARTIAL_SUCCESS`).
     - `from_formats`: Supported input formats include `docx`, `pptx`, `html`, `image`, `pdf`, `asciidoc`, `md` (including `txt`, `text`, `qmd`, `rmd`), `csv`, `xlsx`, `xml_uspto`, `xml_jats`, `xml_xbrl`, `mets_gbs`, `json_docling`, `audio`, `vtt`, `latex`
     - `to_formats`: Supported output formats include `md`, `json`, `yaml`, `html`, `html_split_page`, `text`, `doctags`, `vtt`
     - `pdf_backend`: Allowed values: `pypdfium2`, `docling_parse`, `dlparse_v1`, `dlparse_v2`, `dlparse_v4` (default: `docling_parse`)
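The behavior described for `document_timeout` can be sketched as a page loop with a time budget. This is an illustrative simulation, not docling's pipeline code: `convert_pages`, the injectable `clock` parameter, and the stand-in enum are all assumptions made for the example.

```python
import time
from enum import Enum
from typing import Callable, Iterable, Optional


class ConversionStatus(str, Enum):
    """Stand-in enum; string values are assumed."""

    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    TIMEOUT = "timeout"


def convert_pages(
    pages: Iterable[Callable[[], bool]],
    document_timeout: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[ConversionStatus, int]:
    """Run page jobs until all are done or the time budget is exhausted.

    Each page job returns True on success and False on failure.
    Returns the final status and the number of pages attempted.
    """
    start = clock()
    attempted = 0
    any_failed = False
    for page in pages:
        # With document_timeout=None, no timeout is enforced.
        if document_timeout is not None and clock() - start > document_timeout:
            return ConversionStatus.TIMEOUT, attempted
        attempted += 1
        if not page():
            any_failed = True
    if any_failed:
        return ConversionStatus.PARTIAL_SUCCESS, attempted
    return ConversionStatus.SUCCESS, attempted
```

In this sketch a timeout takes precedence over page failures: once the budget is exceeded the result is `TIMEOUT` regardless of how earlier pages fared. Whether the real pipelines make the same choice is not stated in this thread.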


codecov · 2026-03-30T06:02:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


joaquinhuigomez · 2026-03-31T11:28:39Z

Thanks for the review! Anything else needed to merge?

cau-git · 2026-03-31T11:34:01Z

@joaquinhuigomez Thanks for this contribution, I think it makes sense. Before we merge, however, we must thoroughly scan several client codebases that have hard-coded expectations about "terminal states" in Docling. So far these all watch only for SUCCESS | PARTIAL_SUCCESS | FAILURE and not for TIMEOUT.
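The concern can be illustrated with a small sketch of a client that hard-codes its terminal states. The status strings, the `is_terminal` helper, and both sets are hypothetical; real client code may look quite different.

```python
# Terminal states as a client written before this PR might hard-code them.
LEGACY_TERMINAL = frozenset({"success", "partial_success", "failure"})

# The same set extended with the status proposed in this PR.
UPDATED_TERMINAL = LEGACY_TERMINAL | {"timeout"}


def is_terminal(status: str, terminal_states: frozenset = LEGACY_TERMINAL) -> bool:
    """Return True if the client would stop waiting on this status."""
    return status.lower() in terminal_states
```

A client using `LEGACY_TERMINAL` would not recognize `"timeout"` as final and might keep polling or treat it as an unknown state, which is why such clients need to be found and updated first.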

Add a dedicated TIMEOUT status so downstream consumers can distinguish between partial results caused by document_timeout being reached versus individual page conversion failures (which remain PARTIAL_SUCCESS).

Fixes docling-project#3205

Signed-off-by: Joaquin Hui <joaquinhui1995@gmail.com>
Signed-off-by: Joaquin Hui Gomez <132194176+joaquinhuigomez@users.noreply.github.com>

joaquinhuigomez · 2026-04-01T14:37:19Z

Makes sense — take your time on the scan. I've also fixed the DCO signoff.

PeterStaar-IBM requested review from cau-git and dolfim-ibm March 30, 2026 05:48

PeterStaar-IBM approved these changes Mar 31, 2026


joaquinhuigomez force-pushed the fix/timeout-vs-page-failure branch from 6a54bd5 to caa8799 Compare April 1, 2026 14:37

joaquinhuigomez wants to merge 1 commit into docling-project:main from joaquinhuigomez:fix/timeout-vs-page-failure
