feat: add ConversionStatus.TIMEOUT to differentiate from page failures #3211

Open
joaquinhuigomez wants to merge 1 commit into docling-project:main from joaquinhuigomez:fix/timeout-vs-page-failure

Conversation

@joaquinhuigomez
Contributor

Add a dedicated ConversionStatus.TIMEOUT status so downstream consumers can distinguish between partial results caused by document_timeout being reached versus individual page conversion failures (which remain PARTIAL_SUCCESS). Both pipelines (threaded StandardPdfPipeline and legacy PaginatedPipeline) now emit TIMEOUT when the document timeout is exceeded. The DocumentConverter and DocumentExtractor treat TIMEOUT as non-fatal, consistent with existing PARTIAL_SUCCESS handling.

Fixes #3205
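
As a rough illustration of the consumer-side benefit described above, the sketch below branches on the status of a finished conversion. It uses a local stand-in enum rather than importing Docling, and the string value of `TIMEOUT` as well as the helper name `describe_outcome` are assumptions made for this example only.

```python
from enum import Enum


class ConversionStatus(str, Enum):
    """Local stand-in for Docling's status enum (not the real class)."""

    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    FAILURE = "failure"
    TIMEOUT = "timeout"  # assumed value for the status proposed in this PR


def describe_outcome(status: ConversionStatus) -> str:
    """Hypothetical helper: map a status to the action a consumer might take."""
    if status is ConversionStatus.SUCCESS:
        return "use result"
    if status is ConversionStatus.TIMEOUT:
        # Partial output because document_timeout was exceeded.
        return "retry with a larger document_timeout"
    if status is ConversionStatus.PARTIAL_SUCCESS:
        # Partial output because individual pages failed to convert.
        return "inspect failed pages"
    return "discard result"
```

Without a separate `TIMEOUT` value, the second and third branches would be indistinguishable to the caller.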

@github-actions
Contributor

github-actions bot commented Mar 29, 2026

DCO Check Passed

Thanks @joaquinhuigomez, all your commits are properly signed off. 🎉

@mergify

mergify bot commented Mar 29, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

@dosubot

dosubot bot commented Mar 29, 2026

Related Documentation

1 document(s) may need updating based on files changed in this PR:

Docling

What are the detailed pipeline options and processing behaviors for PDF, DOCX, PPTX, and XLSX files in the Python SDK?
View Suggested Changes
@@ -1,6 +1,7 @@
 ### PDF
 - **Pipeline/Backend**: `StandardPdfPipeline` + `DoclingParseDocumentBackend` (default: `docling_parse`)
 - **Key Options**:
+    - `document_timeout` (default: None): Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with `TIMEOUT` status. If None, no timeout is enforced. The `TIMEOUT` status is a dedicated status value that allows downstream consumers to distinguish between partial results caused by `document_timeout` being exceeded versus individual page conversion failures (which remain `PARTIAL_SUCCESS`). Both `StandardPdfPipeline` and `PaginatedPipeline` emit `TIMEOUT` when the document timeout is exceeded, and `DocumentConverter` treats `TIMEOUT` as non-fatal (similar to `PARTIAL_SUCCESS`).
     - `from_formats`: Supported input formats include `docx`, `pptx`, `html`, `image`, `pdf`, `asciidoc`, `md` (including `txt`, `text`, `qmd`, `rmd`), `csv`, `xlsx`, `xml_uspto`, `xml_jats`, `xml_xbrl`, `mets_gbs`, `json_docling`, `audio`, `vtt`, `latex`
     - `to_formats`: Supported output formats include `md`, `json`, `yaml`, `html`, `html_split_page`, `text`, `doctags`, `vtt`
     - `pdf_backend`: Allowed values: `pypdfium2`, `docling_parse`, `dlparse_v1`, `dlparse_v2`, `dlparse_v4` (default: `docling_parse`)
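
The `document_timeout` behavior described in the suggested text can be pictured with a toy page loop. This is a minimal sketch under stated assumptions, not Docling's pipeline code: `convert_pages`, its parameters, and the plain-string statuses are all hypothetical, and the clock is injectable only so the behavior can be checked deterministically.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


def convert_pages(
    pages: Iterable[str],
    convert_page: Callable[[str], str],
    document_timeout: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, List[str]]:
    """Convert pages in order and return (status, converted_pages)."""
    start = clock()
    converted: List[str] = []
    page_failed = False
    for page in pages:
        # Stop early and keep partial results once the time budget is spent.
        if document_timeout is not None and clock() - start > document_timeout:
            return "timeout", converted
        try:
            converted.append(convert_page(page))
        except Exception:
            # A single failing page does not abort the document.
            page_failed = True
    return ("partial_success" if page_failed else "success"), converted
```

Both early termination and a page failure yield partial output here; only the returned status tells the caller which of the two happened.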


@codecov

codecov bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@joaquinhuigomez
Contributor Author

Thanks for the review! Anything else needed to merge?

@cau-git
Member

cau-git commented Mar 31, 2026

@joaquinhuigomez Thanks for this contribution, I think it makes sense. Before we merge, however, we must thoroughly scan several client codebases that have hard-coded expectations about "terminal states" in Docling. So far, these all watch only for SUCCESS | PARTIAL_SUCCESS | FAILURE, not for TIMEOUT.
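
To make the concern concrete, here is a minimal sketch of a client that polls until it sees a terminal state from a hard-coded set. The function name, the polling shape, and the lowercase status strings (including `"timeout"`) are assumptions for illustration and are not taken from any actual client.

```python
from typing import Callable, FrozenSet, Optional

# Terminal states as a client might hard-code them today.
LEGACY_TERMINAL_STATES: FrozenSet[str] = frozenset(
    {"success", "partial_success", "failure"}
)
# The same set extended with the assumed value of the new status.
UPDATED_TERMINAL_STATES: FrozenSet[str] = LEGACY_TERMINAL_STATES | {"timeout"}


def poll_until_terminal(
    get_status: Callable[[], str],
    terminal_states: FrozenSet[str],
    max_polls: int = 5,
) -> Optional[str]:
    """Return the first terminal status seen, or None if the poll budget runs out."""
    for _ in range(max_polls):
        status = get_status()
        if status in terminal_states:
            return status
    return None
```

With the legacy set, a task that ended in `"timeout"` is never recognized as finished, so such a client would keep polling until its own budget expires. Extending the set restores the expected behavior.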

Add a dedicated TIMEOUT status so downstream consumers can distinguish
between partial results caused by document_timeout being reached versus
individual page conversion failures (which remain PARTIAL_SUCCESS).

Fixes docling-project#3205

Signed-off-by: Joaquin Hui <joaquinhui1995@gmail.com>
Signed-off-by: Joaquin Hui Gomez <132194176+joaquinhuigomez@users.noreply.github.com>
@joaquinhuigomez force-pushed the fix/timeout-vs-page-failure branch from 6a54bd5 to caa8799 on April 1, 2026 14:37
@joaquinhuigomez
Contributor Author

Makes sense — take your time on the scan. I've also fixed the DCO signoff.


Development

Successfully merging this pull request may close these issues.

Differentiate between conversion where document_timeout is reached versus those for which individual pages failed

3 participants