@ClemDoum ClemDoum commented Jul 4, 2025

Description

While #1167 added the ability to include the TextCell orientation during detection, later stages of the pipeline do not use the cells' orientation.

This PR adds support for orientation-aware layout and table detection. The logic is the same in both components:

  1. detect the page orientation using majority voting over the page's cells
  2. if the page is not properly oriented, rotate the component's inputs
  3. rotate the component's results back to the original orientation before saving them in the result
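The three steps above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `Box`, `detect_page_orientation`, `rotate_box` and `to_original_frame` are hypothetical names, boxes use a top-left origin, and rotations are assumed to be clockwise multiples of 90°.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box with a top-left origin (hypothetical helper type)."""
    l: float
    t: float
    r: float
    b: float


def detect_page_orientation(cell_angles):
    """Step 1: majority vote over per-cell angles snapped to 0/90/180/270."""
    if not cell_angles:
        return 0
    snapped = [round(a / 90) % 4 * 90 for a in cell_angles]
    return Counter(snapped).most_common(1)[0][0]


def rotate_box(box, page_w, page_h, angle):
    """Step 2: rotate a box clockwise by a multiple of 90 degrees.

    Returns the rotated box and the page size after rotation.
    """
    for _ in range(angle // 90 % 4):
        # One clockwise quarter turn: (x, y) -> (page_h - y, x).
        box = Box(page_h - box.b, box.l, page_h - box.t, box.r)
        page_w, page_h = page_h, page_w
    return box, page_w, page_h


def to_original_frame(boxes, upright_w, upright_h, correction):
    """Step 3: undo a clockwise `correction` that was applied to the inputs."""
    undo = (360 - correction) % 360
    return [rotate_box(b, upright_w, upright_h, undo)[0] for b in boxes]
```

A component would call `detect_page_orientation` once per page, rotate its inputs by the correction it derives from that angle, run the model, and pass the predicted boxes through `to_original_frame`. How the detected angle maps to the correction depends on the angle convention of the cells, which this sketch leaves open.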

Notes

Changes

Added

  • Added support for orientation-aware layout detection and table detection
  • Added a detect_orientation function to ocr-utils to detect page orientation using majority voting over the text cells' orientations

Fixed

  • a bug in the OcrMacModel, which did not take the ocr_rect offset into account when calculating TextCell coordinates
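To illustrate the kind of offset this fix is about: OCR runs on a crop of the page, so the coordinates it returns are relative to the crop and have to be shifted by the crop's position. The sketch below is a generic illustration, not the OcrMacModel code; the function name, the tuple layout `(l, t, r, b)` and the `scale` parameter are assumptions.

```python
def crop_to_page_coords(cell_box, ocr_rect, scale=1.0):
    """Map a box from OCR-crop coordinates to page coordinates.

    cell_box: (l, t, r, b) in pixels of the (possibly upscaled) crop.
    ocr_rect: (l, t, r, b) of the crop in page coordinates.
    scale:    factor by which the crop was upscaled before OCR (assumed).
    """
    l, t, r, b = cell_box
    off_l, off_t = ocr_rect[0], ocr_rect[1]
    # Undo the scaling first, then shift by the crop's top-left corner.
    return (l / scale + off_l, t / scale + off_t,
            r / scale + off_l, b / scale + off_t)
```

Forgetting the offset leaves every cell positioned as if the crop started at the page origin, which only goes unnoticed when the OCR region covers the whole page.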

TODO:

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

github-actions bot commented Jul 4, 2025

DCO Check Passed

Thanks @ClemDoum, all your commits are properly signed off. 🎉

mergify bot commented Jul 4, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
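For reference, the title rule can be checked locally with a few lines of Python. This is a sketch that assumes the `~=` operator behaves like a regular-expression search on the title; the function name is made up for the example.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional_title(title):
    """Return True if the PR title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None
```

Both titles this PR has carried, `fix(layout,table): ...` and `feat(layout,table): ...`, satisfy the pattern, since the scope in parentheses is optional and may contain commas.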

@ClemDoum ClemDoum marked this pull request as ready for review July 4, 2025 08:43
@ClemDoum ClemDoum force-pushed the fix/table-detection-orientation branch from 703f218 to 8b4af6f Compare July 7, 2025 13:47
@cau-git cau-git changed the title from "fix(layout,table): orientation-aware layout and table detection" to "feat(layout,table): orientation-aware layout and table detection" Jul 8, 2025
cau-git commented Jul 8, 2025

@ClemDoum Very nice follow up work, I was already looking forward to seeing this addressed!
I will merge #1897 once checks have passed so you can update the test ground-truth with this in.

Questions on this PR:

  • Did you test it with specific PDF backends (docling-parse-v4, pypdfium2)? Both should work.
  • Do you have an estimate of the performance penalty this brings when rotation needs to be applied?
  • For the LayoutModel I find only changes to the debug visualization function, but the layout detection does not receive rotated input. Are you planning to deal with that too?

@ClemDoum ClemDoum force-pushed the fix/table-detection-orientation branch 2 times, most recently from 4b9e8ce to 7b4a445 Compare July 9, 2025 15:06
ClemDoum commented Jul 9, 2025

@cau-git I've updated the e2e tests. Note that there was a bug in the OcrMacModel which was not taking the ocr_rect offset into account when computing the coordinates of the TextCell.

I also deactivated the orientation test for OcrMacModel. While the macOS OCR handles text that is not in the right orientation, the API doesn't provide the detected orientation, which makes it impossible to use in the downstream components.

To answer your points:

  • I haven't tried the other backends. Should I try them locally or create a test?
  • On my M1 laptop, rotating a page in debug mode takes about 0.2 ms, so I expect the impact to be negligible for a whole document.
  • For the layout model, I rotate the documents here and rotate the bbox back to the original orientation here.

ClemDoum commented Jul 9, 2025

Tests are finally green on my side 🥵

codecov bot commented Jul 9, 2025

Codecov Report

Attention: Patch coverage is 90.47619% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines        | Patch % | Lines        |
|---------------------------------|---------|--------------|
| docling/models/ocr_mac_model.py | 0.00%   | 4 Missing ⚠️ |


cau-git commented Jul 10, 2025

> I haven't tried the other backends, should I try them locally or create a test ?

@ClemDoum Yes, what we need to test is whether the same logic works universally for scanned docs (with tesseract defining the cell orientation) and for programmatic PDFs (using docling-parse v4, our default backend, which is the only backend that creates oriented cells). Hence we need a programmatic PDF with some pages that contain rotated text (e.g. a table rotated to fill landscape while the page is defined as portrait), and to run it through a test that has expectations on which orientation we see on which page. Ideally, also with a comparison of how the table structure is extracted without (current release) and with this change. Could you act on that? Many thanks!

Finally, I am not certain we need the rotation in the layout model: it was trained on various material and should have learned to capture the content independently of rotation. However, the reading order model would need that rotation to work well.

@cau-git cau-git self-assigned this Jul 14, 2025
ClemDoum commented Jul 30, 2025

Hi @cau-git, sorry for the late reply, I was on vacation.

OK, so if I get it right, I need to:

  • switch to the default backend for OCR e2e test
  • enrich e2e test with a programmatic PDF with a rotated table inside
  • produce outputs before and after the PR (I'll share results here)

Concerning the layout model, I'm using rotation even though I guessed it was trained to be robust to it. I think the e2e tests will be more robust this way and also easier to debug (since, after rotation, each of the 90°, 180°, and 270° tests should yield exactly the same results as the base image).
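The invariance I have in mind can be shown on a toy example. This is a sketch, not the actual e2e test: the "page" is a small character grid, `toy_pipeline` is a stand-in for the real conversion, and all names are made up for the illustration.

```python
def rotate_cw(grid):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate(grid, angle):
    """Rotate clockwise by a multiple of 90 degrees."""
    for _ in range(angle // 90 % 4):
        grid = rotate_cw(grid)
    return grid


def toy_pipeline(grid):
    """Stand-in for the real pipeline: read the upright page row by row."""
    return ["".join(row) for row in grid]


def check_rotation_invariance(base_page):
    """Each rotated input, once corrected, must match the base result."""
    expected = toy_pipeline(base_page)
    for angle in (90, 180, 270):
        rotated_input = rotate(base_page, angle)
        # Corrective rotation that brings the page back upright.
        upright = rotate(rotated_input, (360 - angle) % 360)
        assert toy_pipeline(upright) == expected, f"mismatch at {angle}"
    return expected
```

With the corrective rotation in place, a failure for a single angle points at the rotation handling rather than at the model, which is what makes the tests easier to debug.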

@ClemDoum ClemDoum force-pushed the fix/table-detection-orientation branch from 7b4a445 to b7d4715 Compare July 30, 2025 17:17
@ClemDoum ClemDoum closed this Jul 31, 2025
@ClemDoum ClemDoum reopened this Jul 31, 2025
@ClemDoum

Oops I've closed this by mistake.

I've just added the programmatic PDF, but the output is not correct. Using the debug mode, I found that it's because the layout detection detects the table as an image.

It's probably because the layout detector hasn't been trained on enough pages with rotated elements. WDYT @cau-git?

ClemDoum commented Jul 31, 2025

Here is the comparison between main and this branch for the following PDF with a table image rotated 270° inside it.

main

OCR

[image: ocr_page_00000]

Post-processed layout

[image: postprocessed_layout_page_00000]

Table structure

[image: table_struct_page_00000]

Output as markdown

| value   | value       | value             | value   | Vertically merged   |
|---------|-------------|-------------------|---------|---------------------|
| value   | Some other  | Some other value  | column  | Other merged        |
| value   | Yet another | Yet another value | column  | Yet another         |

This branch

OCR

[image: ocr_page_00000]

Post-processed layout

[image: postprocessed_layout_page_00000]

Table structure

[image: table_struct_page_00000]

Output as markdown

| Vertically merged   | Other merged column   | Yet another column   |
|---------------------|-----------------------|----------------------|
| value               | Some other value      | Yet another value    |
| value               | Some other value      | Yet another value    |

@ClemDoum ClemDoum closed this Sep 3, 2025
@ClemDoum ClemDoum force-pushed the fix/table-detection-orientation branch from 3261524 to 3419c42 Compare September 3, 2025 13:44
cau-git commented Sep 3, 2025

@ClemDoum was it intentional that you closed this PR? I would like to review this in the coming days.

ClemDoum commented Sep 3, 2025

@cau-git I just wanted to rebase and update the e2e tests afterwards; I didn't close it on purpose.
I can't reopen it though.

cau-git commented Sep 3, 2025

Strangely neither can I reopen it right now...

@ClemDoum ClemDoum reopened this Sep 3, 2025
@ClemDoum ClemDoum force-pushed the fix/table-detection-orientation branch from 5f1979a to 0fde27b Compare September 3, 2025 16:09
ClemDoum commented Sep 3, 2025

OK, I managed to reopen it. The e2e tests don't look good on the non-OCR part though. I'm debugging.

@ClemDoum ClemDoum force-pushed the fix/table-detection-orientation branch 2 times, most recently from 5c93dea to 0f9e607 Compare September 3, 2025 16:44
ClemDoum commented Sep 3, 2025

OK everything is looking good now @cau-git

@ClemDoum ClemDoum force-pushed the fix/table-detection-orientation branch from 0f9e607 to 8a07bf6 Compare September 3, 2025 16:49
@ClemDoum ClemDoum force-pushed the fix/table-detection-orientation branch from 8a07bf6 to d81e50d Compare September 8, 2025 13:15
vonjackustc commented Sep 25, 2025

Maybe detecting and rotating the pages in the PDF backend would be easier? Hook "self._pdoc = pdfium.PdfDocument(self.path_or_stream)", iterate over _pdoc and call set_rotation on the pages, then save to a new io.BytesIO() and assign it to path_or_stream.
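Roughly what I mean, as an untested sketch: `rotate_pages` and `load_rotated` are made-up names, the `rotations` mapping (page index to absolute rotation) would have to come from some orientation detection that is not shown here, and this is not how the backend is currently written. It only relies on `PdfDocument`, page indexing, `set_rotation` and `save` from pypdfium2.

```python
import io


def rotate_pages(pdoc, rotations):
    """Set an absolute rotation on selected pages and save to a new buffer.

    pdoc:      an opened pypdfium2 PdfDocument (or any object with the same
               indexing, set_rotation and save interface).
    rotations: dict mapping page index -> rotation in {0, 90, 180, 270}.
    """
    for index, rotation in rotations.items():
        if rotation not in (0, 90, 180, 270):
            raise ValueError(f"unsupported rotation: {rotation}")
        pdoc[index].set_rotation(rotation)
    buffer = io.BytesIO()
    pdoc.save(buffer)
    buffer.seek(0)  # rewind so the buffer can be read like a fresh stream
    return buffer


def load_rotated(path_or_stream, rotations):
    """Open a PDF, apply the rotations, and return the rewritten stream."""
    import pypdfium2 as pdfium  # third-party, imported lazily

    pdoc = pdfium.PdfDocument(path_or_stream)
    try:
        return rotate_pages(pdoc, rotations)
    finally:
        pdoc.close()
```

The returned buffer could then replace path_or_stream before the rest of the backend runs, so later stages would see upright pages without needing their own rotation logic.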
