feat(layout,table): orientation-aware layout and table detection #1898

ClemDoum · 2025-07-04T08:36:20Z

Description

While #1167 added the ability to include the TextCell orientation during detection, later stages of the pipeline do not use the cells' orientation.

This PR adds support for orientation-aware layout and table detection. The logic is the same in both components:

detect the page orientation using majority voting over the page's cells
if the page is not properly oriented, rotate the component's inputs
rotate the component's results back to the original orientation before saving them in the result
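
As an illustration of the first step, majority voting over cell orientations could look roughly like the sketch below. It is not the code from this PR: `majority_orientation` is a hypothetical helper, and it assumes each cell's orientation is available as an angle in degrees (0, 90, 180 or 270). The tie-breaking rule (prefer the smallest angle) and the upright default for empty pages are also assumptions.

```python
from collections import Counter
from typing import Iterable


def majority_orientation(cell_angles: Iterable[int]) -> int:
    """Return the most frequent cell orientation in degrees.

    Hypothetical helper: assumes angles are multiples of 90 degrees.
    Ties are broken in favour of the smallest angle, and an empty page
    is treated as upright (0 degrees).
    """
    counts = Counter(angle % 360 for angle in cell_angles)
    if not counts:
        return 0
    # Highest count wins; on equal counts, the smaller angle wins.
    return max(counts, key=lambda angle: (counts[angle], -angle))


# Three cells vote for 90 degrees, one for 0: the page is treated as rotated by 90.
page_angle = majority_orientation([90, 90, 0, 90])
```

A page would then only be rotated when the returned angle is non-zero.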

Notes

⚠️ Current outputs of the full pipeline are not correct because of the bug fixed by fix(ocr-utils): unit test and fix the rotate_bounding_box function #1897. I've tested the full pipeline with that fix included and the e2e tests look good. Once fix(ocr-utils): unit test and fix the rotate_bounding_box function #1897 is merged, the e2e tests will have to be regenerated.

Changes

Added

Added support for orientation-aware layout detection and table detection
Added a detect_orientation function to ocr-utils to detect the page orientation using majority voting over the text cells' orientations

Fixed

a bug in the OcrMacModel, which did not take the ocr_rect offset into account when calculating TextCell coordinates
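
To make the offset issue concrete: when OCR runs on a cropped region of the page, the returned coordinates are relative to that crop, so the crop's origin has to be added back to get page coordinates. The sketch below shows only that idea. `Box` and `to_page_coords` are hypothetical names, not docling types, and any scaling between the OCR image and the page is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box with a top-left origin (hypothetical type)."""
    l: float
    t: float
    r: float
    b: float


def to_page_coords(cell: Box, ocr_rect: Box) -> Box:
    """Shift a cell expressed relative to `ocr_rect` into page coordinates.

    Assumption: both boxes use the same units and no scaling is involved.
    """
    return Box(
        l=cell.l + ocr_rect.l,
        t=cell.t + ocr_rect.t,
        r=cell.r + ocr_rect.l,
        b=cell.b + ocr_rect.t,
    )


# A cell found at (1, 2)-(3, 4) inside a crop whose top-left corner is (10, 20).
page_cell = to_page_coords(Box(1, 2, 3, 4), Box(10, 20, 100, 200))
```

Skipping this shift leaves every cell displaced by the crop's top-left corner.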

TODO:

merge fix(ocr-utils): unit test and fix the rotate_bounding_box function #1897
update e2e tests

Checklist:

Documentation has been updated, if necessary.
Examples have been added, if necessary.
Tests have been added, if necessary.

github-actions · 2025-07-04T08:36:30Z

✅ DCO Check Passed

Thanks @ClemDoum, all your commits are properly signed off. 🎉

mergify · 2025-07-04T08:36:54Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

This rule is failing.

When test data is updated, we require two reviewers

#approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
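
For reference, the title rule above can be tried out locally with a few lines of Python. This is only a sketch that assumes the pattern is evaluated with Python `re` semantics; `is_conventional` is a hypothetical helper name.

```python
import re

# Pattern copied from the rule above.
TITLE_RULE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return TITLE_RULE.search(title) is not None


# The title of this PR passes, a free-form title does not.
ok = is_conventional("feat(layout,table): orientation-aware layout and table detection")
rejected = is_conventional("update e2e tests")
```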

cau-git · 2025-07-08T14:57:28Z

@ClemDoum Very nice follow-up work, I was already looking forward to seeing this addressed!
I will merge #1897 once checks have passed so you can update the test ground-truth with this in.

Questions on this PR:

Did you test it with specific PDF backends: docling-parse-v4, pypdfium2 ? (both should work)
Do you have an estimate of the performance penalty this brings when rotation needs to be applied?
For the LayoutModel I find only changes to the debug visualization function, but the layout detection does not receive rotated input. Are you planning to deal with that too?

ClemDoum · 2025-07-09T15:14:39Z

@cau-git I've updated the e2e tests. Note that there was a bug in the OcrMacModel which was not taking the ocr_rect offset into account when computing the coordinates of the TextCell.

I also deactivated the orientation test for OcrMacModel. While the macOS OCR handles text that is not in the right orientation, the API doesn't expose the detected orientation, which makes it impossible to use in the downstream components.

To answer your points:

I haven't tried the other backends. Should I try them locally or create a test?
on my M1 laptop the rotation of a page in debug mode takes about 0.2 ms, so I expect the impact to be negligible for a whole doc
for the layout model I rotate the documents here and rotate the bbox back to original orientation here
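
To illustrate what rotating a bbox back to the original orientation involves, here is a minimal sketch for rotations by multiples of 90°. It is not the `rotate_bounding_box` function from ocr-utils, whose conventions may differ. The sketch assumes a top-left origin, boxes given as `(l, t, r, b)`, page sizes as `(width, height)` and clockwise angles; `rotate_box` and `rotate_box_90_cw` are hypothetical names.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (l, t, r, b), top-left origin
Size = Tuple[float, float]               # (width, height)


def rotate_box_90_cw(box: Box, page_size: Size) -> Tuple[Box, Size]:
    """Rotate a box 90 degrees clockwise together with its page.

    A point (x, y) on a page of height h lands at (h - y, x).
    """
    l, t, r, b = box
    w, h = page_size
    return (h - b, l, h - t, r), (h, w)


def rotate_box(box: Box, page_size: Size, angle: int) -> Tuple[Box, Size]:
    """Rotate clockwise by `angle` degrees (must be a multiple of 90)."""
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90 degrees")
    for _ in range((angle % 360) // 90):
        box, page_size = rotate_box_90_cw(box, page_size)
    return box, page_size


# Rotate into the upright frame, then back to the original frame.
original = (10.0, 20.0, 30.0, 60.0)
rotated, rotated_size = rotate_box(original, (100.0, 200.0), 90)
restored, restored_size = rotate_box(rotated, rotated_size, 360 - 90)
```

Rotating by `angle` and then by `360 - angle` returns the original box and page size, which is the round trip the layout step relies on.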

ClemDoum · 2025-07-09T15:22:49Z

Tests are finally green on my side 🥵

codecov · 2025-07-09T16:17:56Z

Codecov Report

Attention: Patch coverage is 90.47619% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
docling/models/ocr_mac_model.py	0.00%	4 Missing ⚠️

📢 Thoughts on this report? Let us know!

cau-git · 2025-07-10T14:37:13Z

> I haven't tried the other backends, should I try them locally or create a test ?

@ClemDoum Yes, what we need to test is whether the same logic works universally for scanned docs (with tesseract defining the cell orientation) and for programmatic PDFs (using docling-parse v4, our default backend, which is the only backend that creates oriented cells). Hence we need a programmatic PDF with some pages that contain rotated text (e.g. a table rotated to fill landscape while the page is defined as portrait), and to run it through a test that has expectations about which orientation we see on which page. Ideally, this would also include a comparison of how the table structure is extracted without (current release) and with this change. Could you act on that? Many thanks!

Finally, I am not certain if we need the rotation in the layout model, it was trained on various material and should have learned to capture the content independently of rotation. However, the reading order model would need that rotation to work well.

ClemDoum · 2025-07-30T08:36:12Z

Hi @cau-git, sorry for the late reply, I was on vacation.

OK, so if I get it right I need to

switch to the default backend for the OCR e2e test
enrich the e2e tests with a programmatic PDF with a rotated table inside
produce outputs before and after the PR (I'll share results here)

Concerning the layout model, I'm using rotation even though I guessed it was trained to be robust to it. I think the e2e tests will be more robust this way and also easier to debug (since after rotation each of the 90, 180 and 270 tests should yield exactly the same results as the base image).
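
The debuggability argument can be illustrated with a toy check: if every rotated variant is brought back to the upright frame before detection, the detector sees the same input in all four cases. The sketch below uses a small grid as a stand-in for the page image and a trivial bounding-box finder as a stand-in for the layout model; all names are hypothetical and none of this is the actual e2e test.

```python
from typing import List, Tuple

Grid = List[List[int]]


def rotate_grid_cw(grid: Grid) -> Grid:
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate_grid(grid: Grid, angle: int) -> Grid:
    """Rotate clockwise by a multiple of 90 degrees (negative = counter-clockwise)."""
    for _ in range((angle % 360) // 90):
        grid = rotate_grid_cw(grid)
    return grid


def toy_detect(grid: Grid) -> Tuple[int, int, int, int]:
    """Stand-in detector: (l, t, r, b) of the non-zero pixels, end-exclusive."""
    rows = [y for y, row in enumerate(grid) if any(row)]
    cols = [x for x in range(len(grid[0])) if any(row[x] for row in grid)]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def detect_upright(grid: Grid, orientation: int) -> Tuple[int, int, int, int]:
    """Undo the known rotation first, then run the detector."""
    return toy_detect(rotate_grid(grid, -orientation))


BASE = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

# Every rotated variant yields the same detection as the base image.
results = {a: detect_upright(rotate_grid(BASE, a), a) for a in (0, 90, 180, 270)}
```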

ClemDoum · 2025-07-31T10:27:41Z

Oops I've closed this by mistake.

I've just added the programmatic PDF, but the output is not correct. Using the debug mode I found that it's because the layout detection detects the table as an image.

It's probably because the layout detector hasn't been trained on enough pages with rotated elements. WDYT @cau-git?

ClemDoum · 2025-07-31T13:20:46Z

Here is the comparison between main and this branch for the following PDF with a table image rotated 270° inside it.

`main`

OCR

Post-processed layout

Table structure

Output as markdown

| value   | value       | value             | value   | Vertically merged   |
|---------|-------------|-------------------|---------|---------------------|
| value   | Some other  | Some other value  | column  | Other merged        |
| value   | Yet another | Yet another value | column  | Yet another         |

This branch

OCR

Post-processed layout

Table structure

Output as markdown

| Vertically merged   | Other merged column   | Yet another column   |
|---------------------|-----------------------|----------------------|
| value               | Some other value      | Yet another value    |
| value               | Some other value      | Yet another value    |


cau-git · 2025-09-03T15:15:32Z

@ClemDoum was it intentional that you closed this PR? I would like to review this in the coming days.

ClemDoum · 2025-09-03T15:53:44Z

@cau-git I just wanted to rebase and update the e2e tests after rebasing, I didn't close it on purpose.
I can't reopen it though.

cau-git · 2025-09-03T15:59:00Z

Strangely neither can I reopen it right now...

ClemDoum · 2025-09-03T16:11:10Z

OK, I managed to reopen it. The e2e tests don't look good though, on the non-OCR part. I'm debugging.

ClemDoum · 2025-09-03T16:45:58Z

OK everything is looking good now @cau-git


vonjackustc · 2025-09-25T01:16:56Z

Maybe detecting and rotating pages in the PDF backend would be easier? Hook "self._pdoc = pdfium.PdfDocument(self.path_or_stream)", iterate over _pdoc and call set_rotation on the pages, then save to a new io.BytesIO() and assign it to path_or_stream.
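
A rough sketch of that suggestion, for illustration only. It has not been checked against the docling backend. It assumes pypdfium2 is installed and that the required correction for each page is already known as a clockwise angle. `combined_rotation` and `rotate_pages` are hypothetical names, and how the correction would be detected is left open.

```python
import io
from typing import Mapping


def combined_rotation(current: int, correction_cw: int) -> int:
    """Combine a page's existing rotation with a clockwise correction."""
    total = (current + correction_cw) % 360
    if total % 90 != 0:
        raise ValueError("PDF page rotation must be a multiple of 90 degrees")
    return total


def rotate_pages(path_or_stream, corrections: Mapping[int, int]) -> io.BytesIO:
    """Return an in-memory copy of the PDF with per-page rotations applied.

    `corrections` maps a page index to a clockwise correction in degrees
    (hypothetical input, e.g. derived from the detected orientation).
    """
    import pypdfium2 as pdfium  # third-party dependency, imported lazily

    pdoc = pdfium.PdfDocument(path_or_stream)
    for index, correction in corrections.items():
        page = pdoc[index]
        page.set_rotation(combined_rotation(page.get_rotation(), correction))
    buffer = io.BytesIO()
    pdoc.save(buffer)
    buffer.seek(0)
    return buffer


# A page already stored with a 90 degree rotation that needs 270 more ends up upright.
upright = combined_rotation(90, 270)
```

Whether this would be simpler than rotating inside the layout and table components depends on where the orientation is detected, since the text cells are only available after parsing.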

ClemDoum marked this pull request as ready for review July 4, 2025 08:43

PeterStaar-IBM requested review from PeterStaar-IBM, cau-git and dolfim-ibm July 4, 2025 12:59

PeterStaar-IBM assigned ClemDoum Jul 4, 2025

ClemDoum force-pushed the fix/table-detection-orientation branch from 703f218 to 8b4af6f Compare July 7, 2025 13:47

cau-git changed the title ~~fix(layout,table): orientation-aware layout and table detection~~ feat(layout,table): orientation-aware layout and table detection Jul 8, 2025

ClemDoum force-pushed the fix/table-detection-orientation branch 2 times, most recently from 4b9e8ce to 7b4a445 Compare July 9, 2025 15:06

cau-git self-assigned this Jul 14, 2025

ClemDoum force-pushed the fix/table-detection-orientation branch from 7b4a445 to b7d4715 Compare July 30, 2025 17:17

ClemDoum closed this Jul 31, 2025

ClemDoum reopened this Jul 31, 2025

ClemDoum closed this Sep 3, 2025

ClemDoum force-pushed the fix/table-detection-orientation branch from 3261524 to 3419c42 Compare September 3, 2025 13:44

ClemDoum added 5 commits September 3, 2025 15:50

fix(ocr): rotate image to the natural orientation before layout predi…

a1e4598

…ction Signed-off-by: Clément Doumouro <[email protected]>

fix(ocr): fix layout debug

309b084

Signed-off-by: Clément Doumouro <[email protected]>

fix(ocr): move bounding bow rotation util to orientation.py

62fc04b

Signed-off-by: Clément Doumouro <[email protected]>

fix(ocr): refactor rotation utilities

47814f0

Signed-off-by: Clément Doumouro <[email protected]>

fix(layout,table): orientation-aware layout and table detection

9ae5be6

Signed-off-by: Clément Doumouro <[email protected]>

ClemDoum added 6 commits September 3, 2025 15:53

fix(layout,table): orientation-aware layout and table detection

7ef4f8f

Signed-off-by: Clément Doumouro <[email protected]>

fix(layout,table): update e2e test

4e9c4b4

Signed-off-by: Clément Doumouro <[email protected]>

fix(layout,table): use default v4 backend for e2e OCR test and fix ta…

f9da84b

…ble structure detection Signed-off-by: Clément Doumouro <[email protected]>

fix(layout,table): add a programmatic PDF with a rotated table inside

e85087b

Signed-off-by: Clément Doumouro <[email protected]>

fix(layout,table): add a programmatic PDF with a rotated table inside

83b3e46

Signed-off-by: Clément Doumouro <[email protected]>

fix(layout,table): perform orientation detection at the table level

7183dc6

Signed-off-by: Clément Doumouro <[email protected]>

ClemDoum reopened this Sep 3, 2025

ClemDoum force-pushed the fix/table-detection-orientation branch from 5f1979a to 0fde27b Compare September 3, 2025 16:09

ClemDoum force-pushed the fix/table-detection-orientation branch 2 times, most recently from 5c93dea to 0f9e607 Compare September 3, 2025 16:44

ClemDoum force-pushed the fix/table-detection-orientation branch from 0f9e607 to 8a07bf6 Compare September 3, 2025 16:49

fix(layout,table): perform orientation detection at the table level

d81e50d

Signed-off-by: Clément Doumouro <[email protected]>

ClemDoum force-pushed the fix/table-detection-orientation branch from 8a07bf6 to d81e50d Compare September 8, 2025 13:15
