-
Notifications
You must be signed in to change notification settings - Fork 2.9k
feat(layout,table): orientation-aware layout and table detection #1898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(layout,table): orientation-aware layout and table detection #1898
Conversation
✅ DCO Check Passed Thanks @ClemDoum, all your commits are properly signed off. 🎉 |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🔴 Require two reviewer for test updatesThis rule is failing.When test data is updated, we require two reviewers
🟢 Enforce conventional commitWonderful, this rule succeeded.Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
|
703f218
to
8b4af6f
Compare
@ClemDoum Very nice follow up work, I was already looking forward to seeing this addressed! Questions on this PR:
|
4b9e8ce
to
7b4a445
Compare
@cau-git I've updated the e2e tests. Note that there was a bug in the I also deactivated the orientation test for To answer your points:
|
Test are finally green on my side 🥵 |
Codecov ReportAttention: Patch coverage is
📢 Thoughts on this report? Let us know! |
@ClemDoum Yes, what we need to test is if the same logic works universally for scanned docs (with tesseract defining the cell orientation) and with programmatic PDFs (using docling-parse v4, our default backend, which is the only backend that creates oriented cells). Hence we need a programmatic PDF with some pages that contain rotated text (e.g. table rotated to fill landscape, but page is defined as portrait), and run it through a test that has expectations on which page we see which orientation. Ideally, also with a comparison how the table structure is extracted without (current release) and with this change. Could you act on that? Many thanks! Finally, I am not certain if we need the rotation in the layout model, it was trained on various material and should have learned to capture the content independently of rotation. However, the reading order model would need that rotation to work well. |
Hi @cau-git, sorry for the late reply I was in vacation. Ok so if I get it right I need to
Concerning the layout model, I'm using rotation even if I guessed it was trained to be robust to it. I think e2e test will be more robust this way and also more easily debuggable (since after rotation each of the 90, 180, 270 test should yield exactly the same results as the base image). |
7b4a445
to
b7d4715
Compare
Oops I've closed this by mistake. I've just added the programmatic PDF, the output is not correct though. Using the debug mode I could find that it's because the layout detection detects the table as an image. It's probably because the layout detector hasn't been trained on enough page with rotated elemens, WDYT about @cau-git ? |
Here is the comparison between
|
3261524
to
3419c42
Compare
…ction Signed-off-by: Clément Doumouro <[email protected]>
Signed-off-by: Clément Doumouro <[email protected]>
Signed-off-by: Clément Doumouro <[email protected]>
Signed-off-by: Clément Doumouro <[email protected]>
Signed-off-by: Clément Doumouro <[email protected]>
Signed-off-by: Clément Doumouro <[email protected]>
Signed-off-by: Clément Doumouro <[email protected]>
…ble structure detection Signed-off-by: Clément Doumouro <[email protected]>
Signed-off-by: Clément Doumouro <[email protected]>
Signed-off-by: Clément Doumouro <[email protected]>
Signed-off-by: Clément Doumouro <[email protected]>
@ClemDoum was it intentional that you closed this PR? I would like to review this in the coming days. |
@cau-git I just wanted to rebase and update e2e tests after rebasing, I haven't closed it on purpose. |
Strangely neither can I reopen it right now... |
5f1979a
to
0fde27b
Compare
OK I managed to re-open it. e2e test don't look good though, on the non OCR part. I'm debugging. |
5c93dea
to
0f9e607
Compare
OK everything is looking good now @cau-git |
0f9e607
to
8a07bf6
Compare
Signed-off-by: Clément Doumouro <[email protected]>
8a07bf6
to
d81e50d
Compare
Maybe detect and rotate page in pdf backend would be easier? Hook "self._pdoc = pdfium.PdfDocument(self.path_or_stream)", iterate _pdoc and set_rotation for the pages, then save to a new io.BytesIO() and set it to path_or_stream. |
Description
While #1167 added the ability to include the
TextCell
orientation when detection, later stages of the pipelines are not using cells orientation.This PRs adds the support for orientation-aware layout and table detection, the logic is the same in both components:
Notes
rotate_bounding_box
function #1897. I've tested the full pipeline with that fix included and the e2e test look good. Once fix(ocr-utils): unit test and fix therotate_bounding_box
function #1897 is merge, e2e tests will have to be regeneratedChanges
Added
detect_orientation
function toocr-utils
to detect page orientation using majority voting from text cells orientationFixed
OcrMacModel
not taking into account theocr_rect
offset when calculationTextCell
coordinatesTODO:
rotate_bounding_box
function #1897Checklist: