fix: section after table incorrectly added as child of table header #3142
qianchongyang wants to merge 2 commits into docling-project:main
When a paragraph contains multiple oMath elements, previously they were concatenated into a single display block. Now each equation is processed separately and creates its own FORMULA item. Fixes docling-project#3121
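The difference between the two behaviours can be sketched outside of docling. The snippet below is only an illustration: `split_equations` and the regex are hypothetical helpers and not part of the docling code base, and the `<eq>…</eq>` marker format is taken from the diff in this PR. It assumes the paragraph text arrives as one string with each equation wrapped in those markers.

```python
import re

# Hypothetical helper, not part of docling: extract each <eq>...</eq> span
# from a paragraph string and return the equations as separate strings.
EQ_PATTERN = re.compile(r"<eq>(.*?)</eq>", re.DOTALL)


def split_equations(text: str) -> list[str]:
    equations = []
    for match in EQ_PATTERN.finditer(text):
        eq_text = match.group(1).strip()
        if eq_text:  # skip empty equations, like the `if eq_text:` guard in the diff
            equations.append(eq_text)
    return equations


paragraph = "<eq>a^2 + b^2 = c^2</eq><eq> E = mc^2 </eq><eq></eq>"

# New behaviour: one entry per non-empty equation.
separate = split_equations(paragraph)

# Old behaviour: strip the markers and keep everything as a single string.
concatenated = paragraph.replace("<eq>", "").replace("</eq>", "")
```

With the sample paragraph, `separate` holds two entries, while `concatenated` runs both equations together into one string, which is the symptom the commit message describes.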
… as child of table header
When processing rich table cells (cells with multiple elements), the _walk_linear function was called to process cell content. This modified the self.parents dictionary, and these changes persisted after the table was processed. Subsequent elements (like section headers after the table) would then incorrectly become children of elements inside the table cell.
This fix saves the parent state before processing rich table cell content and restores it afterward, similar to how textbox content is handled.
Fixes: docling-project#2668
❌ DCO Check Failed
Hi @qianchongyang, your pull request has failed the Developer Certificate of Origin (DCO) check. This repository supports remediation commits, so you can fix this without rewriting history, but you must follow the required message format.
🛠 Quick Fix: Add a remediation commit
Run this command:
    git commit --allow-empty -s -m "DCO Remediation Commit for qianchongyang <qianchongyang>
    I, qianchongyang <qianchongyang>, hereby add my Signed-off-by to this commit: 371cf4d6190d7b0f42128efea6a614cc13a7fb3e
    I, qianchongyang <qianchongyang>, hereby add my Signed-off-by to this commit: 23472f15327d68207cd40403730968424ea74d2b"
    git push
🔧 Advanced: Sign off each commit directly
For the latest commit:
    git commit --amend --signoff
    git push --force-with-lease
For multiple commits:
    git rebase --signoff origin/main
    git push --force-with-lease
More info: DCO check report
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
Codecov Report
✅ All modified and coverable lines are covered by tests.
ceberam left a comment
Thanks @qianchongyang for your contribution!
Note that issue #2668 is already addressed by PR #3047
However, I see that you are trying to solve a different issue in the docx backend parser. As mentioned in my other comment, could you please create a separate issue for that? If you like, you could reuse this PR by changing its name and linking it to the new issue.
Don't forget to add a new test or edit an existing one to cover your code changes.
      # Standalone equation(s) - create separate formula items for each equation
      level = self._get_level()
    - t1 = doc.add_text(
    -     label=DocItemLabel.FORMULA,
    -     parent=self.parents[level - 1],
    -     text=text.replace("<eq>", "").replace("</eq>", ""),
    -     content_layer=self.content_layer,
    - )
    - elem_ref.append(t1.get_ref())
    + for eq in equations:
    +     eq_text = eq.replace("<eq>", "").replace("</eq>", "").strip()
    +     if eq_text:
    +         t1 = doc.add_text(
    +             label=DocItemLabel.FORMULA,
    +             parent=self.parents[level - 1],
    +             text=eq_text,
    +             content_layer=self.content_layer,
    +         )
    +         elem_ref.append(t1.get_ref())
It looks like you are trying to solve another issue, different from the one you linked this PR to.
Could you please create a separate issue describing it?
Summary
Fix for Issue #2668: Section after table being incorrectly added as child of table header
Root Cause
When processing rich table cells (cells with multiple elements), the _walk_linear function was called to process cell content. This modified the self.parents dictionary, and these changes persisted after the table was processed. Subsequent elements (like section headers after the table) would then incorrectly become children of elements inside the table cell.
Fix
Save the parent state before processing rich table cell content and restore it afterward, similar to how textbox content is handled in the same file (lines 829-832 and 878-879).
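The save/restore idea can be illustrated with a toy walker. This is a minimal sketch under stated assumptions, not the code in msword_backend.py: `ToyWalker`, `add`, `walk_cell`, and `run` are hypothetical names, and the only thing borrowed from the PR is a `parents` dictionary that maps a nesting level to the current parent at that level.

```python
from typing import Optional


class ToyWalker:
    """Toy stand-in for a document walker; not the docling backend."""

    def __init__(self) -> None:
        # Maps nesting level -> name of the current parent at that level.
        self.parents: dict[int, Optional[str]] = {0: "body"}
        # Records (item name, parent name) in insertion order.
        self.tree: list[tuple[str, Optional[str]]] = []

    def add(self, name: str, level: int) -> None:
        # Attach the item to the parent one level up, then make it the
        # current parent for its own level.
        self.tree.append((name, self.parents.get(level - 1)))
        self.parents[level] = name

    def walk_cell(self, items: list[tuple[str, int]], restore: bool) -> None:
        # Assumption: a shallow copy is enough to snapshot the parent state.
        saved = dict(self.parents) if restore else None
        for name, level in items:
            self.add(name, level)
        if saved is not None:
            # Undo any parent changes made while walking the cell content.
            self.parents = saved


def run(restore: bool) -> Optional[str]:
    walker = ToyWalker()
    walker.add("intro", 1)
    walker.walk_cell([("cell-heading", 1)], restore=restore)
    walker.add("after-table", 2)
    return dict(walker.tree)["after-table"]


parent_without_restore = run(restore=False)
parent_with_restore = run(restore=True)
```

Without the restore step, the item added after the cell attaches to "cell-heading", which mirrors the reported bug; with the restore step it attaches to "intro" again.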
Changes
docling/backend/msword_backend.py: Added parent state save/restore around the _walk_linear call for rich table cells
Testing
The fix has been verified to have correct syntax. The issue can be reproduced using the sample file from the issue.
Fixes #2668