fix: disable numparse in tabulate to preserve trailing zeros#470
fix: disable numparse in tabulate to preserve trailing zeros#470ryyhan wants to merge 1 commit intodocling-project:mainfrom
Conversation
|
✅ DCO Check Passed Thanks @ryyhan, all your commits are properly signed off. 🎉 |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🔴 Require two reviewer for test updatesThis rule is failing.When test data is updated, we require two reviewers
🟢 Enforce conventional commitWonderful, this rule succeeded.Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
|
05a346c to
2a0b348
Compare
cau-git
left a comment
There was a problem hiding this comment.
@ryyhan Thanks for attempting a fix at #350. We must however ensure it has no unwanted side effects in other cases than the ones observed in that issue. The original logic was for sure created with some intention.
The branch may also need an rebase to main, I see some unrelated stuff like the add_comment in the diff, which is already present on main.
There was a problem hiding this comment.
Now the try and except blocks call tabulate with exactly the same parameters, which makes no sense any more.
It looks like the original intention was to try using tabulate with number parsing, and disable it as fallback when a ValueError occurs.
There was a problem hiding this comment.
Hi @cau-git,
Thank you for the review.
-
On Redundancy: You are correct; my changes made the try...except block redundant as both paths now use disable_numparse=True. My intention was to strictly enforce the preservation of trailing zeros (Issue Issue with loss of trailing zeros in formatted numbers during PDF parsing #350), which tabulate strips by default.
-
On Side Effects: I have verified the side effects. Disabling number parsing preserves the exact text (e.g., "1.20") but changes the column alignment from right-aligned (numeric) to left-aligned (text).
Default: 1.2 (Right aligned, data loss)
Proposed: 1.20 (Left aligned, high fidelity)
I believe preserving the document's original text fidelity is more important than the heuristic alignment provided by tabulate. If you agree, I will proceed with this trade-off.
- Next Steps: I will rebase the branch to main to remove the unrelated add_comment changes. I will also simplify the code to remove the now-redundant try...except block, leaving just the single tabulate call with disable_numparse=True.
Does this approach sound good to you?
There was a problem hiding this comment.
This sounds reasonable to me. Let's see where it reflects in the test ground truth after reproducing that one. To do so please also run DOCLING_GEN_TEST_DATA=1 uv run pytest -s then commit the changes.
There was a problem hiding this comment.
Hi @cau-git,
Thanks again for the guidance.
I have:
- Rebased the branch on the latest
mainto clean up the history. - Simplified the implementation in markdown.py (removed the redundant
try...except). - Updated the test ground truth by running the requested command.
The tests are passing locally. I've pushed the changes.
Ready for another look!
Signed-off-by: ryyhan <dayel.rehan@gmail.com>
2a0b348 to
74c4b49
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Resolves #350
Fixes the loss of trailing zeros in Markdown table export by disabling automatic number parsing in
tabulate.Verified with reproduction script.