
fix: disable numparse in tabulate to preserve trailing zeros #470

Open
ryyhan wants to merge 1 commit into docling-project:main from ryyhan:fix/issue-350-trailing-zeros


ryyhan commented Jan 8, 2026

Resolves #350

Fixes the loss of trailing zeros in Markdown table export by disabling automatic number parsing in tabulate.

Verified with reproduction script.
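The mechanism behind #350 can be illustrated without tabulate. The sketch below is a simplified stand-in, not tabulate's code: it assumes that number parsing turns a numeric-looking cell into a float and re-renders it with the general "g" format, which is where text-only details such as trailing zeros disappear. The helper names are hypothetical.

```python
def parse_then_render(cell: str) -> str:
    """Hypothetical stand-in for number parsing: numeric text is re-rendered from a float."""
    try:
        return format(float(cell), "g")
    except ValueError:
        return cell  # non-numeric text passes through unchanged


def render_verbatim(cell: str) -> str:
    """Hypothetical stand-in for disabled number parsing: keep the extracted text as-is."""
    return cell


cells = ["1.20", "3.500", "42", "n/a"]
print([parse_then_render(c) for c in cells])  # ['1.2', '3.5', '42', 'n/a']
print([render_verbatim(c) for c in cells])    # ['1.20', '3.500', '42', 'n/a']
```

Only the formatting that existed in the text is lost; the numeric value itself is unchanged, which is why the problem shows up as a fidelity issue rather than a wrong number.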

github-actions bot (Contributor) commented Jan 8, 2026

DCO Check Passed

Thanks @ryyhan, all your commits are properly signed off. 🎉

mergify bot commented Jan 8, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

dosubot bot commented Jan 8, 2026

Related Documentation

Checked 6 published document(s) in 1 knowledge base(s). No updates required.


ryyhan force-pushed the fix/issue-350-trailing-zeros branch from 05a346c to 2a0b348 on January 8, 2026 19:31
cau-git (Member) left a comment

@ryyhan Thanks for attempting a fix for #350. However, we must ensure it has no unwanted side effects in cases other than the ones observed in that issue. The original logic was surely written with some intention.

The branch may also need a rebase onto main; I see some unrelated changes in the diff, such as add_comment, which are already present on main.

Comment on lines 420 to 432
Member

Now the try and except blocks call tabulate with exactly the same parameters, which no longer makes sense.
It looks like the original intention was to try tabulate with number parsing first, and to disable number parsing as a fallback when a ValueError occurs.
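The reviewer's reading of the original code is a try-then-fallback pattern: render with number parsing first, and retry with it disabled only if a ValueError is raised. A minimal sketch of that control flow, assuming a hypothetical `render(rows, numparse)` callable in place of the real tabulate call; the toy renderer only records which flag it received and can simulate a failure.

```python
from typing import Callable, List, Tuple

Rows = List[List[str]]
Renderer = Callable[[Rows, bool], str]


def render_with_fallback(rows: Rows, render: Renderer) -> str:
    """Number parsing first; verbatim text only if that attempt raises ValueError."""
    try:
        return render(rows, True)  # first attempt: number parsing enabled
    except ValueError:
        return render(rows, False)  # fallback: number parsing disabled


def make_toy_renderer(fail_with_numparse: bool) -> Tuple[Renderer, List[bool]]:
    """Hypothetical stand-in for tabulate that records the numparse flag of each call."""
    calls: List[bool] = []

    def render(rows: Rows, numparse: bool) -> str:
        calls.append(numparse)
        if numparse and fail_with_numparse:
            raise ValueError("simulated failure while parsing numbers")
        return "\n".join("| " + " | ".join(row) + " |" for row in rows)

    return render, calls


rows = [["value"], ["1.20"]]
render, calls = make_toy_renderer(fail_with_numparse=True)
print(render_with_fallback(rows, render))
print(calls)  # [True, False]: the fallback ran with number parsing disabled
```

If both calls passed the same flag, as in the first revision of this PR, the except branch would only repeat the call that had just failed, which is the redundancy pointed out in the review.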

Author

Hi @cau-git,

Thank you for the review.

  1. On Redundancy: You are correct; my changes made the try...except block redundant, since both paths now use disable_numparse=True. My intention was to strictly enforce the preservation of trailing zeros (issue #350, "Issue with loss of trailing zeros in formatted numbers during PDF parsing"), which tabulate strips by default.

  2. On Side Effects: I have verified the side effects. Disabling number parsing preserves the exact text (e.g., "1.20") but changes the column alignment from right-aligned (numeric) to left-aligned (text).

Default: 1.2 (right-aligned, trailing zero lost)
Proposed: 1.20 (left-aligned, original text preserved)
I believe preserving the document's original text fidelity is more important than the heuristic alignment provided by tabulate. If you agree, I will proceed with this trade-off.

  3. Next Steps: I will rebase the branch onto main to remove the unrelated add_comment changes. I will also simplify the code by removing the now-redundant try...except block, leaving a single tabulate call with disable_numparse=True.

Does this approach sound good to you?
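The alignment trade-off described above can be modelled in a few lines. This is a simplified, hypothetical model (helper names are illustrative and the padding rule is an assumption), not tabulate's own alignment logic: a column whose cells all parse as numbers is re-rendered and right-aligned, otherwise the text is kept verbatim and left-aligned.

```python
from typing import List


def is_numeric(cell: str) -> bool:
    """True if the cell text parses as a float."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def align_column(cells: List[str], numparse: bool) -> List[str]:
    """Pad a non-empty column: right-align parsed numbers, left-align verbatim text."""
    if numparse and all(is_numeric(c) for c in cells):
        shown = [format(float(c), "g")  for c in cells]  # numeric: re-rendered, zeros lost
        width = max(len(s) for s in shown)
        return [s.rjust(width) for s in shown]
    width = max(len(c) for c in cells)
    return [c.ljust(width) for c in cells]  # text: verbatim, left-aligned


column = ["1.20", "10.5", "100"]
print(align_column(column, numparse=True))   # [' 1.2', '10.5', ' 100']
print(align_column(column, numparse=False))  # ['1.20', '10.5', '100 ']
```

In this model the two modes differ exactly in the way the comment describes: number parsing buys right alignment at the cost of the original digits, while disabling it keeps "1.20" intact and falls back to text alignment.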

Member

This sounds reasonable to me. Let's see how it is reflected in the test ground truth once that is regenerated. To do so, please also run DOCLING_GEN_TEST_DATA=1 uv run pytest -s, then commit the changes.

Author

Hi @cau-git,

Thanks again for the guidance.

I have:

  1. Rebased the branch on the latest main to clean up the history.
  2. Simplified the implementation in markdown.py (removed the redundant try...except).
  3. Updated the test ground truth by running the requested command.

The tests are passing locally. I've pushed the changes.
Ready for another look!

Signed-off-by: ryyhan <dayel.rehan@gmail.com>
ryyhan force-pushed the fix/issue-350-trailing-zeros branch from 2a0b348 to 74c4b49 on January 16, 2026 19:40
codecov bot commented Jan 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.



Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue with loss of trailing zeros in formatted numbers during PDF parsing

2 participants