@maxmnemonic
Contributor

@maxmnemonic maxmnemonic commented Nov 18, 2025

Added the ability to build table data from regions via TableData.from_regions.
It converts regions (lists of BoundingBox) for rows, columns, and merged cells into a table_data structure,
and adds semantics for row_headers, col_headers, and row_section regions.
It tolerates slightly overlapping ("wobbly") bounding boxes and, in some cases, missing bounding boxes.

Includes test_regions_to_table.py, which also serves as a usage example.
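
For readers unfamiliar with the idea, here is a minimal sketch of how row and column regions can be combined into a cell grid. It is not the code from this PR: `Box` and `grid_cells` are hypothetical names, a top-left coordinate origin is assumed, and merged cells, header semantics, and missing regions are ignored. Each cell takes its vertical extent from the row and its horizontal extent from the column, so slightly misaligned boxes still produce a regular grid.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a bounding box (top-left origin assumed)."""
    l: float
    t: float
    r: float
    b: float


def grid_cells(rows: List[Box], cols: List[Box]) -> Dict[Tuple[int, int], Box]:
    """Map (row_index, col_index) to a cell box built from row and column regions."""
    rows = sorted(rows, key=lambda box: box.t)  # top to bottom
    cols = sorted(cols, key=lambda box: box.l)  # left to right
    cells = {}
    for i, row in enumerate(rows):
        for j, col in enumerate(cols):
            # Vertical extent comes from the row, horizontal extent from the column.
            cells[(i, j)] = Box(l=col.l, t=row.t, r=col.r, b=row.b)
    return cells


# Slightly "wobbly" regions, given out of order on purpose.
rows = [Box(0, 10.5, 100, 20), Box(1, 0, 99, 10)]
cols = [Box(50, 0, 100, 19), Box(0, 0.5, 50, 20)]
cells = grid_cells(rows, cols)
```

In this sketch `cells[(0, 0)]` is the top-left cell regardless of the order in which the regions were supplied.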

@maxmnemonic maxmnemonic self-assigned this Nov 18, 2025
@maxmnemonic maxmnemonic added the enhancement New feature or request label Nov 18, 2025
@github-actions
Contributor

github-actions bot commented Nov 18, 2025

DCO Check Passed

Thanks @maxmnemonic, all your commits are properly signed off. 🎉

@dosubot

dosubot bot commented Nov 18, 2025

Related Documentation

Checked 3 published document(s) in 1 knowledge base(s). No updates required.

@mergify

mergify bot commented Nov 18, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

🟢 Require two reviewer for test updates

Wonderful, this rule succeeded.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

@PeterStaar-IBM
Contributor

@maxmnemonic A few comments:

Direct Replacements

  • bbox_fraction_inside(inner, outer, eps=...): replace with inner.intersection_over_self(outer).
  • bbox_contains(inner, outer, threshold, eps=...): replace with inner.intersection_over_self(outer) >= threshold.
  • bbox_iou(a, b, eps=...): replace with a.intersection_over_union(b, eps=eps).
  • is_bbox_within(bbox_a, bbox_b, threshold=0.5): replace with bbox_b.intersection_over_self(bbox_a) >= threshold.
    • Note: this function’s parameter order is “is B within A”; keep call sites consistent.
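
To illustrate the mapping above, here is a minimal sketch. `Box` is a hypothetical stand-in that mimics the two `BoundingBox` methods named in the list; it is not the docling_core class, it omits the `eps` parameter and `coord_origin`, and it assumes a top-left origin. The `is_bbox_within` wrapper follows the "is B within A" parameter order from the note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for BoundingBox (top-left origin assumed)."""
    l: float
    t: float
    r: float
    b: float

    def area(self) -> float:
        return max(0.0, self.r - self.l) * max(0.0, self.b - self.t)

    def _intersection_area(self, other: "Box") -> float:
        w = min(self.r, other.r) - max(self.l, other.l)
        h = min(self.b, other.b) - max(self.t, other.t)
        return max(0.0, w) * max(0.0, h)

    def intersection_over_self(self, other: "Box") -> float:
        # Fraction of *this* box covered by `other`; 0 for a zero-area box.
        own = self.area()
        return self._intersection_area(other) / own if own > 0 else 0.0

    def intersection_over_union(self, other: "Box") -> float:
        inter = self._intersection_area(other)
        union = self.area() + other.area() - inter
        return inter / union if union > 0 else 0.0


def is_bbox_within(bbox_a: Box, bbox_b: Box, threshold: float = 0.5) -> bool:
    # Parameter order is "is B within A", so B is the receiver.
    return bbox_b.intersection_over_self(bbox_a) >= threshold


outer = Box(0, 0, 10, 10)
inner = Box(8, 8, 12, 12)

fraction_inside = inner.intersection_over_self(outer)  # replaces bbox_fraction_inside
contained = fraction_inside >= 0.2                      # replaces bbox_contains
iou = inner.intersection_over_union(outer)              # replaces bbox_iou
within = is_bbox_within(outer, inner, threshold=0.2)    # B=inner within A=outer
```

Swapping the arguments of `is_bbox_within` changes the result, which is why the call sites need to keep the same order.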

Not Directly Replaceable

  • bbox_intersection(a, b) -> Optional[BoundingBox]: no BoundingBox method returns the intersection box itself, only areas/ratios.
    This helper is still needed.
  • dedupe_bboxes(...): no class method for deduplication; can be simplified internally by calling
    element.intersection_over_union(kept) directly, but not fully replaced.
  • _process_table_headers(...), compute_cells(...), regions_to_table(...): table/semantics glue — not replaceable by a single
    BoundingBox method.
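
As a rough sketch of the two helpers that remain, under stated assumptions: `Box` is again a hypothetical stand-in rather than the docling_core class, a top-left origin is assumed, and the `iou_threshold` parameter and greedy keep-first strategy of `dedupe_bboxes` are illustrative guesses, since the actual signature is not shown in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for BoundingBox (top-left origin assumed)."""
    l: float
    t: float
    r: float
    b: float

    def area(self) -> float:
        return max(0.0, self.r - self.l) * max(0.0, self.b - self.t)

    def intersection_over_union(self, other: "Box") -> float:
        inter_box = bbox_intersection(self, other)
        inter = inter_box.area() if inter_box is not None else 0.0
        union = self.area() + other.area() - inter
        return inter / union if union > 0 else 0.0


def bbox_intersection(a: Box, b: Box) -> Optional[Box]:
    """Return the overlapping box itself, or None if the boxes do not overlap."""
    l, t = max(a.l, b.l), max(a.t, b.t)
    r, btm = min(a.r, b.r), min(a.b, b.b)
    if r <= l or btm <= t:
        return None
    return Box(l, t, r, btm)


def dedupe_bboxes(boxes: List[Box], iou_threshold: float = 0.9) -> List[Box]:
    """Greedy dedupe: keep a box only if it is not a near-duplicate of a kept one."""
    kept: List[Box] = []
    for element in boxes:
        if all(element.intersection_over_union(k) < iou_threshold for k in kept):
            kept.append(element)
    return kept


boxes = [Box(0, 0, 10, 10), Box(0, 0, 10, 10.1), Box(20, 20, 30, 30)]
unique = dedupe_bboxes(boxes)
overlap = bbox_intersection(Box(0, 0, 10, 10), Box(5, 5, 15, 15))
```

Here the second box is dropped as a near-duplicate of the first, while the IoU call inside `dedupe_bboxes` is the part a class method can take over.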

Notes

  • All BoundingBox overlap/IoU methods assume matching coord_origin; same constraint applies today in regions.py.
  • intersection_over_self already handles zero-area boxes safely (it returns 0), so the small eps used in bbox_fraction_inside
    is unnecessary after switching.

@codecov

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 88.49558% with 13 lines in your changes missing coverage. Please review.

Files with missing lines              Patch %   Lines
docling_core/types/doc/document.py    92.70%    7 Missing ⚠️
docling_core/types/doc/base.py        64.70%    6 Missing ⚠️

Collaborator

@vagenas vagenas left a comment

A couple of comments left inline.

Signed-off-by: Maksym Lysak <[email protected]>
@maxmnemonic maxmnemonic requested a review from vagenas November 18, 2025 14:33
Maksym Lysak and others added 2 commits November 18, 2025 16:01
Collaborator

@vagenas vagenas left a comment

For reference: document.py modularization to be addressed in #433

@maxmnemonic maxmnemonic merged commit c80b583 into main Nov 19, 2025
13 checks passed
@maxmnemonic maxmnemonic deleted the dev/table_regions branch November 19, 2025 12:47
5 participants