@badGarnet
Collaborator

@badGarnet badGarnet commented Jan 31, 2025

  • There is a bug in deciding whether a page has tables before performing table extraction: the logic checks whether the id associated with the Table element type is truthy.
  • However, it should check whether the id is None, because the id can be 0 (the first type of element in the page), and 0 is falsy.
  • The fix updates the logic to compare against None explicitly.
  • Adds a unit test for this specific case.
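
To make the failure mode concrete, here is a minimal, self-contained sketch of the truthiness problem described above. It does not use the library itself: the dictionary and the string type names are hypothetical stand-ins for `elements.element_class_id_map` and `ElementType.TABLE`, and the helper function names are made up for illustration.

```python
# Hypothetical stand-in for elements.element_class_id_map (id -> element type).
# Table happens to be the first element type, so its id is 0.
element_class_id_map = {0: "Table", 1: "NarrativeText"}


def find_table_id(class_id_map, table_type="Table"):
    """Invert the id -> type map and look up the id of the table type."""
    return {v: k for k, v in class_id_map.items()}.get(table_type)


def has_tables_buggy(class_id_map):
    # Truthiness check: an id of 0 is falsy, so it is treated as "no tables".
    return bool(find_table_id(class_id_map))


def has_tables_fixed(class_id_map):
    # Explicit None check: only a missing id means "no tables".
    return find_table_id(class_id_map) is not None


print(has_tables_buggy(element_class_id_map))  # False (table page is skipped)
print(has_tables_fixed(element_class_id_map))  # True
```

When the table id is any non-zero value, both checks agree, which is presumably why the bug only shows up on pages where Table is the first element type.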

@badGarnet badGarnet requested a review from vangheem January 31, 2025 17:22
@badGarnet badGarnet marked this pull request as ready for review January 31, 2025 17:22

  table_id = {v: k for k, v in elements.element_class_id_map.items()}.get(ElementType.TABLE)
- if not table_id:
+ if table_id is None:
Contributor

🤦

@badGarnet badGarnet enabled auto-merge January 31, 2025 17:47
@badGarnet badGarnet added this pull request to the merge queue Jan 31, 2025
@cragwolfe cragwolfe removed this pull request from the merge queue due to a manual request Jan 31, 2025
@cragwolfe cragwolfe merged commit 9d58b34 into main Jan 31, 2025
41 checks passed
@cragwolfe cragwolfe deleted the fix/fix-table-id-checking-logic branch January 31, 2025 18:19
temp-adelyn pushed a commit to temp-adelyn/unstructured that referenced this pull request Mar 3, 2025