@markojaadam

When multiple areas are passed, it is a very reasonable use case to pass a separate list of column separators for each area. tabula-java supports this natively: the column list can be submitted repeatedly as Java-native options, and the lists are matched to the areas in order.
It would be nice to have this feature wrapped in tabula-py.
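
For illustration, the repeated-option idea can be sketched as a small helper. This is not the PR's diff: `columns_to_cli_args` and the `Columns` alias are hypothetical names, and the sketch only shows how a flat or nested `columns` value could be expanded into repeated `--columns` arguments. Whether tabula-java honors every repeated list is a separate question that the sketch does not answer.

```python
from typing import List, Sequence, Union

# Hypothetical alias: either one flat list of x-coordinates,
# or one list of x-coordinates per area.
Columns = Union[Sequence[float], Sequence[Sequence[float]]]


def columns_to_cli_args(columns: Columns) -> List[str]:
    """Expand `columns` into repeated ``--columns`` arguments.

    Hypothetical helper for illustration only; not part of tabula-py.
    """
    if not columns:
        return []

    # Treat the value as nested when its first element is itself a list/tuple.
    first = columns[0]
    groups = columns if isinstance(first, (list, tuple)) else [columns]

    args: List[str] = []
    for group in groups:
        # One "--columns x1,x2,..." pair per group, in the order given.
        args += ["--columns", ",".join(str(x) for x in group)]
    return args
```

A flat list yields a single `--columns` pair, while a nested list yields one pair per inner list, in the same order as the areas they are meant to match.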

@markojaadam changed the title from "Allow to pass multiple list of column coordinates" to "Allow to pass multiple lists of column coordinates" on Jun 6, 2025
Owner

@chezou left a comment

Thanks for your contribution!

Overall looks good. Can you add unit tests for it?

  password: Optional[str] = None
  silent: Optional[bool] = None
- columns: Optional[Sequence[float]] = None
+ columns: Optional[Union[Sequence[float], Sequence[Sequence[float]]]] = None
Owner

Co-authored-by: Aki Ariga <[email protected]>
@markojaadam
Author

@chezou I did some deeper research. What misled me initially was that I tested this mode with two areas that had the same number of columns but different column widths. It worked perfectly: both tables were parsed correctly, and I was happy. However, once I generated multiple test PDFs and ran my PR against them, I noticed that it forces the same number of columns on every area, producing filler columns with empty text and all-zero coordinates. After some additional testing I confirmed what this means: although the Java engine accepts repeated column arguments, it actually ignores the subsequent sets of coordinates and falls back to auto-detecting column separators. After some searching I also found this: tabulapdf/tabula-java#401. So, sadly, this is officially not supported; it was simply unclear how tabula-java behaves when multiple areas are defined along with separator coordinates passed via --columns. For this reason I'm closing my PR, because this isn't an existing tabula-java feature.
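
One possible workaround, which is not proposed anywhere in this thread, is to call the reader once per area so that each call carries exactly one area and one column list. The sketch below assumes `tabula.read_pdf` accepts `pages`, `area`, `columns`, and `guess` keyword arguments and returns a list of tables; `read_areas_separately` and its `read_fn` parameter are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, List, Optional, Sequence, Union


def read_areas_separately(
    path: str,
    areas: Sequence[Sequence[float]],
    columns_per_area: Sequence[Sequence[float]],
    pages: Union[int, str] = 1,
    read_fn: Optional[Callable[..., List[Any]]] = None,
) -> List[Any]:
    """Read each area in its own call, with its own column separators.

    Hypothetical helper; `read_fn` defaults to `tabula.read_pdf` and is
    injectable so the logic can be exercised without Java installed.
    """
    if len(areas) != len(columns_per_area):
        raise ValueError("need exactly one column list per area")

    if read_fn is None:
        import tabula  # imported lazily; requires tabula-py and a Java runtime

        read_fn = tabula.read_pdf

    tables: List[Any] = []
    for area, columns in zip(areas, columns_per_area):
        # One area and one column list per call, so no list can be ignored.
        tables.extend(
            read_fn(
                path,
                pages=pages,
                area=list(area),
                columns=list(columns),
                guess=False,  # do not auto-detect the table region
            )
        )
    return tables


# Example call (file name and coordinates are made up):
# tables = read_areas_separately(
#     "report.pdf",
#     areas=[(50, 0, 300, 600), (320, 0, 700, 600)],
#     columns_per_area=[[120.0, 240.0], [90.0, 180.0, 400.0]],
# )
```

The trade-off is one Java invocation per area instead of one per file, which is slower but keeps each area's separators independent.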

@markojaadam closed this on Jun 9, 2025