feat/pdf -> words being split across lines due to hyphenation #3486

Open
@ajpanyteam

Is your feature request related to a problem? Please describe.
When processing PDFs via by_title, a common issue is words being split across lines due to line breaks or hyphenation. For example, in the text string I end up with "powerful capabili- ties of" instead of "powerful capabilities of".

Describe the solution you'd like
A word that is split by a hyphen at a line break should be merged back into a single word.
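
To make the request concrete, here is a minimal sketch of the kind of post-processing I have in mind. It is only an illustration under my own assumptions: `merge_hyphenated_breaks` is a hypothetical helper, not an existing function of the library, and it treats a hyphen that is followed by whitespace and then a lowercase letter as a line-break hyphen.

```python
import re

# Hypothetical helper for illustration only; not part of any library API.
# Matches a hyphen that directly follows a letter and is followed by
# whitespace (space or newline) and then a lowercase ASCII letter.
_LINE_BREAK_HYPHEN = re.compile(r"(?<=[^\W\d_])-\s+(?=[a-z])")


def merge_hyphenated_breaks(text: str) -> str:
    """Join word halves that were split by a hyphen at a line break."""
    return _LINE_BREAK_HYPHEN.sub("", text)


print(merge_hyphenated_breaks("powerful capabili- ties of"))
```

A heuristic like this cannot tell a line-break hyphen from a genuine compound that happens to wrap at the hyphen (for example "well- known" would become "wellknown"), so a real implementation would probably need a dictionary check or layout information from the PDF.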

Describe alternatives you've considered
No alternative option exists.

