Open
Labels
bug: Something isn't working
Description
Describe the bug
When clean_bullets is applied, a bullet written as an EN DASH ('–', \u2013) is not removed. Removing it requires an additional clean_dashes step, and that is still wrong: clean_dashes removes every EN DASH in the text field, not only the one used as a bullet.
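To show why a "remove every dash" step is too broad, here is a minimal sketch that contrasts a global EN DASH replacement with one anchored to the start of the string. Both functions are hypothetical stand-ins written for illustration; they are not the unstructured library's clean_dashes or clean_bullets code, and the global version only mimics the effect reported above.

```python
import re

# Hypothetical stand-in for a "remove every dash" step, as described in the issue.
# NOT unstructured's code: it replaces every EN DASH (U+2013) with a space.
def strip_all_en_dashes(text: str) -> str:
    return re.sub("\u2013", " ", text).strip()

# Anchored alternative: only an EN DASH at the very start of the string
# (plus surrounding whitespace) is treated as a bullet and removed.
def strip_leading_en_dash(text: str) -> str:
    return re.sub(r"^\s*\u2013\s*", "", text)

item = "\u2013 Results for 2019\u20132021"
print(strip_all_en_dashes(item))    # Results for 2019 2021  (range dash lost)
print(strip_leading_en_dash(item))  # Results for 2019–2021  (range dash kept)
```

The global replacement also destroys the EN DASH inside "2019–2021", which is exactly the collateral damage the report describes.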
To Reproduce
Parse a PDF file that contains a list whose bullet points are marked with '–'.
Expected behavior
Strings that start with an EN DASH used as a bullet should be cleaned after calling clean_bullets, while EN DASHes elsewhere in the text are left intact.
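A minimal sketch of the expected behavior, assuming a bullet cleaner that strips one leading bullet character from a fixed set. The BULLETS list and the name clean_bullets_sketch are hypothetical and chosen for illustration; this is not the bullet set or implementation used by unstructured's clean_bullets. The only point is that adding '\u2013' to the set of leading-bullet characters handles this case without touching dashes in the middle of the text.

```python
import re

# Hypothetical bullet set: a few common Unicode bullets plus the EN DASH
# (U+2013) reported as missing. Illustrative assumption only.
BULLETS = ["\u2022", "\u2023", "\u2043", "\u25e6", "\u25cf", "*", "\u2013"]

# Match optional leading whitespace, one bullet character, then whitespace.
LEADING_BULLET_RE = re.compile(
    r"^\s*(?:" + "|".join(re.escape(b) for b in BULLETS) + r")\s*"
)

def clean_bullets_sketch(text: str) -> str:
    """Remove a single leading bullet; leave the rest of the text untouched."""
    return LEADING_BULLET_RE.sub("", text, count=1)

print(clean_bullets_sketch("\u2013 Pages 10\u201312"))       # Pages 10–12
print(clean_bullets_sketch("No bullet here \u2013 a dash"))  # unchanged
```

Because the pattern is anchored with ^, a mid-sentence EN DASH never matches, so no separate clean_dashes pass is needed.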