bug/en-dash-not-cleaned #4105

@carminoplata

Description

Describe the bug
When you apply clean_bullets, a leading en dash (U+2013, '–') used as a bullet is not removed. The only workaround is to add a clean_dashes step, but that is also wrong, because clean_dashes removes every en dash found in the text field, not just the one acting as a bullet.

To Reproduce
Parse a PDF file that contains a list whose bullet points are marked with '–'.

Expected behavior
Strings that start with an en dash used as a bullet should have that dash removed after clean_bullets is called, while en dashes elsewhere in the text are left untouched.
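
To make the expected behavior concrete, here is a minimal sketch using only the standard library. It is not the unstructured implementation: strip_leading_en_dash and replace_all_dashes are hypothetical helpers, and the sample string is invented for illustration. The first removes a single leading en dash followed by whitespace; the second stands in for a blanket dash cleaner to show why that workaround is too aggressive.

```python
import re

# Hypothetical pattern (an assumption, not taken from the library):
# optional leading whitespace, one en dash, then at least one space.
LEADING_EN_DASH_RE = re.compile(r"^\s*\u2013\s+")


def strip_leading_en_dash(text: str) -> str:
    """Remove one en dash bullet at the start of the string, if present."""
    return LEADING_EN_DASH_RE.sub("", text, count=1)


def replace_all_dashes(text: str) -> str:
    """Stand-in for a blanket dash cleaner: replaces every '-' and en dash."""
    return re.sub(r"[-\u2013]", " ", text).strip()


item = "\u2013 Revenue grew in 2019\u20132021"

print(strip_leading_en_dash(item))  # Revenue grew in 2019–2021
print(replace_all_dashes(item))     # Revenue grew in 2019 2021
```

Only the bullet is dropped in the first call, so the en dash in the year range survives. The blanket replacement also destroys the range, which is the side effect described above.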

Labels: bug