
TextBoxPdfTextStripper.NON_PRINTABLE omits 001A through 001F #43

@gmarmstrong

Description


The character range currently used in the TextBoxPdfTextStripper.NON_PRINTABLE pattern may have a typo:

val NON_PRINTABLE:Pattern = Pattern.compile(".*[\\u0000-\\u0019]+.*")

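That range should probably end in \\u001F instead of \\u0019. U+0019 is only decimal 25, so the character class skips the last six C0 control codes, U+001A through U+001F (SUB, ESC, FS, GS, RS, US). For comparison, the entire 0x0000..0x001F range is checked later in the same file.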
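A minimal standalone sketch to confirm the gap, using the pattern string exactly as quoted above. The names CURRENT_NON_PRINTABLE and missedByCurrentPattern are hypothetical and not part of TextBoxPdfTextStripper; the sketch only checks which C0 code points, each embedded between two printable characters, fail to match.

```kotlin
import java.util.regex.Pattern

// Pattern as quoted in the issue: the character class stops at U+0019.
val CURRENT_NON_PRINTABLE: Pattern = Pattern.compile(".*[\\u0000-\\u0019]+.*")

// Returns every C0 code point (0x0000..0x001F) that the quoted pattern does not match
// when the character is placed between two printable characters.
fun missedByCurrentPattern(): List<Int> =
    (0x0000..0x001F).filter { cp ->
        val sample = "a" + cp.toChar() + "b"
        !CURRENT_NON_PRINTABLE.matcher(sample).matches()
    }

fun main() {
    // Prints the missed code points in U+XXXX form.
    println(missedByCurrentPattern().joinToString { "U+%04X".format(it) })
}
```

Running it should print `U+001A, U+001B, U+001C, U+001D, U+001E, U+001F`, i.e. exactly the codes named in the title.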

Assuming NON_PRINTABLE is meant to cover the entire range of C0 control codes, the fix is to replace 0019 with 001F. Because most of expandNonPrintableUnicode lacks test coverage, that change should first be tested against whichever documents originally motivated that function.
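A sketch of the proposed one-character change, again outside the real class. PROPOSED_NON_PRINTABLE and isFlagged are hypothetical names used only for illustration; this checks the widened pattern in isolation and is not a substitute for testing against the documents mentioned above.

```kotlin
import java.util.regex.Pattern

// Proposed pattern: the character class now ends at U+001F, the last C0 control code.
val PROPOSED_NON_PRINTABLE: Pattern = Pattern.compile(".*[\\u0000-\\u001F]+.*")

// Whole-string match, mirroring how the quoted pattern is written (.* on both sides).
fun isFlagged(text: String): Boolean = PROPOSED_NON_PRINTABLE.matcher(text).matches()

fun main() {
    // Every C0 control code, embedded between printable characters, should now match.
    val allC0Flagged = (0x0000..0x001F).all { isFlagged("a" + it.toChar() + "b") }
    // Ordinary text (space is U+0020, outside the class) should not match.
    val printableFlagged = isFlagged("plain text")
    println("all C0 flagged: $allC0Flagged, printable flagged: $printableFlagged")
}
```

Expected output: `all C0 flagged: true, printable flagged: false`.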

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests