Open
Description
The character range in the regex that defines `TextBoxPdfTextStripper.NON_PRINTABLE` may have a typo:
`textricator/src/main/java/io/mfj/textricator/extractor/pdfbox/TextBoxPdfTextStripper.kt`, line 31 in c800c66:

`val NON_PRINTABLE:Pattern = Pattern.compile(".*[\\u0000-\\u0019]+.*")`
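That range should probably end in `\\u001F` instead of `\\u0019`. As written, the character class covers only U+0000 through U+0019, so the last six C0 control characters (U+001A through U+001F: SUB, ESC, FS, GS, RS, US) are not matched. For comparison, later in the file the entire `0x0000..0x001F` range is checked.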
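The gap can be shown with a minimal standalone sketch (not code from textricator). It compiles the pattern from line 31 alongside a variant ending in `\\u001F`, embeds each C0 code point between two printable characters, and lists the code points each pattern fails to match. The names `CURRENT_NON_PRINTABLE`, `PROPOSED_NON_PRINTABLE`, and `missedC0` are hypothetical.

```kotlin
import java.util.regex.Pattern

// Pattern as it appears at line 31. The Kotlin string "\\u0019" yields the
// regex escape \u0019, which java.util.regex interprets as a code point.
val CURRENT_NON_PRINTABLE: Pattern = Pattern.compile(".*[\\u0000-\\u0019]+.*")

// Hypothetical fix: extend the class to the end of the C0 control block.
val PROPOSED_NON_PRINTABLE: Pattern = Pattern.compile(".*[\\u0000-\\u001F]+.*")

// Returns the C0 code points (0x00..0x1F) that `pattern` fails to match when
// each one is placed between two printable characters, e.g. "a<cp>b".
fun missedC0(pattern: Pattern): List<Int> =
    (0x0000..0x001F).filter { cp ->
        val sample = "a" + cp.toChar() + "b"
        !pattern.matcher(sample).matches()
    }

fun main() {
    println("current misses:  " + missedC0(CURRENT_NON_PRINTABLE).map { "U+%04X".format(it) })
    println("proposed misses: " + missedC0(PROPOSED_NON_PRINTABLE).map { "U+%04X".format(it) })
}
```

Run as written, this should report `[U+001A, U+001B, U+001C, U+001D, U+001E, U+001F]` for the current pattern and an empty list for the proposed one.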
Assuming the pattern is meant to cover the entire range of C0 control codes (U+0000 through U+001F), the fix is to replace `0019` with `001F`. That change should first be tested against whichever documents originally inspired `expandNonPrintableUnicode` (test coverage is missing for most of that function).
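Alongside testing against real documents, a cheap regression check is to confirm that the regex and an `IntRange` over `0x0000..0x001F` (as used later in the file) describe the same set of characters. This is only a sketch: it assumes nothing about how `expandNonPrintableUnicode` works, the names `NON_PRINTABLE_FIXED`, `C0_CONTROLS`, and `disagreements` are hypothetical, and it probes only single-character strings in U+0000..U+00FF.

```kotlin
import java.util.regex.Pattern

// Hypothetical corrected pattern with the upper bound moved to \u001F.
val NON_PRINTABLE_FIXED: Pattern = Pattern.compile(".*[\\u0000-\\u001F]+.*")

// Hypothetical reference: the C0 block expressed as an IntRange, mirroring
// the 0x0000..0x001F check that appears later in the file.
val C0_CONTROLS: IntRange = 0x0000..0x001F

// Returns every code point in `probe` where the pattern and the IntRange
// disagree about a one-character string. An empty list means they agree.
fun disagreements(pattern: Pattern, range: IntRange, probe: IntRange = 0x0000..0x00FF): List<Int> =
    probe.filter { cp ->
        val matchesPattern = pattern.matcher(cp.toChar().toString()).matches()
        matchesPattern != (cp in range)
    }

fun main() {
    val diff = disagreements(NON_PRINTABLE_FIXED, C0_CONTROLS)
    println(if (diff.isEmpty()) "pattern and range agree" else "disagree at $diff")
}
```

Probing one-character strings keeps the check independent of how Java's `.` treats line terminators, since `.*` only has to match the empty string around the single character. Passing the current `\\u0019` pattern instead should report code points 0x1A through 0x1F.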