Description
wink-nlp version: 2.3.0
wink-eng-lite-web-model: 1.8.0
The Unicode Zero-Width No-Break Space character (ZWNBSP, U+FEFF) is now only supposed to be used as a Byte-Order Mark. It previously had the same job as the Word Joiner character (U+2060) and is still occasionally used that way, so Unicode recommends treating a ZWNBSP that is not at the start of the file the same way as a word joiner.
Currently, the old ZWNBSP character is not output in the token stream, similar to #135. For example, I had a text with the date range 1830<U+FEFF>–<U+FEFF>1832, and the output did not include the U+FEFF characters at all. When I replace the deprecated U+FEFF characters with U+2060 Word Joiners, all characters are correctly reproduced in the output stream.
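In case it helps anyone hitting the same thing, here is a minimal sketch of the replacement I described, done as a pre-processing step before the text reaches Wink. The helper name `replaceInnerZwnbsp` is my own and is not part of wink-nlp, and keeping a leading U+FEFF untouched is an assumption based on its role as a byte-order mark.

```javascript
// Hypothetical helper, not part of wink-nlp: swaps every U+FEFF that is not
// at the very start of the text for U+2060 (Word Joiner).
const ZWNBSP = '\uFEFF';
const WORD_JOINER = '\u2060';

function replaceInnerZwnbsp(text) {
  // Assumption: a leading U+FEFF is a byte-order mark, so leave it alone.
  const bom = text.startsWith(ZWNBSP) ? ZWNBSP : '';
  const rest = text.slice(bom.length);
  // split/join replaces every remaining occurrence without needing a regex.
  return bom + rest.split(ZWNBSP).join(WORD_JOINER);
}

// The date range from this report: 1830<U+FEFF>–<U+FEFF>1832
const sample = '1830\uFEFF\u2013\uFEFF1832';
const cleaned = replaceInnerZwnbsp(sample);

// Both characters are a single UTF-16 code unit, so the length is unchanged.
console.log(cleaned.length === sample.length);
```

The cleaned string can then be passed to `readDoc` in place of the raw text. This only works around the problem on the caller's side; it does not change how Wink itself handles U+FEFF.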
Note: I found this bug while debugging an issue in another project which uses Wink, and I donʼt know much about Wink myself. I expect this is enough information to identify the issue, but if not then I might need extra help to provide more useful information.