Handle ZWNBSP the same way as Word Joiner #145

wink-nlp version: 2.3.0
wink-eng-lite-web-model: 1.8.0

The Unicode Zero Width No-Break Space character (ZWNBSP, U+FEFF) is now only supposed to be used as a [Byte-Order Mark](https://en.wikipedia.org/wiki/Byte_order_mark). It previously had the same job as the [Word Joiner character](https://en.wikipedia.org/wiki/Word_joiner) (U+2060), though, and is still occasionally used that way. Unicode recommends treating a ZWNBSP that is not at the start of the file the same way as a word joiner.

Currently, the old ZWNBSP character is dropped from the token stream, similar to #135.  For example, I had a text containing the date range `1830<U+FEFF>–<U+FEFF>1832`, and the output did not include the U+FEFF characters at all.  When I replace the deprecated U+FEFF characters with U+2060 Word Joiners, all characters are correctly reproduced in the output stream.
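
For anyone hitting the same problem, the replacement I describe above can be done as a preprocessing step before the text reaches Wink. This is only a minimal sketch of a workaround, not a proposed fix inside wink-nlp: the helper name `replaceInteriorZwnbsp` is hypothetical, and it assumes that a U+FEFF at index 0 is a byte-order mark and should be left untouched.

```javascript
// Hypothetical helper, not part of wink-nlp: rewrites every U+FEFF that is
// not at the very start of the text to U+2060 (Word Joiner).
const ZWNBSP = '\uFEFF';
const WORD_JOINER = '\u2060';

function replaceInteriorZwnbsp(text) {
  // Assumption: a leading U+FEFF is a byte-order mark, so it is kept as-is.
  const hasBom = text.startsWith(ZWNBSP);
  const head = hasBom ? ZWNBSP : '';
  const body = hasBom ? text.slice(1) : text;
  // Every remaining U+FEFF is treated as a deprecated word joiner.
  return head + body.split(ZWNBSP).join(WORD_JOINER);
}

// The date range from the example above, with U+2013 EN DASH in the middle.
const input = '1830\uFEFF\u2013\uFEFF1832';
const normalized = replaceInteriorZwnbsp(input);
```

The `normalized` string would then be passed to Wink in place of the original. The replacement is one-for-one, so character offsets into the text stay the same.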

Note: I found this bug while debugging an issue in another project which uses Wink, and I donʼt know much about Wink myself.  I expect this is enough information to identify the issue, but if not then I might need extra help to provide more useful information.
