This repository was archived by the owner on Mar 9, 2023. It is now read-only.
While working on spaCy Japanese model support and integrating Sudachi, I ran into an issue where the one-character ellipsis (…) causes errors. If you tokenize this ellipsis on its own, SudachiPy returns three tokens, with surfaces like ['', '', '…'].
I assume this is a bug, but I wasn't able to track down where it happens. I also checked ㍻; that character is also normalized internally, but it seems to be output as a single token without issue.
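
For illustration, here is a minimal sketch of one way surfaces like these could arise. It is a guess, not a trace of Sudachi's actual code: it assumes normalization behaves like Unicode NFKC (which expands … to three periods and ㍻ to 平成), and that each normalized character is mapped back to the offset of the original character it came from. The helper names `normalize_with_offsets` and `surfaces_if_split_per_char` are hypothetical and are not part of SudachiPy.

```python
import unicodedata


def normalize_with_offsets(text):
    """NFKC-normalize text one character at a time (a simplification).

    Returns the normalized string and a list of offsets, where offsets[i]
    is the index in the original text that normalized character i came
    from. A final entry marks the end of the original text.
    """
    pieces = []
    offsets = []
    for i, ch in enumerate(text):
        expanded = unicodedata.normalize("NFKC", ch)
        pieces.append(expanded)
        # Every character produced by the expansion points at the same
        # original index.
        offsets.extend([i] * len(expanded))
    offsets.append(len(text))
    return "".join(pieces), offsets


def surfaces_if_split_per_char(text):
    """Hypothetical: treat each normalized character as its own token and
    recover its surface by slicing the original text with the offsets."""
    normalized, offsets = normalize_with_offsets(text)
    return [text[offsets[i]:offsets[i + 1]] for i in range(len(normalized))]


print(normalize_with_offsets("…"))      # ('...', [0, 0, 0, 1])
print(surfaces_if_split_per_char("…"))  # ['', '', '…']
print(normalize_with_offsets("㍻")[0])  # 平成
```

Under these assumptions, the first two periods map back to zero-width spans of the original text, which matches the empty surfaces reported above. ㍻ would avoid the problem only because 平成 is kept together as a single token, so the whole expansion maps back to the original character.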