We currently wrap/monkey-patch uniseg's `word_break()` so that Private Use Area (PUA) characters, such as Aleitha's special characters, are handled properly. This workaround can fail; see #130 for an example.
A better approach would be to ask the uniseg maintainers for a supported way to define custom character properties. So let's ask them!
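
For context when writing the request, here is a minimal sketch of the kind of wrapper we are talking about. It is not our actual patch and does not import uniseg: `fake_word_break` is a hypothetical stand-in for uniseg's `word_break()`, property values are plain strings for simplicity (uniseg's real return type may differ), and `wrap_word_break`, `is_pua`, and the `"ALetter"` default are names and choices made up for illustration.

```python
from typing import Callable, Dict

# Unicode Private Use Area ranges: BMP PUA plus supplementary planes 15 and 16.
PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))


def is_pua(ch: str) -> bool:
    """Return True if the single character `ch` is a private-use code point."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def wrap_word_break(
    base: Callable[[str], str],
    overrides: Dict[str, str],
    default: str = "ALetter",  # assumption: treat unknown PUA characters as letters
) -> Callable[[str], str]:
    """Wrap a word-break property lookup so PUA characters get custom values."""

    def patched(ch: str) -> str:
        if ch in overrides:      # explicit per-character property
            return overrides[ch]
        if is_pua(ch):           # any other PUA character gets the default
            return default
        return base(ch)          # everything else is left to the library

    return patched


def fake_word_break(ch: str) -> str:
    """Hypothetical stand-in for the library lookup, NOT uniseg's implementation."""
    return "ALetter" if ch.isalpha() else "Other"


patched = wrap_word_break(fake_word_break, {"\ue001": "Extend"})
```

With the monkey-patch approach, `patched` has to be assigned over the library's own function, which is why a supported hook for custom properties would be preferable.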