Here you split words only on whitespace. You should use word boundaries instead (in regexp: `\b`, or something equivalent such as a set of common separators). Otherwise, the word is not detected in many contexts. For example, I have the word `oster-monath` in the disallowed words file, but when it appears in a sentence between quotation marks (`"oster-monath"`) or next to a comma (`oster-monath,`), it is not detected.
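To illustrate the difference, here is a minimal Python sketch. It is not the project's actual code: the function name `find_disallowed` and the sample sentences are made up for this example. It uses the lookarounds `(?<!\w)` and `(?!\w)`, which behave like `\b` here but would also work for entries that start or end with a non-word character.

```python
import re

def find_disallowed(text, disallowed):
    """Return the disallowed words that occur in text as whole words.

    Hypothetical helper for illustration only.
    """
    found = []
    for word in disallowed:
        # Require that the match is not glued to a word character on either
        # side, so quotes, commas and other punctuation count as boundaries.
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        if re.search(pattern, text):
            found.append(word)
    return found

quoted = 'He wrote "oster-monath" in the text.'
comma = "In oster-monath, the days get longer."

# Splitting on whitespace keeps the punctuation attached to the token,
# so a plain membership test misses the word.
whitespace_hit = "oster-monath" in quoted.split()

# The boundary-based search finds it in both sentences.
quoted_hits = find_disallowed(quoted, ["oster-monath"])
comma_hits = find_disallowed(comma, ["oster-monath"])
```

With this approach the word is still not matched inside a longer word (for example `oster-monaths`), because the character after the match would be a word character.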
Case is very important for spelling in most languages, so I think the disallowed words should be matched case-sensitively. Case-insensitive matching is sometimes used in NLP, but not in spell-checking. It could be made optional per language.
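One possible shape for such an option, again as a hedged Python sketch rather than the project's real API: the function `contains_disallowed` and its `case_sensitive` parameter are assumptions made for this example, and a per-language setting could simply supply that flag.

```python
import re

def contains_disallowed(text, word, case_sensitive=True):
    """Check whether word occurs in text as a whole word.

    Hypothetical helper: case_sensitive defaults to True, and a
    language-specific configuration could pass False where desired.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return re.search(pattern, text, flags) is not None

sentence = "Oster-monath, as it was once called."

# Default (case-sensitive): the capitalized form is a different word.
strict = contains_disallowed(sentence, "oster-monath")

# Opt-in case-insensitive matching for languages that want it.
relaxed = contains_disallowed(sentence, "oster-monath", case_sensitive=False)
```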