This method:
    def _normalize(self, line: str):
        return line.lower().replace('"', "'").replace("/", " ")
invokes .lower() on the input text, but lowercasing is a locale-sensitive operation, and Python's str.lower() always applies the default, locale-independent Unicode mapping.
The uppercase I (U+0049) converts to i (U+0069) in all languages except Turkish and Azeri, where it should convert to the dotless lowercase ı (U+0131).
So with the current code, "LARI" is lowercased to "lari", which does not exist in the Turkish n-grams, instead of "ları", which does.
This means that uppercase Turkish and Azeri text is likely to be misrecognized.
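
To illustrate the mismatch, here is a minimal sketch of a Turkish/Azeri-aware lowercase. The helper name turkish_lower is hypothetical and not part of the project; it only handles the two dotted/dotless I pairs and then falls back to the default str.lower().

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish/Azeri rules for I and İ.

    Hypothetical helper for illustration only; not part of the project.
    """
    # Handle the two special pairs first, then apply the default mapping:
    #   İ (U+0130) -> i (U+0069)
    #   I (U+0049) -> ı (U+0131)
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


# Default mapping: the dotless form is lost.
print("LARI".lower())          # lari
# Turkish-aware mapping: the form the report expects in the n-grams.
print(turkish_lower("LARI"))   # ları
# The default mapping also turns İ into "i" plus a combining dot (U+0307).
print(len("\u0130".lower()))   # 2
```

Since the language is not known before detection, applying this mapping unconditionally would break other languages; one possible approach, offered only as a suggestion, is to use it solely when scoring the text against the Turkish and Azeri n-grams.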