Langram - the most accurate language detection library

321 ScriptLanguages (187 models + 134 single language scripts)

Usage examples are available on docs.rs.

One language can be written in multiple scripts, so each language-script combination is detected as a separate ScriptLanguage (language + script).
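To illustrate the language + script pairing, here is a minimal sketch with hypothetical `Language`, `Script` and `ScriptLanguage` types. These are not langram's actual definitions. Serbian, which is written in both Cyrillic and Latin, yields two distinct values that still share the same language.

```rust
// Hypothetical types for illustration only; langram's real ScriptLanguage
// is not necessarily shaped like this.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Language {
    Serbian,
    English,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Script {
    Cyrillic,
    Latin,
}

/// A detection target: one language written in one particular script.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ScriptLanguage {
    pub language: Language,
    pub script: Script,
}

impl ScriptLanguage {
    pub const fn new(language: Language, script: Script) -> Self {
        Self { language, script }
    }

    /// True when both values denote the same language, regardless of script.
    pub fn same_language(&self, other: &ScriptLanguage) -> bool {
        self.language == other.language
    }
}
```

Because the script is part of the value, Serbian (Cyrillic) and Serbian (Latin) compare as different detection results.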

Uses alphabet_detector as a word separator and language prefilter.
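A rough sketch of the general idea of splitting text into words and prefiltering candidate languages by alphabet. It does not use alphabet_detector's API. The Unicode ranges, `script_of`, `candidates` and the language table are simplified, hypothetical stand-ins. A script that maps to exactly one language (Greek in this toy table) presumably needs no model scoring at all, which is how single language scripts differ from the models counted above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Script {
    Latin,
    Cyrillic,
    Greek,
}

/// Simplified script lookup by Unicode block (hypothetical; real detection is finer-grained).
fn script_of(c: char) -> Option<Script> {
    if !c.is_alphabetic() {
        return None;
    }
    match c {
        'a'..='z' | 'A'..='Z' | '\u{00C0}'..='\u{024F}' => Some(Script::Latin),
        '\u{0370}'..='\u{03FF}' => Some(Script::Greek),
        '\u{0400}'..='\u{04FF}' => Some(Script::Cyrillic),
        _ => None,
    }
}

/// Toy prefilter table: which languages are worth scoring for a given script.
fn candidates(script: Script) -> &'static [&'static str] {
    match script {
        Script::Latin => &["English", "German", "Serbian"],
        Script::Cyrillic => &["Russian", "Ukrainian", "Serbian"],
        // A single-language script in this toy table: no model scoring needed.
        Script::Greek => &["Greek"],
    }
}

/// Splits text into words at non-alphabetic characters and attaches the
/// script of each word (taken from its first character) plus its candidates.
fn words_with_candidates(text: &str) -> Vec<(&str, Script, &'static [&'static str])> {
    text.split(|c: char| !c.is_alphabetic())
        .filter(|w| !w.is_empty())
        .filter_map(|w| {
            let script = script_of(w.chars().next()?)?;
            Some((w, script, candidates(script)))
        })
        .collect()
}
```

Only the candidates attached to each word would then need to be scored by an n-gram model.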

Based on a modified n-gram language model algorithm that uses character n-grams (1 to 5 characters) and word unigrams (single words).
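The sketch below shows plain extraction of character 1- to 5-grams plus a word unigram, scored with a textbook log-probability sum. langram uses a modified algorithm, so this only illustrates the baseline technique; `ToyModel`, `score` and the fixed penalty for unseen n-grams are hypothetical.

```rust
use std::collections::HashMap;

/// All character n-grams of length 1..=max_n taken from a single word.
fn char_ngrams(word: &str, max_n: usize) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut out = Vec::new();
    for n in 1..=max_n.min(chars.len()) {
        for window in chars.windows(n) {
            out.push(window.iter().collect());
        }
    }
    out
}

/// Hypothetical per-language model storing log-probabilities.
struct ToyModel {
    char_ngrams: HashMap<String, f64>, // character 1..=5-grams
    words: HashMap<String, f64>,       // whole-word unigrams
}

impl ToyModel {
    /// Textbook log-probability sum over the word unigram and all of its
    /// character 1..=5-grams; unseen entries get a fixed penalty.
    fn score(&self, word: &str, unseen_log_prob: f64) -> f64 {
        let mut total = self.words.get(word).copied().unwrap_or(unseen_log_prob);
        for gram in char_ngrams(word, 5) {
            total += self.char_ngrams.get(&gram).copied().unwrap_or(unseen_log_prob);
        }
        total
    }
}
```

A detector built this way would compute the score for every candidate language and pick the highest; a five-letter word such as "hello" contributes 15 character n-grams plus one word unigram.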

RAM requirements are low. The library may map up to the size of the provided models binary file, but this memory is shared (virtual address space via mmap), so you do not need that amount of free RAM. However, if the whole models file cannot be cached in RAM, detection speed will suffer.

This library is a complete rewrite of Lingua: much faster, more accurate, with more languages, etc.

It is also more accurate than Whatlang and Whichlang. More details are in the Comparison with other language detectors.

To better understand the accuracy of the different modes, see the Accuracy report.

Setup

To use this library, you need a binary models file, which must be placed next to the executable, or its location must be specified with LANGRAM_MODELS_PATH.
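A minimal sketch of how an application might resolve the models file location. It assumes that `LANGRAM_MODELS_PATH` is an environment variable holding the full path to the file and that it takes precedence over the executable's directory; the file name `models.bin` is a placeholder. Check the langram docs for the actual lookup rules.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Placeholder file name (assumption); the real models binary may be named differently.
const MODELS_FILE_NAME: &str = "models.bin";

/// Assumed precedence: a non-empty LANGRAM_MODELS_PATH value wins,
/// otherwise look next to the executable.
fn resolve_models_path(env_value: Option<&str>, exe: &Path) -> Option<PathBuf> {
    if let Some(p) = env_value.filter(|s| !s.is_empty()) {
        return Some(PathBuf::from(p));
    }
    exe.parent().map(|dir| dir.join(MODELS_FILE_NAME))
}

/// Reads the real environment and executable location.
fn models_path_from_environment() -> Option<PathBuf> {
    let env_value = env::var("LANGRAM_MODELS_PATH").ok();
    let exe = env::current_exe().ok()?;
    resolve_models_path(env_value.as_deref(), &exe)
}
```

Keeping the pure `resolve_models_path` separate from the environment access makes the precedence rule easy to test.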

It can be:

Downloaded from langram_models releases;
Built from langram_models (recommended for big-endian targets). This is the more advanced option: it allows you to remove model n-grams and recompile, making the models binary lighter.

RoDmitry/langram

Folders and files

Latest commit

History

Repository files navigation

Langram - the most accurate language detection library

321 ScriptLanguages (187 models + 134 single language scripts)

Usage examples in docs.rs.

Setup

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Uh oh!

Languages