Lexibank Analysed

How to cite

If you use these data, please cite

- the original source:
  Blum, Frederic; Barrientos, Carlos; Englisch, Johannes; Forkel, Robert; Gray, Russell D.; Greenhill, Simon J.; Rzymski, Christoph and List, Johann-Mattis (2025): Lexibank²: Precomputed Features for Large-Scale Lexical Data [Dataset, Version 2.0]. Leipzig: Max Planck Institute for Evolutionary Anthropology.
- the derived dataset, using the DOI of the particular released version you were using.

Description

This dataset is licensed under CC-BY-4.0.

Available online at https://lexibank.clld.org

Notes

Core Sets

The core sets are defined using the following criteria:

Statistics

- Varieties: 5,477 (linked to 3,107 different Glottocodes)
- Concepts: 3,205 (linked to 3,205 different Concepticon concept sets)
- Lexemes: 1,734,794
- Sources: 134
- Synonymy: 1.08
- Invalid lexemes: 0
- Tokens: 9,627,473
- Segments: 2,466 (0 BIPA errors, 0 CLTS sound class errors, 2,457 CLTS modified)
- Inventory size (avg): 38.75
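The figures above are produced by the Lexibank tooling. To make the individual measures more concrete, here is a minimal sketch that computes comparable numbers from a handful of FormTable rows. The definitions are assumptions, not necessarily the exact ones Lexibank uses: tokens are taken as the total number of segments across all forms, synonymy as the mean number of forms per (variety, concept) pair, and inventory size as the mean number of distinct segments per variety. The function name `wordlist_stats` and the toy rows are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

# Toy FormTable rows (hypothetical data, not taken from Lexibank).
# "Segments" is a space-separated string, as it appears in a raw CLDF CSV file.
FORMS = [
    {"Language_ID": "a", "Parameter_ID": "HAND", "Segments": "h a n d"},
    {"Language_ID": "a", "Parameter_ID": "HAND", "Segments": "p a l m"},
    {"Language_ID": "a", "Parameter_ID": "EYE", "Segments": "a i"},
    {"Language_ID": "b", "Parameter_ID": "HAND", "Segments": "m a n o"},
]


def wordlist_stats(forms):
    """Compute README-style summary statistics for a list of form rows.

    Assumed definitions (hypothetical, for illustration only):
    - tokens: total number of segments across all forms
    - segments: number of distinct segment types across all varieties
    - synonymy: mean number of forms per (variety, concept) pair
    - inventory_size: mean number of distinct segments per variety
    """
    # Count how many forms each (variety, concept) pair has.
    per_pair = Counter((f["Language_ID"], f["Parameter_ID"]) for f in forms)
    # Collect the set of distinct segments observed for each variety.
    inventories = defaultdict(set)
    tokens = 0
    for f in forms:
        segments = f["Segments"].split()
        tokens += len(segments)
        inventories[f["Language_ID"]].update(segments)
    return {
        "varieties": len({f["Language_ID"] for f in forms}),
        "concepts": len({f["Parameter_ID"] for f in forms}),
        "lexemes": len(forms),
        "tokens": tokens,
        "segments": len(set().union(*inventories.values())),
        "synonymy": round(mean(per_pair.values()), 2),
        "inventory_size": round(mean(len(s) for s in inventories.values()), 2),
    }


print(wordlist_stats(FORMS))
```

For the toy rows, variety "a" has two forms for HAND, so synonymy comes out above 1 (4 forms over 3 pairs, about 1.33), which is the same kind of effect behind the dataset-wide value of 1.08.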

Possible Improvements:

Languages linked to bookkeeping languoids in Glottolog:
- Taungtha (Wethet) rung1263
- Thaiphum (Rengkheng) thai1262
- Doitu (Hetsawlay) song1313
- Laitu (Khuasung) lait1239
- Laisaw Thu Htay Kung lait1239
- Songlai-Hettui 8Karchaung (Hettui) song1313
- Songlai-Maung Um (Song) 1Maung Um (Song) song1313
- Laitu Ahongdong lait1239
- Khalaj khal1270

Contributors

Name	GitHub user	Description	Role
Frederic Blum	@FredericBlum	maintainer	Author
Carlos Barrientos	@MuffinLinwist	maintainer	Author
Johannes Englisch	@johenglisch	maintainer	Author
Robert Forkel	@xrotwang	maintainer	Author
Russell D. Gray		maintainer	Author
Simon J. Greenhill	@simongreenhill	maintainer	Author
Christoph Rzymski	@chrzyki	maintainer	Author
Johann-Mattis List	@LinguList	maintainer	Author

CLDF Datasets

The following CLDF datasets are available in the cldf directory:

- CLDF Wordlist at cldf/wordlist-metadata.json
- CLDF StructureDataset at cldf/phonology-metadata.json
- CLDF StructureDataset at cldf/lexicon-metadata.json
- CLDF StructureDataset at cldf/phonemes-metadata.json
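Each metadata file describes its tables, and each table declares its CLDF component via `dc:conformsTo`. In practice the `pycldf` package (`Dataset.from_metadata`) is the usual way to read these files. As a dependency-free illustration, the following sketch locates the FormTable of a wordlist metadata file and reads it with the standard library. It assumes the table is a plain CSV file whose `url` is relative to the metadata file, and it ignores CSVW dialect settings, list-valued columns and foreign keys. The function name `load_form_table` is hypothetical.

```python
import csv
import json
from pathlib import Path

# CLDF term URI identifying the FormTable component.
FORM_TABLE = "http://cldf.clld.org/v1.0/terms.rdf#FormTable"


def load_form_table(metadata_path):
    """Return the rows of the FormTable described by a CLDF metadata file.

    Minimal stdlib sketch (hypothetical helper): it ignores CSVW dialect
    settings, list-valued columns and foreign keys, which pycldf handles.
    """
    metadata_path = Path(metadata_path)
    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    for table in metadata.get("tables", []):
        if table.get("dc:conformsTo") == FORM_TABLE:
            # Table URLs are resolved relative to the metadata file.
            csv_path = metadata_path.parent / table["url"]
            with csv_path.open(encoding="utf-8", newline="") as handle:
                return list(csv.DictReader(handle))
    raise ValueError(f"No FormTable found in {metadata_path}")
```

Usage, assuming the repository has been cloned and the form table is available as plain CSV: `rows = load_form_table("cldf/wordlist-metadata.json")`.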
