If you use these data, please cite:
- the original source
Blum, Frederic; Barrientos, Carlos; Englisch, Johannes; Forkel, Robert; Gray, Russell D.; Greenhill, Simon J.; Rzymski, Christoph and List, Johann-Mattis (2025): Lexibank²: Precomputed Features for Large-Scale Lexical Data [Dataset, Version 2.0]. Leipzig: Max Planck Institute for Evolutionary Anthropology.
- the derived dataset using the DOI of the particular released version you were using
This dataset is licensed under a CC-BY-4.0 license.
Available online at https://lexibank.clld.org
The core-sets are defined by using the following criteria:
- Varieties: 5,477 (linked to 3,107 different Glottocodes)
- Concepts: 3,205 (linked to 3,205 different Concepticon concept sets)
- Lexemes: 1,734,794
- Sources: 134
- Synonymy: 1.08
- Invalid lexemes: 0
- Tokens: 9,627,473
- Segments: 2,466 (0 BIPA errors, 0 CLTS sound class errors, 2,457 CLTS modified)
- Inventory size (avg): 38.75
- Languages linked to bookkeeping languoids in Glottolog:
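
To make the synonymy figure in the list above easier to interpret, here is a minimal sketch of how such an index could be computed. It assumes that synonymy means the average number of lexemes per concept, computed per variety and then averaged over varieties; that definition, the function name `synonymy_index`, and the toy rows are illustrative assumptions and are not taken from this dataset or its tooling.

```python
from collections import Counter, defaultdict


def synonymy_index(rows):
    """Average number of lexemes per concept, averaged over varieties.

    `rows` is an iterable of (variety, concept, form) tuples.
    This definition is an assumption made for illustration.
    """
    # Count how many forms each variety has for each concept.
    per_variety = defaultdict(Counter)
    for variety, concept, _form in rows:
        per_variety[variety][concept] += 1

    if not per_variety:
        raise ValueError("no rows given")

    # For each variety: total forms divided by number of distinct concepts.
    ratios = [sum(counts.values()) / len(counts) for counts in per_variety.values()]
    return sum(ratios) / len(ratios)


# Hypothetical toy data: variety "a" has two words for HAND, variety "b" has one.
toy_rows = [
    ("a", "HAND", "mano"),
    ("a", "HAND", "palma"),
    ("a", "FOOT", "pie"),
    ("b", "HAND", "hand"),
    ("b", "FOOT", "foot"),
]

print(synonymy_index(toy_rows))
```

Under this reading, a value of 1.0 would mean every variety has exactly one lexeme per concept, and values slightly above 1.0 indicate that a small share of concepts have more than one recorded form.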
Name | GitHub user | Description | Role |
---|---|---|---|
Frederic Blum | @FredericBlum | maintainer | Author |
Carlos Barrientos | @MuffinLinwist | maintainer | Author |
Johannes Englisch | @johenglisch | maintainer | Author |
Robert Forkel | @xrotwang | maintainer | Author |
Russell D. Gray | | maintainer | Author |
Simon J. Greenhill | @simongreenhill | maintainer | Author |
Christoph Rzymski | @chrzyki | maintainer | Author |
Johann-Mattis List | @LinguList | maintainer | Author |
The following CLDF datasets are available in cldf: