A UD dataset for Sindhi, based on newswire (primarily Kawish), folk stories from the Adabi forums, handwritten sentences that demonstrate linguistic features, and a reparsing of the unfinished MazharDootio dataset.
Data in this treebank is split into three sections:
- Test section: some Kawish articles and folk stories. The reparsing of the MazharDootio dataset will also go here.
- Dev section: another set of Kawish articles and folk stories
- Train section: everything else
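
To check how large each section is, the sentences and words in a CoNLL-U file can be counted with a few lines of Python. The following is only a minimal sketch: the function names `read_conllu` and `count_sentences_and_words` are made up for this illustration, and the file name in the usage comment is a hypothetical placeholder, not a path confirmed by this treebank.

```python
from typing import Iterable, Iterator, List


def read_conllu(lines: Iterable[str]) -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of tab-split token rows."""
    sentence: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Comment lines (sent_id, text, ...) carry no tokens.
            continue
        sentence.append(line.split("\t"))
    if sentence:
        # Handle a final sentence that is not followed by a blank line.
        yield sentence


def count_sentences_and_words(lines: Iterable[str]) -> tuple:
    """Return (sentence count, syntactic word count) for CoNLL-U input."""
    n_sentences = 0
    n_words = 0
    for sentence in read_conllu(lines):
        n_sentences += 1
        # Plain integer IDs are syntactic words; multiword token ranges
        # such as "1-2" and empty nodes such as "1.1" are skipped.
        n_words += sum(1 for row in sentence if row[0].isdigit())
    return n_sentences, n_words


# Usage (hypothetical file name, adjust to the actual section file):
# with open("train.conllu", encoding="utf-8") as handle:
#     print(count_sentences_and_words(handle))
```

The reader works on any iterable of lines, so it can be run on an open file handle or on a list of strings when testing.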
Annotation done by:
- Mutee U Rahman
- Sarwat Qureshi
- Shafi Pirzada
- Sakeena Shah
- Muhammad Shaheer
- Mir Afzal Ahmed Talpur
- Zubair Sanjrani
- John Bauer
Publication out for review.
- 2024-05-15 v2.16
  - Initial release in Universal Dependencies.
  - test: xpos_features/sd_780_A.conllu, retagging/parsing of most of the MazharDootio dataset
  - dev: xpos_features/sd_780_B.conllu
  - train: xpos_standard/xpos_tagged_with_features.conllu (the handwritten sentences used to demonstrate xpos & features)
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.16
License: CC BY-SA 4.0
Includes text: yes
Parallel: no
Genre: grammar-examples
Lemmas: manual native
UPOS: manual native
XPOS: manual native
Features: manual native
Relations: manual native
Contributors: Rahman, Mutee-u
Contributing: here
Contact: [email protected]
===============================================================================