This directory contains additional Python scripts related to the UDTube project.
Some of these scripts have dependencies beyond those UDTube requires; these
additional dependencies are listed in requirements.txt. Before using these
scripts, install UDTube and then run the following in this directory:
pip install -r requirements.txt
The following scripts are provided:
- convert_to_um.py converts the FEATS column of a CoNLL-U format file from Universal Dependencies format to UniMorph format using ud-compatibility. The user may wish to apply this conversion to the training and validation files before training. Note that this may not work for all languages in the Universal Dependencies corpus.
- evaluate.py provides a general tool for evaluating labeled CoNLL-U data; it reports accuracy for language-universal and language-specific POS tags, lemmatization, and morphological features.
- remove_mwe.py removes multi-word annotations from CoNLL-U format files. In practice, the subwords of a multi-word expression are usually annotated separately, so this simply cleans things up.
- pretokenize.py converts raw text into CoNLL-U format using spacy-udpipe.
- udpipe.py applies pretrained UDPipe models using spacy-udpipe. This is a lower-resource alternative to the UDTube pipeline and can be useful for comparison.
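To illustrate what removing multi-word annotations means, here is a minimal sketch of the idea, not the code of remove_mwe.py itself. It assumes the standard CoNLL-U convention that a multi-word token line carries a range ID such as "1-2" in its first column, followed by one line per syntactic word it spans. The function names are hypothetical.

```python
"""Sketch: drop multi-word token lines from CoNLL-U text.

Assumption: this illustrates the idea behind remove_mwe.py; it is not its code.
"""


def is_multiword_line(line: str) -> bool:
    """Returns True if the line is a token line whose ID is a range like "1-2"."""
    if not line or line.startswith("#"):
        return False  # Blank lines and comments are never multi-word tokens.
    token_id = line.split("\t", 1)[0]  # Only the ID column is inspected.
    return "-" in token_id


def remove_multiword_tokens(conllu: str) -> str:
    """Returns the CoNLL-U text with every multi-word token line removed."""
    kept = [line for line in conllu.split("\n") if not is_multiword_line(line)]
    return "\n".join(kept)


# Spanish "del" is a multi-word token spanning the words "de" and "el".
example = "\n".join([
    "# text = del mar",
    "1-2\tdel\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tde\tde\tADP\t_\t_\t3\tcase\t_\t_",
    "2\tel\tel\tDET\t_\t_\t3\tdet\t_\t_",
    "3\tmar\tmar\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
])
print(remove_multiword_tokens(example))
```

The per-word lines for "de" and "el" keep their own lemmas and tags, which is why dropping the "1-2" line loses no word-level annotation.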
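The kind of scoring evaluate.py reports can be sketched as per-column token accuracy. This is a hypothetical illustration, not the script's implementation: it reads "language-universal and language-specific POS tags" as the CoNLL-U UPOS and XPOS columns, and it assumes the gold and predicted files are aligned word for word, that multi-word token ranges and empty nodes are skipped, and that FEATS is scored as an exact string match.

```python
"""Sketch: per-column token accuracy between gold and predicted CoNLL-U.

Assumptions (not taken from evaluate.py): files are aligned word for word,
multi-word token ranges and empty nodes are skipped, and each column is
scored by exact string match.
"""

from typing import Dict, List

# 0-based CoNLL-U column indices for the fields being scored.
COLUMNS = {"LEMMA": 2, "UPOS": 3, "XPOS": 4, "FEATS": 5}


def word_rows(conllu: str) -> List[List[str]]:
    """Returns the tab-split fields of each syntactic word line."""
    rows = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skips multi-word token ranges ("1-2") and empty nodes ("1.1").
        if "-" in fields[0] or "." in fields[0]:
            continue
        rows.append(fields)
    return rows


def accuracy(gold: str, predicted: str) -> Dict[str, float]:
    """Returns, per column, the fraction of words whose value matches."""
    gold_rows = word_rows(gold)
    pred_rows = word_rows(predicted)
    if len(gold_rows) != len(pred_rows):
        raise ValueError("gold and predicted files are not aligned")
    if not gold_rows:
        raise ValueError("no words to evaluate")
    return {
        name: sum(g[i] == p[i] for g, p in zip(gold_rows, pred_rows)) / len(gold_rows)
        for name, i in COLUMNS.items()
    }


gold = (
    "1\tdogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\tMood=Ind|Tense=Pres|VerbForm=Fin\t0\troot\t_\t_\n"
)
pred = (
    "1\tdogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tNOUN\tNN\tNumber=Sing\t0\troot\t_\t_\n"
)
print(accuracy(gold, pred))
# {'LEMMA': 1.0, 'UPOS': 0.5, 'XPOS': 0.5, 'FEATS': 0.5}
```

Scoring FEATS as a whole string is the strictest choice; a per-feature score would give partial credit when only some attribute-value pairs are wrong.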