Lemma-vs-Form-Splits

This project contains an implementation of an LSTM + attention sequence-to-sequence model for inflection, a well-known task in computational morphology, applied to manipulated versions of the SIGMORPHON 2020 Task 0 datasets. The manipulation is simple: instead of only prohibiting a sample (a lemma, form, features triplet) from appearing in more than one of the train, dev and test sets, we prohibit forms of the same lemma from appearing in different sets. This gives a clearer picture of the model's generalization abilities.
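
The lemma-level split described above can be sketched as follows. This is a minimal illustration and not the code in generate_lemma_splits.py: the function name `split_by_lemma`, the 70/10/20 ratios and the fixed seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_lemma(samples, ratios=(0.7, 0.1, 0.2), seed=0):
    """Split (lemma, form, features) triplets so no lemma is shared across sets.

    The ratios and seed are illustrative assumptions, not values from the repo.
    """
    # Group all triplets under their lemma so a lemma moves as one unit.
    by_lemma = defaultdict(list)
    for lemma, form, feats in samples:
        by_lemma[lemma].append((lemma, form, feats))

    # Shuffle the lemmas (not the samples) deterministically.
    lemmas = sorted(by_lemma)
    random.Random(seed).shuffle(lemmas)

    # Cut the lemma list into train / dev / test portions.
    n_train = int(len(lemmas) * ratios[0])
    n_dev = int(len(lemmas) * ratios[1])
    parts = (
        lemmas[:n_train],
        lemmas[n_train:n_train + n_dev],
        lemmas[n_train + n_dev:],
    )

    # Expand each lemma portion back into its triplets.
    return tuple([s for lemma in part for s in by_lemma[lemma]] for part in parts)
```

Calling `train, dev, test = split_by_lemma(samples)` returns three lists of triplets. Because the shuffle operates on lemmas, every form of a given lemma ends up in exactly one of them, whereas a form-level split would shuffle the triplets directly.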

The repo consists of the script that generates the lemma-split files and the network, implemented in PyTorch, which is trained on both the original and the new datasets.

To generate the new datasets, clone the original SIGMORPHON data and run generate_lemma_splits.py in the same folder.
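
Once the new files exist, one way to sanity-check them is to confirm that no lemma occurs in more than one set. The sketch below assumes the usual SIGMORPHON 2020 Task 0 line format (lemma, form and features separated by tabs); the helper names `read_lemmas` and `shared_lemmas` are hypothetical and not part of this repo.

```python
def read_lemmas(lines):
    """Collect the lemmas from 'lemma<TAB>form<TAB>features' lines."""
    lemmas = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        lemma, _form, _feats = line.split("\t")
        lemmas.add(lemma)
    return lemmas


def shared_lemmas(train_lines, dev_lines, test_lines):
    """Return the lemmas that occur in more than one of the three sets."""
    train, dev, test = (read_lemmas(x) for x in (train_lines, dev_lines, test_lines))
    return (train & dev) | (train & test) | (dev & test)
```

Each argument can be any iterable of lines, for example an open file object. An empty result means the three sets are lemma-disjoint, which is the property a lemma split is meant to guarantee.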

To train the network, select the list of languages to train on in Inflection_90_Langs.py and run the script.
