bootstrap

how to bootstrap a new concrete WordNet grammar

If your target language has a Translate grammar but no WordNet, use the migrate.hs script to bootstrap it, and you are done. If your language has both, it is better to bootstrap the new concrete using WordNet. Otherwise, keep reading.

Our end goal is to map GF abstract fun names to lemmas in our target language. We do this by matching them through their WordNet synsets, so this method is only suitable for languages that have a (hopefully decently sized) WordNet available in the format used by the OMW (Open Multilingual Wordnet). The conversion is done by a script that relies on the GNU versions of standard terminal utilities. If you are on Mac OS X, you can install GNU coreutils, or you can spin up a Docker container running Ubuntu.
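Before running the conversion, it can help to confirm that the utilities on your PATH really are the GNU versions. The following is a minimal sketch, not part of this repository: the `is_gnu` helper and the list of tools are assumptions, so check the script itself for the utilities it actually calls.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the first line of `TOOL --version`
# mentions GNU, which is how the GNU implementations identify themselves.
is_gnu() {
  "$1" --version 2>/dev/null | head -n 1 | grep -q 'GNU'
}

# Assumed tool list, for illustration only.
for tool in sort sed grep join; do
  if is_gnu "$tool"; then
    echo "$tool: GNU"
  else
    echo "$tool: not GNU (or not found)"
  fi
done
```

If any tool is reported as not GNU, install the GNU version or run the scripts inside an Ubuntu container instead.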

obtain translation predictions

Not all words in a given English synset are equally good translations for any word in your target language’s synset. Krasimir Angelov developed a heuristic algorithm to decide automatically which translation pairs seem to be good matches. You can find the details in this paper. Run the algorithm following the built-in instructions; this should yield a predictions.tsv file, which you must place in this repository.

abstract syntax names and lemmas map

Call

bash make-abstract-lemmas-map.bash

from this repository to create a file named fun-lemmas.tsv that maps abstract syntax names to their corresponding lemmas in your target language.

building WordNet

You now have several options to obtain your target language’s concrete WordNet module. If your target language already has some large dictionary module in GF, you can map its linearizations to the abstract syntax names by matching lemmas. Another option is to apply smart paradigms to these lemmas in order to obtain tentative linearizations. Ideally you’ll have a large-scale morphological resource against which you can then check the GF linearizations.

MakeDictFromLemmas.hs can build a rudimentary WordNet concrete by using the simplest smart paradigms. Its output will require extensive manual revision afterwards. It takes two filepaths as arguments: the first is a file in the same format as fun-lemmas.tsv, and the second is the name of the output file. You can customize the output language name and the functions to be applied (to some extent).
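To give a feel for what "applying the simplest smart paradigms" produces, here is a rough sketch that is not the MakeDictFromLemmas.hs implementation. It assumes a two-column `fun<TAB>lemma` input, assumes the category can be read from the last underscore-separated part of the fun name, and uses the one-argument `mkN`, `mkV` and `mkA` paradigms from the GF resource grammar library. The file names and example entries are made up.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Hypothetical input in an assumed fun<TAB>lemma layout.
printf 'dog_1_N\tperro\nrun_1_V\tcorrer\nbig_1_A\tgrande\nquickly_1_Adv\trapidamente\n' \
  > "$workdir/lemmas.tsv"

# Pick a paradigm from the category suffix of the fun name and emit a
# tentative lin rule; categories without a known paradigm are skipped.
awk -F'\t' '
{
  n = split($1, parts, "_")
  cat = parts[n]
  if (cat == "N") op = "mkN"
  else if (cat == "V") op = "mkV"
  else if (cat == "A") op = "mkA"
  else next
  printf "lin %s = %s \"%s\" ;\n", $1, op, $2
}' "$workdir/lemmas.tsv" > "$workdir/lins-sketch.txt"

cat "$workdir/lins-sketch.txt"
```

Rules generated this way are only guesses at the inflection, which is why the manual revision step matters.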

merge old checked WordNet module with new machine-generated one

Let’s say your WordNet got updated, and you would like to carry over the updates without losing the linearizations you have already checked. You can do this by calling

bash merge-wordnet-modules.bash OLD_MODULE NEW_MODULE > WordNetXXX.gf

You’ll have to add the headers manually, though.
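The idea behind the merge can be sketched as follows. This is an illustration under stated assumptions, not the contents of merge-wordnet-modules.bash: it assumes each rule sits on a single line of the form `lin FUN = ... ;`, it prefers the old (checked) rule whenever the fun name appears in both modules, and it drops funs that exist only in the old module. The real script may handle these cases differently. File names and rules are made up.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Hypothetical old module body: manually checked rules.
printf '%s\n' 'lin dog_1_N = mkN "perro" masculine ;' > "$workdir/old.txt"

# Hypothetical new module body: machine-generated rules.
printf '%s\n' \
  'lin dog_1_N = mkN "perro" ;' \
  'lin wolf_1_N = mkN "lobo" ;' > "$workdir/new.txt"

# First pass (old file): remember checked rules by fun name.
# Second pass (new file): print the checked rule if there is one,
# otherwise keep the machine-generated rule.
awk '
  NR == FNR && $1 == "lin" { checked[$2] = $0; next }
  NR == FNR { next }
  $1 == "lin" { print (($2 in checked) ? checked[$2] : $0) }
' "$workdir/old.txt" "$workdir/new.txt" > "$workdir/merged-sketch.txt"

cat "$workdir/merged-sketch.txt"
```

The output contains only lin rules, which matches the note above that the module headers still have to be added by hand.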