If your target language has a Translate grammar but no WordNet, use
the migrate.hs script to bootstrap it, and you are done. If your
language has both, it is better to bootstrap the new concrete using
WordNet. Otherwise, keep reading.
Our end goal is to map GF abstract fun names to lemmas in our target
language. We do this by matching them through their WordNet synsets,
so this method is only suitable for languages which have a (hopefully
decently-sized) WordNet available in the format used by the Open
Multilingual Wordnet (OMW). The conversion is done by a script that
uses standard terminal utilities (GNU versions). If you are on Mac OS
X, you could install the GNU coreutils package, or you could spin up a
Docker container running Ubuntu.
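Before running the conversion, it can help to confirm that the
utilities on your PATH are the GNU ones. The following is a minimal
sketch: the is_gnu helper and the list of tools are assumptions made
for illustration, not something this repository provides, so adjust
the list to whatever the script actually calls.

```bash
#!/usr/bin/env bash
# Succeeds if the named utility identifies itself as a GNU tool.
# (is_gnu is a hypothetical helper, not part of this repository.)
is_gnu() {
  "$1" --version 2>/dev/null | head -n 1 | grep -q 'GNU'
}

# The tool list is an assumption; adjust it to what the script uses.
for tool in sort join sed; do
  if is_gnu "$tool"; then
    echo "$tool: GNU"
  else
    echo "$tool: not GNU (or not installed)"
  fi
done

# Alternative: work inside an Ubuntu container with the repository mounted.
#   docker run --rm -it -v "$PWD":/work -w /work ubuntu bash
```

If you install coreutils through Homebrew, the tools are normally
installed with a g prefix (gsort, gjoin), so you may need to put the
package's gnubin directory on your PATH for the check to pass.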
Not all words in a given synset in English are equally good
translations for any word in your target language’s synset. Krasimir
Angelov developed a heuristic algorithm to decide automatically which
translation pairs seem to be good matches. You can find the details in
this paper and run the algorithm following the built-in instructions.
This should yield a predictions.tsv file, which you must place in this
repository.
Call
bash make-abstract-lemmas-map.bash
from this repository to create a file named fun-lemmas.tsv that maps
abstract syntax names to their corresponding lemmas in your target
language.
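The core idea of matching by synset can be sketched with sort and
join. This is not necessarily what make-abstract-lemmas-map.bash does
internally. The two input files, their column layouts, the function
names and the synset IDs below are all made up for illustration.

```bash
#!/usr/bin/env bash
TAB=$'\t'
tmp=$(mktemp -d)

# Hypothetical input 1: abstract function name <TAB> synset ID (IDs are made up).
printf 'apple_1_N\t00000002-n\nrun_1_V\t00000001-v\n' > "$tmp/fun-synset.tsv"
# Hypothetical input 2: synset ID <TAB> target-language lemma.
printf '00000002-n\tmaçã\n00000001-v\tcorrer\n' > "$tmp/synset-lemma.tsv"

# join needs both inputs sorted on the join field, under the same collation.
LC_ALL=C sort -t "$TAB" -k2,2 "$tmp/fun-synset.tsv" > "$tmp/by-synset.tsv"
LC_ALL=C sort -t "$TAB" -k1,1 "$tmp/synset-lemma.tsv" > "$tmp/lemmas-sorted.tsv"

# Match field 2 of the first file against field 1 of the second,
# then keep only the function name and the lemma.
LC_ALL=C join -t "$TAB" -1 2 -2 1 -o 1.1,2.2 \
  "$tmp/by-synset.tsv" "$tmp/lemmas-sorted.tsv" > "$tmp/fun-lemmas-sketch.tsv"

cat "$tmp/fun-lemmas-sketch.tsv"
```

The result has one function name and lemma pair per line, separated by
a tab. A function whose synset has no entry in the target WordNet does
not appear in the output.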
You now have several options to obtain your target language’s concrete
WordNet module. If your target language already has some large
dictionary module in GF, you can map its linearizations to the
abstract syntax names by matching lemmas. Another option is to apply
smart paradigms to these lemmas in order to obtain tentative
linearizations. Ideally you’ll have a large-scale morphological
resource which you can then check the GF linearizations against.
MakeDictFromLemmas.hs can build a rudimentary WordNet concrete by
using the simplest smart paradigms. The result will require extensive
manual revision afterwards. The script takes two file paths as
arguments: the first is a file in the same format as fun-lemmas.tsv,
and the second is the name of the output file. You can customize the
output language name and the functions to be applied (to some extent).
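To give a feel for what applying the simplest smart paradigms amounts
to, here is a rough shell sketch. It does not reproduce
MakeDictFromLemmas.hs. It assumes that each input row is a function
name and a lemma separated by a tab, and that the category is the last
underscore-separated part of the function name. The lemmas_to_lins
helper and the sample rows are hypothetical.

```bash
#!/usr/bin/env bash
# Read "function<TAB>lemma" rows on stdin and print naive GF lin rules.
# (lemmas_to_lins is a hypothetical helper, not part of this repository.)
lemmas_to_lins() {
  awk -F '\t' '
    {
      n = split($1, parts, "_")
      category = parts[n]                     # e.g. N, V, A, Adv
      if (category ~ /^(N|V|A|Adv)$/)
        printf "lin %s = mk%s \"%s\" ;\n", $1, category, $2
      else
        printf "-- TODO %s: no one-argument paradigm assumed for %s\n", $1, category
    }'
}

# Made-up sample rows.
printf 'apple_1_N\tmaçã\nrun_1_V\tcorrer\ngive_1_V3\tdar\n' | lemmas_to_lins
```

Only categories with a one-string paradigm (mkN, mkV, mkA, mkAdv) are
handled here. Anything else is left as a comment for manual work,
which matches the point that the generated module needs heavy
revision.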
Let’s say your WordNet got updated, and you would like to carry over
the updates without losing the linearizations you have already
checked. You can do this by calling
bash merge-wordnet-modules.bash OLD_MODULE NEW_MODULE > WordNetXXX.gf
You’ll have to add the headers manually, though.
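One way such a merge could work is a keyed merge that prefers the old
line whenever a function already has one. The sketch below is a
simplification under that assumption and is not the logic of
merge-wordnet-modules.bash. The merge_lins helper and the sample lin
rules are made up, and it assumes one lin rule per line.

```bash
#!/usr/bin/env bash
# Print NEW, but reuse the OLD line for every function that OLD defines.
# (merge_lins is a hypothetical helper; it assumes OLD is non-empty.)
merge_lins() {  # usage: merge_lins OLD_FILE NEW_FILE
  awk '
    FNR == NR { if ($1 == "lin") old[$2] = $0; next }   # pass 1: index OLD by name
    $1 == "lin" && ($2 in old) { print old[$2]; next }  # pass 2: prefer OLD
    { print }                                           # otherwise keep NEW
  ' "$1" "$2"
}

tmp=$(mktemp -d)
# Made-up sample data: the old module holds a hand-revised rule.
printf '%s\n' 'lin apple_1_N = mkN "maçã" feminine ;' > "$tmp/old.gf"
printf '%s\n' 'lin apple_1_N = mkN "maçã" ;' \
              'lin pear_1_N = mkN "pera" ;' > "$tmp/new.gf"

merge_lins "$tmp/old.gf" "$tmp/new.gf"
```

With this sample data the revised rule for apple_1_N survives and the
new rule for pear_1_N is added. Like the real script's output, the
result has no module header.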