Multilingual NMT Corpora Tools

Tools for preparing data for multilingual NMT training, as described in "Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation".

Tools included

equalize-data.sh
- Upscales every file in the input directory to the number of lines in the largest file in that directory
- Parameters
  - Input directory
  - Output directory
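
The upscaling step can be sketched as follows. This is a hypothetical illustration, not the contents of equalize-data.sh: the function name `equalize_dir` is invented here, and it assumes that "upscaling" means repeating a file's lines cyclically until it reaches the line count of the largest file. The actual script may use a different strategy.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual equalize-data.sh.
# Assumption: a file is upscaled by cycling through its lines until it
# has as many lines as the largest file in the input directory.

equalize_dir() {
    local in_dir=$1 out_dir=$2
    local max=0 n f
    mkdir -p "$out_dir"

    # First pass: find the largest line count in the directory.
    for f in "$in_dir"/*; do
        [ -f "$f" ] || continue
        n=$(awk 'END { print NR }' "$f")
        if [ "$n" -gt "$max" ]; then max=$n; fi
    done

    # Second pass: write each file with its lines repeated up to $max.
    for f in "$in_dir"/*; do
        [ -f "$f" ] || continue
        awk -v max="$max" '
            { line[NR] = $0 }
            END {
                if (NR == 0) exit          # leave empty files empty
                for (i = 0; i < max; i++) print line[(i % NR) + 1]
            }' "$f" > "$out_dir/$(basename "$f")"
    done
}
```

A call such as `equalize_dir corpora corpora-equalized` (both directory names are placeholders) would write one upscaled copy per input file. Because the repetition is deterministic, two aligned files with the same original line count stay aligned line by line after upscaling.
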
augment-data.sh
- Adds a target-language tag to every sentence in each file
- Language tags and file names are currently hard-coded in the script
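
As an illustration of the tagging step, here is a minimal sketch that prepends a target-language token to every line of a file. It is not the contents of augment-data.sh: the function name `add_target_tag` and its parameters are hypothetical, and the `<2xx>` token format is an assumption borrowed from the example in the Google paper. The tags hard-coded in the actual script may look different.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual augment-data.sh.
# Assumption: the tag is a token of the form <2xx> placed at the start
# of each source sentence, followed by a single space.

add_target_tag() {
    local lang=$1 in_file=$2 out_file=$3
    # Prefix every input line with the tag for the target language.
    awk -v tag="<2${lang}>" '{ print tag " " $0 }' "$in_file" > "$out_file"
}
```

For example, `add_target_tag es train.en train.tagged.en` (file names are placeholders) would turn the line `Hello world` into `<2es> Hello world`. Taking the language code and file names as arguments is one possible way to avoid hard-coding them.
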

Publications

If you use these tools, please cite the following paper:

Matīss Rikters, Mārcis Pinnis, Rihards Krišlauks (2018). "Training and Adapting Multilingual NMT for Less-resourced and Morphologically Rich Languages." In LREC 2018.

@InProceedings{RIKTERS18.75,
	author = {Matīss Rikters and Mārcis Pinnis and Rihards Krišlauks},
	title = {Training and Adapting Multilingual NMT for Less-resourced and Morphologically Rich Languages},
	booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
	year = {2018},
	month = {may},
	date = {7-12},
	location = {Miyazaki, Japan},
	editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
	publisher = {European Language Resources Association (ELRA)},
	address = {Paris, France},
	isbn = {979-10-95546-00-9},
	language = {english}
}
