The current project's goal is to release an open-source, 3.3B-parameter, dense transformer checkpoint that performs machine translation across the 202 languages covered by the NLLB project.
Beyond that, I would love to scale up to 7B-parameter dense transformers and train a set of such models, one per language family:
- 7B Open-NLLB model for Slavic languages
- 7B Open-NLLB model for African languages
- 7B Open-NLLB model for Germanic languages
- ...
(I'm not a linguist, so excuse any mistakes in the preliminary list above :) ).