This is the end goal for the current project scope.
The goal here is to release a model with the following properties:
- Truly open-source
- 3.3B dense
- Supports all 202 NLLB languages in both directions (any-to-any, i.e. 202 × 201 = 40,602 translation directions)
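
For reference, a model with these properties should be usable much like Meta's existing dense NLLB checkpoints on Hugging Face. The sketch below uses the existing `facebook/nllb-200-3.3B` checkpoint as a stand-in for the planned release, which is an assumption about the final interface, not a description of it:

```python
# Minimal usage sketch, assuming the released checkpoint exposes the same
# interface as Meta's existing dense NLLB checkpoints on Hugging Face.
# "facebook/nllb-200-3.3B" is Meta's existing model, used here as a stand-in.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-3.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Open-source machine translation for 202 languages."
inputs = tokenizer(text, return_tensors="pt")

# Force the decoder to start with the target-language token (French here);
# any of the 202 NLLB language codes can be used as source or target.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```
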
Note: it will be very hard to reach a satisfactory level of quality across 202 languages with a dense checkpoint. The original work from Meta used a ~54B-parameter MoE (mixture-of-experts) model to get decent results, plus a lot of compute (~52k hours on A100-SXM-80GB GPUs).
We do have plans to scale beyond 3.3B parameters.