This app automatically splits longer texts into paragraphs and sentences before translating them, which works around the short maximum input length of the MADLAD-400 models.
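
The splitting step can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function names (`split_paragraphs`, `split_sentences`, `translate_text`) and the `translate_fn` callback are hypothetical. It also assumes a naive regex sentence splitter, which mishandles abbreviations such as "e.g.".

```python
import re

def split_paragraphs(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(paragraph):
    """Naive split: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]

def chunk_text(text):
    """Return a list of paragraphs, each represented as a list of sentences."""
    return [split_sentences(p) for p in split_paragraphs(text)]

def translate_text(text, translate_fn):
    """Translate one sentence at a time, then restore the paragraph breaks.

    translate_fn is a hypothetical callable (str -> str) standing in for
    whatever sends a single sentence to the model.
    """
    translated = []
    for sentences in chunk_text(text):
        translated.append(" ".join(translate_fn(s) for s in sentences))
    return "\n\n".join(translated)
```

Because each call to `translate_fn` receives a single sentence, no single request should exceed the model's input limit unless one sentence is itself too long. A real implementation would need a fallback for that case.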
- Install https://github.com/mamei16/llama-cpp-binaries. This installs a fork of llama.cpp that adds support for MADLAD models to llama-server.
- Install the other requirements
- Download a GGUF version of a MADLAD-400 model. Note that the GGUF models from the official repo do not work. Models that are confirmed to work:
- Launch the app with:
  `python app.py <path_to_your_gguf>`
An 8-bit quantized version of the 10B model runs with 11 GB of VRAM. The Q8_0 version of the 3B model runs with 6 GB of VRAM.
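
As a rough sanity check on these figures, the memory taken by the weights alone can be estimated from the Q8_0 block layout in llama.cpp, where each block of 32 weights is stored as 32 int8 values plus one 2-byte scale, 34 bytes in total. The sketch below assumes nominal parameter counts of 3e9 and 10e9, which are approximations rather than exact model sizes. It also assumes that the 8-bit 10B quantization is Q8_0. It ignores the KV cache, compute buffers, and any other runtime overhead, so actual VRAM use will be higher.

```python
# Q8_0 layout in llama.cpp: 32 int8 weights + one fp16 scale per block.
Q8_0_BYTES_PER_BLOCK = 34
Q8_0_WEIGHTS_PER_BLOCK = 32

def q8_0_weight_gib(n_params):
    """Approximate size of Q8_0 weights in GiB (weights only, no overhead)."""
    total_bytes = n_params * Q8_0_BYTES_PER_BLOCK / Q8_0_WEIGHTS_PER_BLOCK
    return total_bytes / 2**30

# Nominal parameter counts; these are assumptions, not exact model sizes.
for name, n_params in [("3B", 3e9), ("10B", 10e9)]:
    print(f"{name}: ~{q8_0_weight_gib(n_params):.1f} GiB for weights alone")
```

The gap between these estimates and the VRAM figures quoted above is the headroom left for context and runtime buffers.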