TranslateGemma is a family of lightweight, state-of-the-art open translation models from Google, based on the Gemma 3 family of models. TranslateGemma models are designed to handle translation tasks across 55 languages. Their relatively small size makes it possible to deploy them in environments with limited resources, such as laptops, desktops, or your own cloud infrastructure, democratizing access to state-of-the-art translation models and helping foster innovation for everyone.
Original Models:
Optimized vLLM Models:
The original Google models have compatibility issues with standard inference engines like vLLM. The optimized versions from Infomaniak-AI (see detailed changes) resolve these issues:
- vLLM Compatibility: The original models require custom JSON parameters (source_lang_code and target_lang_code) that are not supported by the standard vLLM/OpenAI chat interface. The optimized versions use string delimiters instead.
- RoPE Simplification: The original models use a complex RoPE configuration for sliding attention. The optimized versions use a standard linear RoPE format (factor: 8.0) that vLLM can parse correctly.
- EOS Token Fix: Corrects the EOS token from <end_of_turn> to <eos> to ensure proper sequence termination.
docker pull vllm/vllm-openai:v0.14.1-cu130
The following configuration has been verified for the 4B and 27B models:
docker run -itd --name google-translategemma-27b-it \
--ipc=host \
--network host \
--shm-size 16G \
--gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
vllm/vllm-openai:v0.14.1-cu130 \
Infomaniak-AI/vllm-translategemma-27b-it \
--served-model-name translategemma-27b-it \
--gpu-memory-utilization 0.8 \
--optimization-level 0 \
--host 0.0.0.0 \
--port 8000
Tips:
- Prompt Delimiters: Language metadata is encoded directly into the content string using specific delimiters: <<<source>>>{src_lang}<<<target>>>{tgt_lang}<<<text>>>{text}
- Language Codes: Supports ISO 639-1 Alpha-2 codes (e.g., en, zh) and regional variants (e.g., en_US, zh_CN).
- Context Limit: The model is optimized for a context window of approximately 2K tokens.
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "translategemma-27b-it",
"messages": [
{
"role": "user",
"content": "<<<source>>>en<<<target>>>zh<<<text>>>We distribute two models for language identification, which can recognize 176 languages."
}
]
}'
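The curl request above embeds the text directly in the JSON body, which breaks as soon as the text contains a double quote or a newline. The Bash sketch below escapes the text before sending it and waits for the server to come up first. It assumes the container started above is reachable at http://localhost:8000 and serves the model as translategemma-27b-it; VLLM_URL, MODEL_NAME and all function names are hypothetical, and json_escape only handles backslashes, double quotes, newlines, carriage returns and tabs.

```shell
# Hypothetical client helpers for the vLLM server started above.
# Assumption: the server is reachable at VLLM_URL and serves MODEL_NAME.
VLLM_URL="${VLLM_URL:-http://localhost:8000}"
MODEL_NAME="${MODEL_NAME:-translategemma-27b-it}"

# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return and tab only;
# other control characters are passed through unchanged.
json_escape() {
  local s="$1" out="" ch i
  for (( i = 0; i < ${#s}; i++ )); do
    ch="${s:i:1}"
    case "$ch" in
      '\') out+='\\' ;;
      '"') out+='\"' ;;
      $'\n') out+='\n' ;;
      $'\r') out+='\r' ;;
      $'\t') out+='\t' ;;
      *) out+="$ch" ;;
    esac
  done
  printf '%s' "$out"
}

# Build the /v1/chat/completions request body for one translation,
# using the <<<source>>>/<<<target>>>/<<<text>>> delimiter format.
build_request_body() {
  local src_lang="$1" tgt_lang="$2" text="$3"
  local content="<<<source>>>${src_lang}<<<target>>>${tgt_lang}<<<text>>>${text}"
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' \
    "$(json_escape "$MODEL_NAME")" "$(json_escape "$content")"
}

# Poll /v1/models until the server answers; give up after N attempts
# (default 60, 5 seconds apart, since loading the weights can take minutes).
wait_for_server() {
  local url="$1" attempts="${2:-60}" i
  for (( i = 1; i <= attempts; i++ )); do
    if curl -sf --max-time 5 "$url/v1/models" > /dev/null; then
      return 0
    fi
    if (( i < attempts )); then
      sleep 5
    fi
  done
  return 1
}

# Send one translation request (body read from stdin) and print the raw JSON response.
translate() {
  build_request_body "$1" "$2" "$3" |
    curl -sS -X POST "$VLLM_URL/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data-binary @-
}

# Usage:
#   wait_for_server "$VLLM_URL" && translate en zh 'He said "hello".'
```

The response follows the OpenAI chat-completions format, so if jq is installed the translation itself can be extracted by piping the output through jq -r '.choices[0].message.content'.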