Mistral.rs attempts to automatically load a chat template from the tokenizer_config.json file. This enables high flexibility across instruction-tuned models and ensures accurate chat templating. However, if the chat_template field is missing, then a JINJA chat template should be provided. The JINJA chat template may use messages, add_generation_prompt, bos_token, eos_token, and unk_token as inputs.
We provide some chat templates here, and it is easy to modify or create others to customize chat template behavior.
For example, to use the chatml template, --chat-template is specified before the model architecture. For example:
./mitralrs-server --port 1234 --log output.log --chat-template ./chat_templates/chatml.json llamaNote: For GGUF models, the chat template may be loaded directly from the GGUF file by omitting any other chat template sources.
Some models do not provide a tokenizer.json file although mistral.rs expects one. To solve this, please run this script. It will output the tokenizer.json file for your specific model. This may be used by passing the --tokenizer-json flag after the model architecture. For example:
$ python3 scripts/get_tokenizers_json.py
Enter model ID: microsoft/Orca-2-13b
$ ./mistralrs-server --port 1234 --log output.log plain -m microsoft/Orca-2-13b --tokenizer-json tokenizer.jsonPutting it all together, to run, for example, an Orca model (which does not come with a tokenizer.json or chat template):
- Generate the
tokenizer.jsonby running the script atscripts/get_tokenizers_json.py. This will output some files includingtokenizer.jsonin the working directory. - Find and copy the correct chat template from
chat-templatesto the working directory (eg.,cp chat_templates/chatml.json .) - Run
mistralrs-server, specifying the tokenizer and chat template:cargo run --release --features cuda -- --port 1234 --log output.txt --chat-template chatml.json plain -m microsoft/Orca-2-13b -t tokenizer.json -a llama
Note: For GGUF models, the tokenizer may be loaded directly from the GGUF file by omitting the tokenizer model ID.