Description
Motivation
- Decouple dialogue templates from the inference engine.
- Reduce the barrier to adding new dialogue templates.
- Remove `model_name` from `EngineConfig` to avoid redundant specification.
- Support external dialogue templates compatible with Transformers.
Major features
- The `Tokenizer` class supports Transformers' Jinja dialogue templates.
- The original `model.get_prompt` will be moved to `Tokenizer.apply_chat_template`.
- `model_name` is removed from `TurbomindEngineConfig` and `PytorchEngineConfig`.
How to use
For `api_server`, to use an extra template, the command could be:

```shell
lmdeploy serve api_server $MODEL_PATH --chat-template $JINJA
```
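For illustration, the file passed via `--chat-template` might contain a Jinja template like the following. This is a hypothetical ChatML-style example modeled on the Hugging Face Transformers chat-template format, not a template shipped with LMDeploy:

```jinja
{%- for message in messages -%}
<|im_start|>{{ message.role }}
{{ message.content }}<|im_end|>
{%- endfor -%}
{%- if add_generation_prompt -%}
<|im_start|>assistant
{%- endif -%}
```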
For APIs like `pipeline`, we will provide documentation showing how to add a chat template in Python or as a Jinja template.
The code will be:

```python
chat_template = PythonTemplate()  # or a function, a Jinja string, or a file path
input_ids = tokenizer.apply_chat_template(messages, chat_template=chat_template)
pipeline(input_ids)
```
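As a rough sketch of how such an `apply_chat_template` might dispatch on the template type (the function body below is an assumption for illustration, not LMDeploy's actual implementation; it renders to a prompt string rather than token ids to stay self-contained):

```python
from jinja2 import Template


def apply_chat_template(messages, chat_template):
    """Illustrative only: render a list of {role, content} messages into a
    prompt string, accepting either a Python callable or a Jinja string."""
    if callable(chat_template):
        # A Python template object or plain function: call it directly.
        return chat_template(messages)
    # Otherwise treat it as a Jinja template string and render it.
    return Template(chat_template).render(messages=messages)


# Usage with a Jinja string template.
jinja_tmpl = "{% for m in messages %}<|{{ m.role }}|>{{ m.content }}\n{% endfor %}"
prompt = apply_chat_template([{"role": "user", "content": "hi"}], jinja_tmpl)
```

A real implementation would additionally accept a file path and tokenize the rendered prompt, but the dispatch-on-type idea is the same.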