| Architecture | Models | Example HuggingFace Models |
|---|---|---|
| ChatGLMModel | ChatGLM | |
| GemmaForCausalLM | Gemma | |
| GPTNeoXForCausalLM | Dolly | |
| | RedPajama | |
| LlamaForCausalLM | Llama 3 | |
| | Llama 2 | |
| | OpenLLaMA | |
| | TinyLlama | |
| MistralForCausalLM | Mistral | |
| | Notus | |
| | Zephyr | |
| PhiForCausalLM | Phi | |
| QWenLMHeadModel | Qwen | |
Note

LoRA adapters are supported.

The pipeline can work with other similar topologies produced by optimum-intel with the same model signature. After conversion, the model is required to have a single `logits` output and the following inputs:

1. `input_ids` contains the tokens.
2. `attention_mask` is filled with `1`.
3. `beam_idx` selects beams.
4. `position_ids` (optional) encodes the position of the currently generating token in the sequence.
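The signature requirement above can be checked programmatically. The helper below is a hypothetical sketch (it is not part of optimum-intel or OpenVINO GenAI); it only validates that a converted model's input and output names match the expected signature.

```python
# Hypothetical helper (not a library API): checks whether a converted model
# exposes the signature the pipeline expects.
REQUIRED_INPUTS = {"input_ids", "attention_mask", "beam_idx"}
OPTIONAL_INPUTS = {"position_ids"}


def signature_ok(input_names, output_names):
    names = set(input_names)
    # All required inputs present; nothing unexpected beyond the optional one.
    inputs_ok = REQUIRED_INPUTS <= names <= (REQUIRED_INPUTS | OPTIONAL_INPUTS)
    # Exactly one output: logits.
    outputs_ok = list(output_names) == ["logits"]
    return inputs_ok and outputs_ok
```

For example, `signature_ok(["input_ids", "attention_mask", "beam_idx"], ["logits"])` returns `True`, while a model missing `beam_idx` fails the check.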
Note
Models should belong to the same family and have the same tokenizers.
| Architecture | Models | LoRA support | Example HuggingFace Models | Notes |
|---|---|---|---|---|
| InternVL2 | InternVL2 | Not supported | | |
| LLaVA | LLaVA-v1.5 | Not supported | | |
| LLaVA-NeXT | LLaVa-v1.6 | Not supported | | |
| MiniCPMV | MiniCPM-V-2_6 | Not supported | | |
| Phi3VForCausalLM | phi3_v | Not supported | | Override the default `eos_token_id` with the one from the tokenizer: `generation_config.set_eos_token_id(pipe.get_tokenizer().get_eos_token_id())`. |
| Qwen2-VL | Qwen2-VL | Not supported | | |
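The `eos_token_id` override from the phi3_v note can be sketched as follows. This is a hedged illustration: in practice `generation_config` and the tokenizer come from OpenVINO GenAI (e.g. a pipeline's `get_tokenizer()`); the stub classes below only stand in for those objects so the call pattern is self-contained.

```python
# Sketch of the phi3_v note: override the config's default eos_token_id with
# the tokenizer's. The Stub* classes stand in for OpenVINO GenAI objects.
class StubGenerationConfig:
    def __init__(self, eos_token_id=-1):
        self.eos_token_id = eos_token_id

    def set_eos_token_id(self, token_id):
        self.eos_token_id = token_id


class StubTokenizer:
    def get_eos_token_id(self):
        return 32000  # placeholder id; a real tokenizer supplies its own


generation_config = StubGenerationConfig()
tokenizer = StubTokenizer()  # in practice: pipe.get_tokenizer()
# The call the note prescribes:
generation_config.set_eos_token_id(tokenizer.get_eos_token_id())
```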
| Architecture | Models | LoRA support | Example HuggingFace Models |
|---|---|---|---|
| WhisperForConditionalGeneration | Whisper | Not supported | |
| | Distil-Whisper | Not supported | |
| Architecture | LoRA support | Example HuggingFace Models |
|---|---|---|
| BertModel | Not supported | |
| MPNetForMaskedLM | Not supported | |
| RobertaForMaskedLM | Not supported | |
| XLMRobertaModel | Not supported | |
| Architecture | Models | LoRA support | Example HuggingFace Models |
|---|---|---|---|
| SpeechT5ForTextToSpeech | SpeechT5 TTS | Not supported | |
If https://huggingface.co/ is down, the conversion step won't be able to download the models.