
import ImageGenerationModelsTable from './_components/image-generation-models-table';
import VideoGenerationModelsTable from './_components/video-generation-models-table';
import LLMModelsTable from './_components/llm-models-table';
import VLMModelsTable from './_components/vlm-models-table';
import WhisperModelsTable from './_components/whisper-models-table';
import TextEmbeddingsModelsTable from './_components/text-embeddings-models-table';
import SpeechGenerationModelsTable from './_components/speech-generation-models-table';
import TextRerankModelsTable from './_components/text-rerank-models-table';

# Supported Models

:::info Models Compatibility
Other models with similar architectures may also work successfully, even if not explicitly validated. Consider testing any unlisted models to verify compatibility with your specific use case.
:::

## Large Language Models (LLMs)

<LLMModelsTable />

:::tip LoRA Support
LLM pipeline supports LoRA adapters.
:::

::::info

The LLM pipeline can work with other similar topologies produced by optimum-intel with the same model signature. After conversion, the model is required to have the following inputs:

  1. `input_ids` contains the tokens.
  2. `attention_mask` is filled with 1.
  3. `beam_idx` selects beams.
  4. `position_ids` (optional) encodes the position of the currently generated token in the sequence.

The model must also have a single `logits` output.

:::note

Models should belong to the same family and use the same tokenizer.

:::

::::
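To make the required signature concrete, here is a minimal sketch of what these inputs could look like for one decoding step of a 2-beam batch. The token IDs and shapes are purely illustrative assumptions; this only builds the arrays and does not call any pipeline:

```python
import numpy as np

# Illustrative 2-beam batch of a 4-token prompt (token IDs are made up).
input_ids = np.array([[101, 7592, 2088, 102],
                      [101, 7592, 2088, 102]], dtype=np.int64)

# attention_mask is filled with 1 for every real token.
attention_mask = np.ones_like(input_ids)

# position_ids encodes each token's position in the sequence.
position_ids = np.tile(np.arange(input_ids.shape[1], dtype=np.int64), (2, 1))

# beam_idx selects which beams survive between steps (identity here).
beam_idx = np.array([0, 1], dtype=np.int32)

print(attention_mask.shape)  # (2, 4)
```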

## Image Generation Models

<ImageGenerationModelsTable />

## Video Generation Models

<VideoGenerationModelsTable />

## Visual Language Models (VLMs)

<VLMModelsTable />

:::tip LoRA Support
VLM pipeline supports LoRA adapters applied to the language-model (LLM) part. LoRA adapters targeting the vision encoder or other multimodal components are not supported.
:::

:::warning VLM Models Notes

### InternVL2 {#internvl2-notes}

To convert InternVL2 models, `timm` and `einops` are required:

```bash
pip install timm einops
```

### MiniCPMO {#minicpm-o-notes}

  1. openbmb/MiniCPM-o-2_6 doesn't support `transformers>=4.52`, which is required for `optimum-cli` export.
  2. `--task image-text-to-text` is required for `optimum-cli export openvino --trust-remote-code` because image-text-to-text isn't MiniCPM-o-2_6's native task.
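Combining the two notes above, a hypothetical export command could look like the following (the output directory name is an assumption; install a compatible transformers version first, per note 1):

```shell
optimum-cli export openvino \
  --model openbmb/MiniCPM-o-2_6 \
  --task image-text-to-text \
  --trust-remote-code \
  MiniCPM-o-2_6-ov
```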

### phi3_v {#phi3_v-notes}

The models' configs aren't consistent, so it's required to override the default `eos_token_id` with the one from the tokenizer:

```python
generation_config.set_eos_token_id(pipe.get_tokenizer().get_eos_token_id())
```

### phi4mm {#phi4mm-notes}

Apply https://huggingface.co/microsoft/Phi-4-multimodal-instruct/discussions/78/files to fix the model export for `transformers>=4.50`.

:::

## Speech Recognition Models (Whisper-based)

<WhisperModelsTable />

:::info LoRA Support
Speech recognition pipeline does not support LoRA adapters.
:::

## Speech Generation Models

<SpeechGenerationModelsTable />

:::info LoRA Support
Speech generation pipeline does not support LoRA adapters.
:::

## Text Embeddings Models

<TextEmbeddingsModelsTable />

:::info LoRA Support
Text embeddings pipeline does not support LoRA adapters.
:::

:::warning Text Embeddings Models Notes
Qwen3 Embedding models require `--task feature-extraction` during the conversion with `optimum-cli`.
:::
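Embedding vectors produced by a text-embeddings model are commonly compared with cosine similarity. A self-contained sketch of that comparison (the 3-dimensional vectors are made up for illustration and do not come from any pipeline; real embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up "embeddings" for a query and a document.
query = np.array([0.1, 0.3, 0.5])
doc = np.array([0.2, 0.6, 1.0])  # exactly 2x the query, so similarity is 1.0

score = cosine_similarity(query, doc)
print(round(score, 6))  # 1.0
```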

## Text Rerank Models

<TextRerankModelsTable />

:::info LoRA Support
Text rerank pipeline does not support LoRA adapters.
:::

:::warning Text Rerank Models Notes
Text rerank models require an appropriate `--task` to be provided during the conversion with `optimum-cli`. The task for each model is listed in the table above.
:::


:::info Hugging Face Notes
Some models may require submitting an access request on their Hugging Face page before they can be downloaded.

If https://huggingface.co/ is down, the conversion step won't be able to download the models.
:::