
import BasicGenerationConfiguration from '@site/docs/use-cases/_shared/_basic_generation_configuration.mdx';
import ChatScenario from '@site/docs/use-cases/_shared/_chat_scenario.mdx';
import GenerationConfigurationWorkflow from '@site/docs/use-cases/_shared/_generation_configuration_workflow.mdx';
import Streaming from '@site/docs/use-cases/_shared/_streaming.mdx';

# Additional Usage Options

:::tip
Check out Python and C++ visual language chat samples.
:::

## Use Image or Video Tags in Prompt

The prompt can contain `<ov_genai_image_i>` tags, where `i` is replaced with the zero-based index of the image to refer to. Referring to images used in previous prompts is not implemented. A model's native image tag can be used instead of `<ov_genai_image_i>`. These tags are:

  1. InternVL2: `<image>\n`
  2. llava-1.5-7b-hf: `<image>`
  3. LLaVA-NeXT: `<image>`
  4. LLaVa-NeXT-Video: `<image>`
  5. nanoLLaVA: `<image>\n`
  6. nanoLLaVA-1.5: `<image>\n`
  7. MiniCPM-o-2_6: `<image>./</image>\n`
  8. MiniCPM-V-2_6: `<image>./</image>\n`
  9. Phi-3-vision: `<|image_i|>\n` (the index starts at one)
  10. Phi-4-multimodal-instruct: `<|image_i|>\n` (the index starts at one)
  11. Qwen2-VL: `<|vision_start|><|image_pad|><|vision_end|>`
  12. Qwen2.5-VL: `<|vision_start|><|image_pad|><|vision_end|>`
  13. gemma-3-4b-it: `<start_of_image>`

A model's native video tag can be used to refer to a video. These tags are:

  1. LLaVa-NeXT-Video: `<video>`
  2. Qwen2-VL: `<|vision_start|><|video_pad|><|vision_end|>`
  3. Qwen2.5-VL: `<|vision_start|><|video_pad|><|vision_end|>`

If the prompt doesn't contain image or video tags, but images or videos are provided, the tags are prepended to the prompt.
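For example, a prompt that refers to two images by their zero-based indices can be assembled as plain string concatenation. This is a minimal sketch: the image file names are hypothetical placeholders, and only the prompt construction is shown.

```python
# Build a prompt that refers to two images using the generic
# <ov_genai_image_i> tags; a model's native tag (e.g. "<image>\n"
# for llava-1.5-7b-hf) could be used instead.
images = ["cat.png", "dog.png"]  # hypothetical placeholders for two loaded images
tags = "".join(f"<ov_genai_image_{i}>\n" for i in range(len(images)))
prompt = tags + "What is the difference between these images?"
print(prompt)
```

The resulting prompt string is then passed to the pipeline's `generate` call together with the decoded image tensors. If the tags are omitted entirely, the pipeline prepends them automatically, as described above.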

## Use Different Generation Parameters

Similar to text generation, VLM pipelines support various generation parameters to control the text output.

  <TabItemPython>
      ```python
      import openvino_genai as ov_genai

      pipe = ov_genai.VLMPipeline(model_path, "CPU")

      # Get default configuration
      config = pipe.get_generation_config()

      # Modify parameters
      config.max_new_tokens = 100
      config.temperature = 0.7
      config.top_k = 50
      config.top_p = 0.9
      config.repetition_penalty = 1.2

      # Generate text with custom configuration
      output = pipe.generate(prompt, images, config)
      ```
  </TabItemPython>
  <TabItemCpp>
      ```cpp
      #include "openvino/genai/visual_language/pipeline.hpp"

      int main() {
          ov::genai::VLMPipeline pipe(model_path, "CPU");

          // Get default configuration
          auto config = pipe.get_generation_config();

          // Modify parameters
          config.max_new_tokens = 100;
          config.temperature = 0.7f;
          config.top_k = 50;
          config.top_p = 0.9f;
          config.repetition_penalty = 1.2f;

          // Generate text with custom configuration
          auto output = pipe.generate(prompt, images, config);
      }
      ```
  </TabItemCpp>
  <TabItemJS>
      ```javascript
      import { VLMPipeline } from 'openvino-genai-node';

      const pipe = await VLMPipeline(modelPath, "CPU", {});

      // Create custom generation configuration
      const config = {
          max_new_tokens: 100,
          temperature: 0.7,
          top_k: 50,
          top_p: 0.9,
          repetition_penalty: 1.2
      };

      // Generate text with custom configuration
      const output = await pipe.generate(prompt, {
          images: images,
          generationConfig: config
      });
      ```
  </TabItemJS>

## Working with LoRA Adapters

For Visual Language Models (VLMs), LoRA adapters can customize the generated text by applying adapters to the language model (LLM) part of the pipeline. LoRA adapters that target the vision encoder or other multimodal components are not supported.

Refer to the LoRA Adapters guide for more details on working with LoRA adapters.