
import BasicGenerationConfiguration from '@site/docs/use-cases/_shared/_basic_generation_configuration.mdx';
import ChatScenario from '@site/docs/use-cases/_shared/_chat_scenario.mdx';
import GenerationConfigurationWorkflow from '@site/docs/use-cases/_shared/_generation_configuration_workflow.mdx';
import Streaming from '@site/docs/use-cases/_shared/_streaming.mdx';

# Additional Usage Options

:::tip
Check out Python and C++ visual language chat samples.
:::

## Use Image or Video Tags in Prompt

The prompt can contain `<ov_genai_image_i>` tags, where `i` is replaced with the zero-based index of the image to refer to. Referring to images used in previous prompts is not implemented. A model's native image tag can be used instead of `<ov_genai_image_i>`. These tags are:

  1. InternVL2: `<image>\n`
  2. llava-1.5-7b-hf: `<image>`
  3. LLaVA-NeXT: `<image>`
  4. LLaVa-NeXT-Video: `<image>`
  5. nanoLLaVA: `<image>\n`
  6. nanoLLaVA-1.5: `<image>\n`
  7. MiniCPM-o-2_6: `<image>./</image>\n`
  8. MiniCPM-V-2_6: `<image>./</image>\n`
  9. Phi-3-vision: `<|image_i|>\n` (the index starts at one)
  10. Phi-4-multimodal-instruct: `<|image_i|>\n` (the index starts at one)
  11. Qwen2-VL: `<|vision_start|><|image_pad|><|vision_end|>`
  12. Qwen2.5-VL: `<|vision_start|><|image_pad|><|vision_end|>`
  13. gemma-3-4b-it: `<start_of_image>`

A model's native video tag can be used to refer to a video. These tags are:

  1. LLaVa-NeXT-Video: `<video>`
  2. Qwen2-VL: `<|vision_start|><|video_pad|><|vision_end|>`
  3. Qwen2.5-VL: `<|vision_start|><|video_pad|><|vision_end|>`

If the prompt doesn't contain image or video tags, but images or videos are provided, the tags are prepended to the prompt.
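For example, a prompt that refers to two images by their zero-based indices can be assembled as plain string concatenation. This is a minimal sketch: the image file names are hypothetical placeholders, and only the prompt construction is shown.

```python
# Build a prompt that refers to two images using the generic
# <ov_genai_image_i> tags; a model's native tag (e.g. "<image>\n"
# for llava-1.5-7b-hf) could be used instead.
images = ["cat.png", "dog.png"]  # hypothetical placeholders for two loaded images
tags = "".join(f"<ov_genai_image_{i}>\n" for i in range(len(images)))
prompt = tags + "What is the difference between these images?"
print(prompt)
```

The resulting prompt string is then passed to the pipeline's `generate` call together with the decoded image tensors. If the tags are omitted entirely, the pipeline prepends them automatically, as described above.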

## Use Different Generation Parameters

Similar to text generation, VLM pipelines support various generation parameters to control the text output.

  <TabItemPython>
      ```python
      import openvino_genai as ov_genai

      pipe = ov_genai.VLMPipeline(model_path, "CPU")

      # Get default configuration
      config = pipe.get_generation_config()

      # Modify parameters
      config.max_new_tokens = 100
      config.temperature = 0.7
      config.top_k = 50
      config.top_p = 0.9
      config.repetition_penalty = 1.2

      # Generate text with custom configuration
      output = pipe.generate(prompt, images, config)
      ```
  </TabItemPython>
  <TabItemCpp>
      ```cpp
      #include "openvino/genai/visual_language/pipeline.hpp"

      int main() {
          ov::genai::VLMPipeline pipe(model_path, "CPU");

          // Get default configuration
          auto config = pipe.get_generation_config();

          // Modify parameters
          config.max_new_tokens = 100;
          config.temperature = 0.7f;
          config.top_k = 50;
          config.top_p = 0.9f;
          config.repetition_penalty = 1.2f;

          // Generate text with custom configuration
          auto output = pipe.generate(prompt, images, config);
      }
      ```
  </TabItemCpp>
  <TabItemJS>
      ```javascript
      import { VLMPipeline } from 'openvino-genai-node';

      const pipe = await VLMPipeline(modelPath, "CPU", {});

      // Create custom generation configuration
      const config = {
          max_new_tokens: 100,
          temperature: 0.7,
          top_k: 50,
          top_p: 0.9,
          repetition_penalty: 1.2
      };

      // Generate text with custom configuration
      const output = await pipe.generate(prompt, {
          images: images,
          generationConfig: config
      });
      ```
  </TabItemJS>

## Working with LoRA Adapters

For Visual Language Models (VLMs), LoRA adapters can customize the generated text by applying adapters to the language model (LLM) part of the pipeline. LoRA adapters that target the vision encoder or other multimodal components are not supported.

Refer to the LoRA Adapters guide for more details on working with LoRA adapters.