
[Feature Request] Add C++ sample for Live VLM Chat using Webcam (OpenCV) #3307

@Ashitpatel001

Description


Problem

Currently, the visual_language_chat sample only supports static images loaded from a file path.

While this demonstrates the API, it does not showcase the real-time performance capabilities of OpenVINO GenAI on edge devices. Developers building visual agents or robots need a reference implementation for handling continuous video streams without blocking the inference loop.

Proposed Solution

I propose adding a new C++ sample: live_vlm_chat.
The sample uses OpenCV to capture a live webcam feed and lets the user chat with the VLM (e.g., LLaVA/Mistral) in real time.

Key Features of the proposed sample:

  • Multi-threaded Architecture: Decouples the UI/Camera loop (Main Thread) from the Inference loop (Worker Thread) to ensure the video feed never freezes while the LLM is "thinking."
  • Thread Safety: Implements std::mutex and std::condition_variable to safely pass frames between threads.
  • MSVC Compatibility: Avoids the C3889 build error on Windows by explicitly typing the tensor container (std::vector<ov::Tensor>) passed to the ov::genai::images property.
  • Interactive UI: Allows users to "snap" a frame and chat with it while the camera continues running.

Implementation Details

I have already implemented and tested this locally on Windows 11 (Intel CPU & iGPU).

  • Dependencies: Adds OpenCV (core, highgui, videoio) as an optional dependency in CMakeLists.txt.
  • File: samples/cpp/visual_language_chat/live_vlm_chat.cpp
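For reference, the frame hand-off to the pipeline might look like the sketch below. This is not code from the actual PR: `mat_to_tensor` and `chat_with_frame` are hypothetical helper names, and the NHWC u8 tensor layout is an assumption based on how the existing visual_language_chat sample feeds images. The relevant detail for the MSVC point above is the explicitly typed `std::vector<ov::Tensor>` passed to `ov::genai::images` rather than a braced initializer list.

```cpp
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

#include <opencv2/opencv.hpp>
#include "openvino/genai/visual_language/pipeline.hpp"

// Hypothetical helper: copy a BGR cv::Mat into a {1, H, W, 3} u8 tensor.
ov::Tensor mat_to_tensor(const cv::Mat& bgr) {
    cv::Mat rgb;
    cv::cvtColor(bgr, rgb, cv::COLOR_BGR2RGB);  // model expects RGB order
    ov::Tensor t(ov::element::u8,
                 {1, size_t(rgb.rows), size_t(rgb.cols), 3});
    std::memcpy(t.data(), rgb.data, t.get_byte_size());
    return t;
}

// Hypothetical helper: run one chat turn against a snapped frame.
void chat_with_frame(ov::genai::VLMPipeline& pipe,
                     const cv::Mat& frame, const std::string& prompt) {
    // Explicit vector type avoids ambiguous overload resolution under MSVC.
    std::vector<ov::Tensor> images{mat_to_tensor(frame)};
    auto result = pipe.generate(prompt, ov::genai::images(images));
    std::cout << result.texts[0] << "\n";
}
```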

I can submit a pull request immediately if this contribution aligns with the project's roadmap.
