Merged
35 commits
4d1e8b0
samples: add Python VLM alerts sample using HF Optimum + gvagenai
oonyshch Feb 17, 2026
032c733
fix for the pipeline and additional packages resolution
oonyshch Feb 17, 2026
361fe2f
Merge branch 'main' into oonyshch/vlm_alerts
oonyshch Feb 17, 2026
085783d
modify requirements.txt
oonyshch Feb 17, 2026
a6c5825
fix in requirements.txt
oonyshch Feb 19, 2026
026849d
Merge branch 'main' into oonyshch/vlm_alerts
oonyshch Feb 23, 2026
edd52a2
vlm_alerts: fix pipeline string
oonyshch Feb 23, 2026
cd3601f
vlm_alerts: add README.md
oonyshch Feb 23, 2026
173d146
vlm_alerts: refactoring script after being pylint-shamed
oonyshch Feb 23, 2026
2bad652
vlm_alerts: trying to avoid the gst pylint error and restoring the li…
oonyshch Feb 23, 2026
cbcf7e6
vlm_alerts: make pylint ignore the gst import
oonyshch Feb 23, 2026
4e33bf2
vlm_alerts: disable pylint on both gst and glib
oonyshch Feb 23, 2026
d2ccfed
Windows - install VS Build Tools in setup script (#630)
dmichalo Feb 23, 2026
94ee21f
Fixed inconsistencies between code and comments. (#632)
jmotow Feb 23, 2026
0d0f348
Enable custom code to add GstAnalytics data outside of DLS components…
tjanczak Feb 24, 2026
2cfcd49
Extend Optimizer about input device selection and improved results re…
tbujewsk Feb 24, 2026
a7d9843
Disable gstreamer gpl plugins (#636)
mholowni Feb 24, 2026
ac6c189
[POST-PROC][YOLOv26 OBB] add blob parsing function to handle obb dime…
walidbarakat Feb 25, 2026
b3ee1db
Install Visual C++ runtime in setup (#635)
yunowo Feb 25, 2026
d9a159d
[GST gvawatermark] fix watermark default text backgroung behaviour (#…
walidbarakat Feb 25, 2026
2eecaf7
[DOCS] fix formatting (#641)
kblaszczak-intel Feb 25, 2026
c18fb0f
Fix yolo_v10.cpp compile error on windows (#645)
yunowo Feb 26, 2026
3fdc177
[DOCS] Add a warning about improper proxy handling by PAHO library (#…
msmiatac Feb 26, 2026
0941c5c
Update to OpenVino 2026.0.0 (#640)
tbujewsk Feb 26, 2026
b13008e
[vlm_alerts.py]: refine alert logic and improve processing flow
oonyshch Feb 26, 2026
fa1e62d
Merge branch 'main' into oonyshch/vlm_alerts
oonyshch Feb 26, 2026
c1b9594
Merge branch 'main' into oonyshch/vlm_alerts
oonyshch Feb 26, 2026
223951c
cancel changes in cmake and dockerfiles
oonyshch Feb 26, 2026
ced3814
refactoring README.md and requirements.txt
oonyshch Feb 26, 2026
8684109
Merge branch 'main' into oonyshch/vlm_alerts
oonyshch Feb 26, 2026
59a80be
vlm_alerts: add CLI help section to README and fix gi import order
oonyshch Feb 26, 2026
70caa31
vlm_alerts: improve graph in README and change venv name
oonyshch Feb 27, 2026
50d78eb
Merge branch 'main' into oonyshch/vlm_alerts
oonyshch Feb 27, 2026
60c1dc3
vlm_alerts: forgot parenthesis in graph
oonyshch Feb 27, 2026
86f4c1e
vlm_alerts: refactoring of requirements to match new sample conventio…
oonyshch Feb 27, 2026
76 changes: 76 additions & 0 deletions samples/gstreamer/python/vlm_alerts/README.md
@@ -0,0 +1,76 @@
# VLM Alerts

This sample demonstrates how to download a Vision-Language Model (VLM) from Hugging Face, export it to OpenVINO IR using `optimum-cli`, and run inference in a DL Streamer pipeline.

The pipeline saves both JSON metadata and an encoded MP4 output.

## How It Works

The script performs three main steps:

STEP 1 — Prepare input video
If a local file is provided, it is used directly.
If a URL is provided, the video is downloaded automatically into the `videos/` directory.

STEP 2 — Prepare VLM model
If the exported model is not already present under `models/`, it is downloaded from Hugging Face and converted to OpenVINO IR via `optimum-cli`.

Exported artifacts are stored under:

models/<ModelName>

STEP 3 — Build and run the pipeline

The GStreamer pipeline includes:

- gvagenai for VLM inference
- gvametapublish for JSON output
- gvafpscounter for performance display
- gvawatermark for overlay
- vah264enc for hardware encoding

The output video and metadata are written to the `results/` directory.
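The element chain above can be sketched as a `gst-launch-1.0` command. This is a hand-written approximation of the string the script builds, not a verbatim copy: paths in angle brackets are placeholders and `gvagenai` takes more properties (prompt, generation config, frame rate) than shown here.

```
gst-launch-1.0 filesrc location=<video.mp4> ! decodebin3 ! videoconvertscale ! \
  video/x-raw,format=BGRx,width=1280,height=720 ! queue ! \
  gvagenai model-path=<model_dir> device=GPU prompt-path=<prompt.txt> ! queue ! \
  gvametapublish file-format=json-lines file-path=<out.jsonl> ! queue ! \
  gvafpscounter ! gvawatermark ! videoconvert ! vah264enc ! h264parse ! \
  mp4mux ! filesink location=<out.mp4>
```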

## Setup

From the sample directory:

```sh
cd samples/gstreamer/python/vlm_alerts
```

Create and activate a virtual environment:

```sh
python3 -m venv .venv --system-site-packages
source .venv/bin/activate
```

Install dependencies:

```sh
pip install -r requirements.txt
```

## Running

```sh
python3 ./vlm_alerts.py <input_video_or_url> <hf_model_id> "<question>"
```

Example:

```sh
python3 ./vlm_alerts.py \
    https://videos.pexels.com/video-files/2103099/2103099-hd_1280_720_60fps.mp4 \
    OpenGVLab/InternVL3_5-2B \
    "Is there a police car? Answer yes or no."
```

## Output

After execution:

JSON metadata: `results/<model>-<video>.jsonl`

Annotated video: `results/<model>-<video>.mp4`
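The exact JSON schema in the `.jsonl` file is defined by `gvametapublish` and may vary between DL Streamer versions. Assuming only that each line is one JSON object, a minimal post-processing reader might look like this (the demo file and its `answer` field are fabricated for illustration):

```python
import json
import tempfile
from pathlib import Path

def read_alerts(jsonl_path):
    """Yield one parsed JSON object per non-empty line of a JSON-lines file."""
    for line in Path(jsonl_path).read_text().splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)

# Demo on a fabricated two-line file; real gvametapublish output has an
# element-defined schema, so inspect a few lines before relying on keys.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.jsonl"
    demo.write_text('{"answer": "yes"}\n{"answer": "no"}\n')
    records = list(read_alerts(demo))
    print(len(records))          # 2
    print(records[0]["answer"])  # yes
```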

## Notes

- Each video is downloaded and each model is exported only once; subsequent runs reuse the cached assets.
- Other VLMs can be used as well. Suggested models: `OpenGVLab/InternVL3_5-2B`, `openbmb/MiniCPM-V-4_5`, `Qwen/Qwen2.5-VL-3B-Instruct`.
- GPU is used by default (override with `--device`).
10 changes: 10 additions & 0 deletions samples/gstreamer/python/vlm_alerts/requirements.txt
@@ -0,0 +1,10 @@
--extra-index-url https://download.pytorch.org/whl/cpu
PyGObject==3.50.0
torch==2.9.0+cpu
transformers==4.57.6
optimum-intel==1.27.0
huggingface_hub==0.36.1
einops
timm
openvino==2025.4.0
openvino_tokenizers==2025.4.0.0
235 changes: 235 additions & 0 deletions samples/gstreamer/python/vlm_alerts/vlm_alerts.py
@@ -0,0 +1,235 @@
#!/usr/bin/env python3
"""
Run a DLStreamer VLM pipeline on a video and export JSON and MP4 results.

The script can:
1. Download or reuse a local video.
2. Export or reuse an OpenVINO model.
3. Build a GStreamer pipeline string.
4. Execute the pipeline and store results.
"""

import argparse
import os
import subprocess
import sys
import tempfile
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple

import gi

# gi.require_version must run before importing Gst/GLib from gi.repository.
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib  # pylint: disable=wrong-import-position


BASE_DIR = Path(__file__).resolve().parent
VIDEOS_DIR = BASE_DIR / "videos"
MODELS_DIR = BASE_DIR / "models"
RESULTS_DIR = BASE_DIR / "results"


@dataclass
class PipelineConfig:
    """Configuration required to build and run the pipeline."""

    video: Path
    model: Path
    question: str
    device: str
    max_tokens: int
    frame_rate: float


def ensure_video(path_or_url: str) -> Path:
    """Return a local video path, downloading it if needed."""
    candidate = Path(path_or_url)
    if candidate.is_file():
        return candidate.resolve()

    VIDEOS_DIR.mkdir(exist_ok=True)
    filename = path_or_url.rstrip("/").split("/")[-1]
    local_path = VIDEOS_DIR / filename

    if local_path.exists():
        print(f"[video] using cached {local_path}")
        return local_path.resolve()

    print(f"[video] downloading {path_or_url}")
    request = urllib.request.Request(
        path_or_url,
        headers={"User-Agent": "Mozilla/5.0"},
    )

    with urllib.request.urlopen(request) as response, open(local_path, "wb") as file:
        file.write(response.read())

    return local_path.resolve()


def ensure_model(model_id: str) -> Path:
    """Return a local OpenVINO model directory, exporting it if needed."""
    model_name = model_id.split("/")[-1]
    output_dir = MODELS_DIR / model_name

    if output_dir.exists() and any(output_dir.glob("*.xml")):
        print(f"[model] using cached {output_dir}")
        return output_dir.resolve()

    MODELS_DIR.mkdir(exist_ok=True)

    command = [
        "optimum-cli",
        "export",
        "openvino",
        "--model",
        model_id,
        "--task",
        "image-text-to-text",
        "--trust-remote-code",
        str(output_dir),
    ]

    print("[model] exporting:", " ".join(command))
    subprocess.run(command, check=True)

    if not any(output_dir.glob("*.xml")):
        raise RuntimeError("OpenVINO export failed, no XML files found")

    return output_dir.resolve()


def build_pipeline_string(cfg: PipelineConfig) -> Tuple[str, Path, Path, Path]:
    """Construct the GStreamer pipeline string and related output paths."""
    RESULTS_DIR.mkdir(exist_ok=True)

    output_json = RESULTS_DIR / f"{cfg.model.name}-{cfg.video.stem}.jsonl"
    output_video = RESULTS_DIR / f"{cfg.model.name}-{cfg.video.stem}.mp4"

    # Write the question to a temporary prompt file consumed by gvagenai.
    fd, prompt_path_str = tempfile.mkstemp(suffix=".txt")
    prompt_path = Path(prompt_path_str)
    with os.fdopen(fd, "w") as file:
        file.write(cfg.question)

    generation_cfg = f"max_new_tokens={cfg.max_tokens}"

    pipeline_str = (
        f'filesrc location="{cfg.video}" ! '
        f'decodebin3 ! '
        f'videoconvertscale ! '
        f'video/x-raw,format=BGRx,width=1280,height=720 ! '
        f'queue ! '
        f'gvagenai '
        f'model-path="{cfg.model}" '
        f'device={cfg.device} '
        f'prompt-path="{prompt_path}" '
        f'generation-config="{generation_cfg}" '
        f'chunk-size=1 '
        f'frame-rate={cfg.frame_rate} '
        f'metrics=true ! '
        f'queue ! '
        f'gvametapublish file-format=json-lines '
        f'file-path="{output_json}" ! '
        f'queue ! '
        f'gvafpscounter ! '
        f'gvawatermark displ-cfg=text-scale=0.5 ! '
        f'videoconvert ! '
        f'vah264enc ! '
        f'h264parse ! '
        f'mp4mux ! '
        f'filesink location="{output_video}"'
    )

    return pipeline_str, output_json, output_video, prompt_path


def run_pipeline_string(pipeline_str: str) -> int:
    """Execute a GStreamer pipeline string and block until completion."""
    Gst.init(None)

    try:
        pipeline = Gst.parse_launch(pipeline_str)
    except GLib.Error as error:
        print("Pipeline parse error:", str(error))
        return 1

    bus = pipeline.get_bus()
    pipeline.set_state(Gst.State.PLAYING)

    while True:
        # Block until the pipeline reports an error or end-of-stream.
        message = bus.timed_pop_filtered(
            Gst.CLOCK_TIME_NONE,
            Gst.MessageType.ERROR | Gst.MessageType.EOS,
        )

        if message.type == Gst.MessageType.ERROR:
            err, debug = message.parse_error()
            print("ERROR:", err.message)
            if debug:
                print("DEBUG:", debug)
            pipeline.set_state(Gst.State.NULL)
            return 1

        if message.type == Gst.MessageType.EOS:
            pipeline.set_state(Gst.State.NULL)
            return 0


def run_pipeline(cfg: PipelineConfig) -> int:
    """Build and execute the pipeline from configuration."""
    pipeline_str, output_json, output_video, prompt_path = build_pipeline_string(cfg)

    print("\nPipeline:\n")
    print(pipeline_str)
    print()

    try:
        result = run_pipeline_string(pipeline_str)
    finally:
        # Always clean up the temporary prompt file.
        if prompt_path.exists():
            prompt_path.unlink()

    if result == 0:
        print(f"\nJSON output: {output_json}")
        print(f"Video output: {output_video}")

    return result


def parse_args() -> argparse.Namespace:
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="DLStreamer VLM Alerts sample"
    )
    parser.add_argument("video")
    parser.add_argument("model")
    parser.add_argument("question")
    parser.add_argument("--device", default="GPU")
    parser.add_argument("--max-tokens", type=int, default=20)
    parser.add_argument("--frame-rate", type=float, default=1.0)

    return parser.parse_args()


def main() -> int:
    """Entry point."""
    args = parse_args()

    video_path = ensure_video(args.video)
    model_path = ensure_model(args.model)

    config = PipelineConfig(
        video=video_path,
        model=model_path,
        question=args.question,
        device=args.device,
        max_tokens=args.max_tokens,
        frame_rate=args.frame_rate,
    )

    return run_pipeline(config)


if __name__ == "__main__":
    sys.exit(main())