[Samples]: Add VLM alerts sample in python 1/2 #620
Merged
Commits (35):

- 4d1e8b0 (oonyshch) samples: add Python VLM alerts sample using HF Optimum + gvagenai
- 032c733 (oonyshch) fix for the pipeline and additional packages resolution
- 361fe2f (oonyshch) Merge branch 'main' into oonyshch/vlm_alerts
- 085783d (oonyshch) modify requirements.txt
- a6c5825 (oonyshch) fix in requirements.txt
- 026849d (oonyshch) Merge branch 'main' into oonyshch/vlm_alerts
- edd52a2 (oonyshch) vlm_alerts: fix pipeline string
- cd3601f (oonyshch) vlm_alerts: add README.md
- 173d146 (oonyshch) vlm_alerts: refactoring script after being pylint-shamed
- 2bad652 (oonyshch) vlm_alerts: trying to avoid the gst pylint error and restoring the li…
- cbcf7e6 (oonyshch) vlm_alerts: make pylint ignore the gst import
- 4e33bf2 (oonyshch) vlm_alerts: disable pylint on both gst and glib
- d2ccfed (dmichalo) Windows - install VS Build Tools in setup script (#630)
- 94ee21f (jmotow) Fixed inconsistencies between code and comments. (#632)
- 0d0f348 (tjanczak) Enable custom code to add GstAnalytics data outside of DLS components…
- 2cfcd49 (tbujewsk) Extend Optimizer about input device selection and improved results re…
- a7d9843 (mholowni) Disable gstreamer gpl plugins (#636)
- ac6c189 (walidbarakat) [POST-PROC][YOLOv26 OBB] add blob parsing function to handle obb dime…
- b3ee1db (yunowo) Install Visual C++ runtime in setup (#635)
- d9a159d (walidbarakat) [GST gvawatermark] fix watermark default text backgroung behaviour (#…
- 2eecaf7 (kblaszczak-intel) [DOCS] fix formatting (#641)
- c18fb0f (yunowo) Fix yolo_v10.cpp compile error on windows (#645)
- 3fdc177 (msmiatac) [DOCS] Add a warning about improper proxy handling by PAHO library (#…
- 0941c5c (tbujewsk) Update to OpenVino 2026.0.0 (#640)
- b13008e (oonyshch) [vlm_alerts.py]: refine alert logic and improve processing flow
- fa1e62d (oonyshch) Merge branch 'main' into oonyshch/vlm_alerts
- c1b9594 (oonyshch) Merge branch 'main' into oonyshch/vlm_alerts
- 223951c (oonyshch) cancel changes in cmake and dockerfiles
- ced3814 (oonyshch) refactoring README.md and requirements.txt
- 8684109 (oonyshch) Merge branch 'main' into oonyshch/vlm_alerts
- 59a80be (oonyshch) vlm_alerts: add CLI help section to README and fix gi import order
- 70caa31 (oonyshch) vlm_alerts: improve graph in README and change venv name
- 50d78eb (oonyshch) Merge branch 'main' into oonyshch/vlm_alerts
- 60c1dc3 (oonyshch) vlm_alerts: forgot parenthesis in graph
- 86f4c1e (oonyshch) vlm_alerts: refactoring of requirements to match new sample conventio…
README.md:

# VLM Alerts

This sample demonstrates how to download a Vision-Language Model (VLM) from Hugging Face, export it to OpenVINO IR using `optimum-cli`, and run inference in a DL Streamer pipeline.

The pipeline saves both JSON metadata and an encoded MP4 output.
## How It Works

The script performs three main steps:

STEP 1 — Prepare input video
If a local file is provided, it is used directly.
If a URL is provided, the video is downloaded automatically into the `videos/` directory.

STEP 2 — Prepare VLM model
If no cached export is found, the model is downloaded from Hugging Face and converted to OpenVINO IR with `optimum-cli`.

Exported artifacts are stored under:

    models/<ModelName>
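The export command the script issues in STEP 2 can be sketched as follows (this mirrors `ensure_model` in vlm_alerts.py; the model id shown is just the one from the example below):

```python
from pathlib import Path


def export_command(model_id: str, models_dir: Path) -> list:
    """Build the optimum-cli invocation that exports a VLM to OpenVINO IR."""
    # The export directory is named after the last segment of the HF model id.
    output_dir = models_dir / model_id.split("/")[-1]
    return [
        "optimum-cli", "export", "openvino",
        "--model", model_id,
        "--task", "image-text-to-text",
        "--trust-remote-code",
        str(output_dir),
    ]


cmd = export_command("OpenGVLab/InternVL3_5-2B", Path("models"))
print(" ".join(cmd))
```

Running the command itself requires `optimum-intel` installed (see requirements.txt); the sketch only shows how the invocation is assembled.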
STEP 3 — Build and run the pipeline

The GStreamer pipeline includes:

- gvagenai for VLM inference
- gvametapublish for JSON output
- gvafpscounter for performance display
- gvawatermark for overlay
- vah264enc for hardware encoding

The output video and metadata are written to the `results/` directory.
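These elements are chained into a single parse-launch string. A trimmed sketch of how the sample assembles it (file names here are placeholders; the real script adds prompt, generation, and metrics options to gvagenai):

```python
def sketch_pipeline(video: str, model_dir: str, json_out: str, mp4_out: str) -> str:
    """Minimal version of the parse-launch string built by the sample."""
    return (
        f'filesrc location="{video}" ! decodebin3 ! videoconvertscale ! '
        f'video/x-raw,format=BGRx,width=1280,height=720 ! queue ! '
        f'gvagenai model-path="{model_dir}" device=GPU ! queue ! '
        f'gvametapublish file-format=json-lines file-path="{json_out}" ! queue ! '
        f'gvafpscounter ! gvawatermark ! videoconvert ! '
        f'vah264enc ! h264parse ! mp4mux ! filesink location="{mp4_out}"'
    )


print(sketch_pipeline("in.mp4", "models/InternVL3_5-2B", "out.jsonl", "out.mp4"))
```

The string is what `Gst.parse_launch` consumes; executing it requires DL Streamer's GStreamer plugins to be installed.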
## Setup

From the sample directory:

    cd samples/gstreamer/python/vlm_alerts

Create and activate a virtual environment:

    python3 -m venv .venv --system-site-packages
    source .venv/bin/activate

Install dependencies:

    pip install -r requirements.txt
## Running

    python3 ./vlm_alerts.py <input_video_or_url> <hf_model_id> "<question>"

Example:

    python3 ./vlm_alerts.py \
        https://videos.pexels.com/video-files/2103099/2103099-hd_1280_720_60fps.mp4 \
        OpenGVLab/InternVL3_5-2B \
        "Is there a police car? Answer yes or no."
## Output

After execution:

JSON metadata:

    results/<model>-<video>.jsonl

Annotated video:

    results/<model>-<video>.mp4
## Notes

- Each video and model is downloaded and exported only once.
- Different VLMs can be downloaded. Suggested: OpenGVLab/InternVL3_5-2B, openbmb/MiniCPM-V-4_5, Qwen/Qwen2.5-VL-3B-Instruct.
- Subsequent runs reuse cached assets.
- GPU is used by default.
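The json-lines output can be post-processed after the run. A minimal sketch of reading it, assuming each line is a standalone JSON object (the exact keys depend on what gvagenai attaches to the frame metadata, so none are assumed here):

```python
import json


def records_from_jsonl(text: str) -> list:
    """Parse a json-lines dump produced by gvametapublish, one object per line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines defensively
            records.append(json.loads(line))
    return records


# Synthetic two-line example, not real gvagenai output.
sample = '{"frame": 1}\n{"frame": 2}\n'
print(len(records_from_jsonl(sample)))  # → 2
```

An alert check would then scan each record for the model's answer text (e.g. "yes") once the actual schema is known.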
requirements.txt:

    --extra-index-url https://download.pytorch.org/whl/cpu
    PyGObject==3.50.0
    torch==2.9.0+cpu
    transformers==4.57.6
    optimum-intel==1.27.0
    huggingface_hub==0.36.1
    einops
    timm
    openvino==2025.4.0
    openvino_tokenizers==2025.4.0.0
vlm_alerts.py:

#!/usr/bin/env python3
"""
Run a DLStreamer VLM pipeline on a video and export JSON and MP4 results.

The script can:
1. Download or reuse a local video.
2. Export or reuse an OpenVINO model.
3. Build a GStreamer pipeline string.
4. Execute the pipeline and store results.
"""

import argparse
import os
import subprocess
import sys
import tempfile
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple

import gi

# require_version must run before importing from gi.repository.
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib  # pylint: disable=wrong-import-position


BASE_DIR = Path(__file__).resolve().parent
VIDEOS_DIR = BASE_DIR / "videos"
MODELS_DIR = BASE_DIR / "models"
RESULTS_DIR = BASE_DIR / "results"


@dataclass
class PipelineConfig:
    """Configuration required to build and run the pipeline."""

    video: Path
    model: Path
    question: str
    device: str
    max_tokens: int
    frame_rate: float


def ensure_video(path_or_url: str) -> Path:
    """Return a local video path, downloading it if needed."""
    candidate = Path(path_or_url)
    if candidate.is_file():
        return candidate.resolve()

    VIDEOS_DIR.mkdir(exist_ok=True)
    filename = path_or_url.rstrip("/").split("/")[-1]
    local_path = VIDEOS_DIR / filename

    if local_path.exists():
        print(f"[video] using cached {local_path}")
        return local_path.resolve()

    print(f"[video] downloading {path_or_url}")
    request = urllib.request.Request(
        path_or_url,
        headers={"User-Agent": "Mozilla/5.0"},
    )

    with urllib.request.urlopen(request) as response, open(local_path, "wb") as file:
        file.write(response.read())

    return local_path.resolve()


def ensure_model(model_id: str) -> Path:
    """Return a local OpenVINO model directory, exporting it if needed."""
    model_name = model_id.split("/")[-1]
    output_dir = MODELS_DIR / model_name

    if output_dir.exists() and any(output_dir.glob("*.xml")):
        print(f"[model] using cached {output_dir}")
        return output_dir.resolve()

    MODELS_DIR.mkdir(exist_ok=True)

    command = [
        "optimum-cli",
        "export",
        "openvino",
        "--model",
        model_id,
        "--task",
        "image-text-to-text",
        "--trust-remote-code",
        str(output_dir),
    ]

    print("[model] exporting:", " ".join(command))
    subprocess.run(command, check=True)

    if not any(output_dir.glob("*.xml")):
        raise RuntimeError("OpenVINO export failed, no XML files found")

    return output_dir.resolve()


def build_pipeline_string(cfg: PipelineConfig) -> Tuple[str, Path, Path, Path]:
    """Construct the GStreamer pipeline string and related output paths."""
    RESULTS_DIR.mkdir(exist_ok=True)

    output_json = RESULTS_DIR / f"{cfg.model.name}-{cfg.video.stem}.jsonl"
    output_video = RESULTS_DIR / f"{cfg.model.name}-{cfg.video.stem}.mp4"

    fd, prompt_path_str = tempfile.mkstemp(suffix=".txt")
    prompt_path = Path(prompt_path_str)
    with os.fdopen(fd, "w") as file:
        file.write(cfg.question)

    generation_cfg = f"max_new_tokens={cfg.max_tokens}"

    pipeline_str = (
        f'filesrc location="{cfg.video}" ! '
        f'decodebin3 ! '
        f'videoconvertscale ! '
        f'video/x-raw,format=BGRx,width=1280,height=720 ! '
        f'queue ! '
        f'gvagenai '
        f'model-path="{cfg.model}" '
        f'device={cfg.device} '
        f'prompt-path="{prompt_path}" '
        f'generation-config="{generation_cfg}" '
        f'chunk-size=1 '
        f'frame-rate={cfg.frame_rate} '
        f'metrics=true ! '
        f'queue ! '
        f'gvametapublish file-format=json-lines '
        f'file-path="{output_json}" ! '
        f'queue ! '
        f'gvafpscounter ! '
        f'gvawatermark displ-cfg=text-scale=0.5 ! '
        f'videoconvert ! '
        f'vah264enc ! '
        f'h264parse ! '
        f'mp4mux ! '
        f'filesink location="{output_video}"'
    )

    return pipeline_str, output_json, output_video, prompt_path


def run_pipeline_string(pipeline_str: str) -> int:
    """Execute a GStreamer pipeline string and block until completion."""
    Gst.init(None)

    try:
        pipeline = Gst.parse_launch(pipeline_str)
    except GLib.Error as error:
        print("Pipeline parse error:", str(error))
        return 1

    bus = pipeline.get_bus()
    pipeline.set_state(Gst.State.PLAYING)

    while True:
        message = bus.timed_pop_filtered(
            Gst.CLOCK_TIME_NONE,
            Gst.MessageType.ERROR | Gst.MessageType.EOS,
        )

        if message.type == Gst.MessageType.ERROR:
            err, debug = message.parse_error()
            print("ERROR:", err.message)
            if debug:
                print("DEBUG:", debug)
            pipeline.set_state(Gst.State.NULL)
            return 1

        if message.type == Gst.MessageType.EOS:
            pipeline.set_state(Gst.State.NULL)
            return 0


def run_pipeline(cfg: PipelineConfig) -> int:
    """Build and execute the pipeline from configuration."""
    pipeline_str, output_json, output_video, prompt_path = build_pipeline_string(cfg)

    print("\nPipeline:\n")
    print(pipeline_str)
    print()

    try:
        result = run_pipeline_string(pipeline_str)
    finally:
        if prompt_path.exists():
            prompt_path.unlink()

    if result == 0:
        print(f"\nJSON output: {output_json}")
        print(f"Video output: {output_video}")

    return result


def parse_args() -> argparse.Namespace:
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="DLStreamer VLM Alerts sample"
    )
    parser.add_argument("video")
    parser.add_argument("model")
    parser.add_argument("question")
    parser.add_argument("--device", default="GPU")
    parser.add_argument("--max-tokens", type=int, default=20)
    parser.add_argument("--frame-rate", type=float, default=1.0)

    return parser.parse_args()


def main() -> int:
    """Entry point."""
    args = parse_args()

    video_path = ensure_video(args.video)
    model_path = ensure_model(args.model)

    config = PipelineConfig(
        video=video_path,
        model=model_path,
        question=args.question,
        device=args.device,
        max_tokens=args.max_tokens,
        frame_rate=args.frame_rate,
    )

    return run_pipeline(config)


if __name__ == "__main__":
    sys.exit(main())
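The URL-to-filename rule inside `ensure_video` (last path segment, trailing slash stripped) can be exercised in isolation:

```python
def filename_from_url(path_or_url: str) -> str:
    """Same derivation ensure_video uses to name the cached download."""
    return path_or_url.rstrip("/").split("/")[-1]


url = "https://videos.pexels.com/video-files/2103099/2103099-hd_1280_720_60fps.mp4"
print(filename_from_url(url))  # → 2103099-hd_1280_720_60fps.mp4
```

Note that URLs with query strings would keep the `?...` suffix in the cached name; the sample's example URLs are plain paths, so this is not handled.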