193 changes: 129 additions & 64 deletions .github/skills/dlstreamer-coding-agent/SKILL.md

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -2,7 +2,8 @@

{{APP_DESCRIPTION}}

<!-- Optional: Include a screenshot from the output video. Omit this line if no image is available. -->
<!-- ![{{APP_TITLE}}]({{APP_IMAGE}}) -->

{{DETAILED_DESCRIPTION}}

@@ -18,13 +18,20 @@
import argparse
import os
import signal
import subprocess
import sys
import urllib.request
from pathlib import Path

import gi

gi.require_version("Gst", "1.0")

# Prevent GStreamer from forking gst-plugin-scanner (a C subprocess that cannot
# resolve Python symbols). Scanning in-process lets libgstpython.so find the
# Python runtime that is already loaded.
os.environ.setdefault("GST_REGISTRY_FORK", "no")

from gi.repository import Gst # pylint: disable=no-name-in-module, wrong-import-position

SCRIPT_DIR = Path(__file__).resolve().parent
@@ -62,9 +69,12 @@ def prepare_input(source: str) -> str:
local = VIDEOS_DIR / name
if not local.exists():
print(f"Downloading video: {source}")
subprocess.run([
"curl", "-L", "-o", str(local),
"-H", "Referer: https://www.pexels.com/",
"-H", "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
source,
], check=True, timeout=300)
print(f"Saved to: {local}")
return str(local)
if not os.path.isfile(source):
@@ -82,6 +92,17 @@ def find_model(pattern: str, label: str) -> str:
return str(hits[0])


def check_device(requested: str, label: str) -> str:
"""Check device availability with fallback chain: NPU → GPU → CPU."""
if requested == "NPU" and not os.path.exists("/dev/accel/accel0"):
print(f"Warning: NPU not available for {label}, falling back to GPU")
requested = "GPU"
if requested == "GPU" and not os.path.exists("/dev/dri/renderD128"):
print(f"Warning: GPU not available for {label}, falling back to CPU")
requested = "CPU"
return requested


def build_source(src: str) -> str:
"""Build GStreamer source element string for file or RTSP."""
if src.startswith("rtsp://"):
@@ -134,11 +155,8 @@ def main():
Path(args.output_video).parent.mkdir(parents=True, exist_ok=True)
Path(args.output_json).parent.mkdir(parents=True, exist_ok=True)

# Device fallback
device = check_device(args.device, "inference")

# Build and run pipeline
Gst.init(None)
@@ -0,0 +1,8 @@
Develop a vision AI application that implements an event-based smart video recording pipeline:
- Read input video from an RTSP camera, but also allow video file input
(use https://www.pexels.com/video/a-man-wearing-a-face-mask-walks-into-a-building-9492063/ for testing)
- Run an AI model to detect people in camera view
- Trigger recording of a video stream to a local file when a person is detected and stop recording when person is out of view
- Output a sequence of files (save-1, save-2, save-3, ...), one for each interval during which a person is visible

Optimize the application for Intel Core Ultra 3 processors. Save source code in smart_nvr directory, generate README.md with setup instructions. Validate the application works as expected and generate performance numbers (fps).
@@ -0,0 +1,7 @@
Use DLStreamer Coding Agent to develop a Python application that implements license plate recognition pipeline:
- Read input video from a file (https://github.com/open-edge-platform/edge-ai-resources/raw/main/videos/ParkingVideo.mp4) but also allow remote IP cameras
- Run YOLOv11 (https://huggingface.co/morsetechlab/yolov11-license-plate-detection) for object detection and PaddleOCR (https://huggingface.co/PaddlePaddle/PP-OCRv5_server_rec) model for character recognition
- Output license plate text for each detected object as JSON file
- Annotate video stream and store it as an output video file

Generate vision AI processing pipeline optimized for Intel Core Ultra 3 processors. Save source code in license_plate_recognition directory, generate README.md with setup instructions. Follow instructions in README.md to run the application and check if it generates the expected output.
@@ -67,14 +67,19 @@ def parse_args():

## Plugin Registration

The main app must add the plugins directory to `GST_PLUGIN_PATH`, disable the forked
plugin scanner, and verify the Python plugin loader is available:

```python
plugins_dir = str(Path(__file__).resolve().parent / "plugins")
if plugins_dir not in os.environ.get("GST_PLUGIN_PATH", ""):
os.environ["GST_PLUGIN_PATH"] = f"{os.environ.get('GST_PLUGIN_PATH', '')}:{plugins_dir}"

# Prevent GStreamer from forking gst-plugin-scanner (a C subprocess that cannot
# resolve Python symbols). Scanning in-process lets libgstpython.so find the
# Python runtime that is already loaded.
os.environ.setdefault("GST_REGISTRY_FORK", "no")

Gst.init(None)

reg = Gst.Registry.get()
@@ -130,26 +135,34 @@ for mtd in rmeta:
success, tracking_id, _, _, _ = mtd.get_info()
```

## Buffer Mutability in Custom Elements or Pads

In GStreamer ≥ 1.26, `buffer.copy()` returns a **shallow copy** whose data pointer is
read-only. Use `copy_deep()` when you need to modify buffer timestamps or data:

```python
# WRONG — raises NotWritableMiniObject in GStreamer ≥ 1.26
rec_buffer = buffer.copy()
rec_buffer.pts = new_pts # ❌ immutable

# CORRECT — deep copy creates a fully writable buffer
rec_buffer = buffer.copy_deep()
rec_buffer.pts = new_pts # ✓ writable
```

## Device Availability Check

Check for GPU/NPU availability before constructing the pipeline. Use the fallback
chain NPU → GPU → CPU so the app works on any Intel system:

```python
def check_device(requested, label):
"""Check device availability with fallback chain: NPU → GPU → CPU."""
if requested == "NPU" and not os.path.exists("/dev/accel/accel0"):
print(f"Warning: NPU not available for {label}, falling back to GPU")
requested = "GPU"
if requested == "GPU" and not os.path.exists("/dev/dri/renderD128"):
print(f"Warning: GPU not available for {label}, falling back to CPU")
requested = "CPU"
return requested
```
@@ -145,7 +145,14 @@ class Controller:

Create a custom in-pipeline analytics element by subclassing `GstBase.BaseTransform`.
The element processes each buffer in `do_transform_ip` and can read/write metadata.
Use custom Python elements instead of probes when the logic is complex or when it modifies buffers or metadata.

Do NOT create a BaseTransform element when its only purpose is to read existing detection/classification metadata
and pass a simple flag or filtered label to a downstream element. In that case, the downstream element should process metadata directly.

> **Rule of thumb:** A custom BaseTransform element is justified only when it implements
> **new derived analytics** (e.g. zone intersection, trajectory analysis, dwell-time
> calculation) that produces metadata not available from existing DLStreamer elements.

```python
import gi
@@ -191,12 +198,7 @@ __gstelementfactory__ = ("myanalytics_py", Gst.Rank.NONE, MyAnalytics)

**File location:** Place in `plugins/python/<element_name>.py`

**Registration:** See [Plugin Registration](./coding-conventions.md#plugin-registration) in the Coding Conventions Reference.

**Read for reference:** `samples/gstreamer/python/smart_nvr/plugins/python/gvaAnalytics.py`

@@ -241,6 +243,13 @@ __gstelementfactory__ = ("myrecorder_py", Gst.Rank.NONE, MyRecorder)

**Read for reference:** `samples/gstreamer/python/smart_nvr/plugins/python/gvaRecorder.py`

> **Decision shortcut — recording / conditional output:** If the user describes *event-triggered
> recording*, *conditional saving*, or *numbered output files*, go directly to this pattern.
> A `Gst.Bin` subclass with an internal `appsrc → encoder → mux → filesink` sub-pipeline is
> the only approach that can cleanly start/stop recordings and finalize MP4 containers (which
> require an EOS event to write the moov atom). Do **not** attempt this with pad probes,
> appsink callbacks, or tee+valve — those patterns cannot manage a secondary pipeline lifecycle.
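
The numbered-output convention can live outside the recording bin itself; a small pure-Python helper is enough to pick the next index. A minimal sketch (the helper name `next_save_path` is hypothetical, not part of DLStreamer):

```python
from pathlib import Path


def next_save_path(out_dir: Path, prefix: str = "save", ext: str = ".mp4") -> Path:
    """Return the next free numbered recording path: save-1.mp4, save-2.mp4, ..."""
    out_dir.mkdir(parents=True, exist_ok=True)
    used = []
    for p in out_dir.glob(f"{prefix}-*{ext}"):
        num = p.stem.rsplit("-", 1)[-1]  # "save-12" -> "12"
        if num.isdigit():
            used.append(int(num))
    return out_dir / f"{prefix}-{max(used, default=0) + 1}{ext}"
```

The recorder bin can call this once per detection event, right before creating its internal `filesink`, so restarts of the app continue the sequence instead of overwriting earlier files.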

---

## Pattern 8: Cross-Branch Signal Bridge
@@ -298,11 +307,28 @@ pipeline_str = (

Add Python functions to download assets (such as input video files) and AI models.
Always cache downloaded files locally, so only first application run requires network connection.
For AI model download, prioritize using existing download scripts and generate inline only if simple.

> **Video download method:** Use `subprocess` + `curl` (not `urllib.request`) for video
> downloads. Many video hosting sites (Pexels, Pixabay, etc.) block Python's `urllib`
> with HTTP 403 even with a custom `User-Agent`. `curl` with `-L` (follow redirects)
> and a `Referer` header works reliably.

> **Pexels URLs:** Users often provide the Pexels *page* URL
> (e.g. `https://www.pexels.com/video/<slug>-<ID>/`). The actual video file is at
> `https://videos.pexels.com/video-files/<ID>/<ID>-hd_<W>_<H>_<FPS>fps.mp4`
> but the resolution and FPS **vary per video** — do **not** guess them.
> You **must** scrape the Pexels page to discover the exact `.mp4` URL.
> Use `fetch_webpage` (or `curl -s`) on the page URL and search for
> `videos.pexels.com/video-files/` links. The Canva "Edit" links on the page
> embed the direct video URL as the `file-url=` query parameter, e.g.:
> `https://www.canva.com/...&file-url=https%3A%2F%2Fvideos.pexels.com%2Fvideo-files%2F9492063%2F9492063-hd_1920_1080_30fps.mp4&...`
> URL-decode the `file-url` value to get the direct download link.
> If scraping fails, ask the user for the direct video-file URL.

```python
from pathlib import Path
import subprocess

VIDEOS_DIR = Path(__file__).resolve().parent / "videos"
MODELS_DIR = Path(__file__).resolve().parent / "models"
@@ -312,9 +338,14 @@ def download_video(url: str) -> Path:
filename = url.rstrip("/").split("/")[-1]
local = VIDEOS_DIR / filename
if not local.exists():
print(f"Downloading video: {url}")
subprocess.run([
"curl", "-L", "-o", str(local),
"-H", "Referer: https://www.pexels.com/",
"-H", "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
url,
], check=True, timeout=300)
print(f"Saved to: {local}")
return local.resolve()

def download_model(model_name: str) -> Path:
@@ -371,22 +402,3 @@ script that handles all model download and export. Users run it once before star
In addition, model export dependencies may clash with model inference dependencies which further
justifies splitting these two phases.

@@ -222,7 +222,7 @@ with open(dict_path, "w") as f:

**Requirements:**
```
paddlepaddle # CPU-only variant; use paddlepaddle-gpu if GPU conversion is needed
paddle2onnx
```

@@ -257,42 +257,50 @@ Model-proc (model processing) JSON files are deprecated; please do not use them
| FP32 | (default) | Maximum accuracy | None |
| FP16 | `half=True` (Ultralytics), `--compress_to_fp16` (ovc) | GPU/NPU inference, reduced size | Negligible |
| INT8 | `int8=True` (Ultralytics) | GPU/NPU inference, reduced size | Negligible |

> **Note:** Ultralytics INT8 export (`int8=True`) requires the `nncf` package. Add a pinned `nncf` (e.g. `nncf==2.14.0`)
> to `export_requirements.txt` to avoid auto-install delays during export.
> INT8 export triggers NNCF calibration, which can take a while and may appear to hang. For iterative development, use `half=True` (FP16) first; switch to `int8=True` for production builds.

| INT8 | `--weight-format int8` (optimum-cli) | HuggingFace transformer models | Minor |
| INT4 | `--weight-format int4` (optimum-cli) | Large LLM/VLM models | Moderate, acceptable for VLMs |

> **Recommendation:** Use **INT8** (`int8=True`) for Ultralytics YOLO models.
> Use INT8 for HuggingFace transformer classification models. Use INT4 for VLM models.

## Requirements

Prefer `==` pins (e.g. `ultralytics==8.4.7`) in `export_requirements.txt` over open ranges like `>=8.3.0`.
Open ranges pull in untested releases that may change export behavior or break backward compatibility.
Copy the latest exact versions used by the DLStreamer samples.

> **CRITICAL — CPU-only PyTorch:** Always add `--extra-index-url https://download.pytorch.org/whl/cpu` as the
> **first line** of `export_requirements.txt` (before any package that depends on PyTorch).
> Without this, pip resolves the default CUDA-enabled PyTorch which downloads unnecessary dependencies and
> takes a very long time to install. Model export only needs CPU inference.

Typical `requirements.txt` entries by model source:

```
# IMPORTANT: CPU-only PyTorch — must appear before any torch-dependent package
--extra-index-url https://download.pytorch.org/whl/cpu

# Ultralytics YOLO
ultralytics==8.4.7
nncf==2.14.0 # required for int8=True quantization

# HuggingFace transformers + OpenVINO export
optimum[openvino]
huggingface_hub

# PaddlePaddle models (OCR, etc.)
paddlepaddle # CPU-only variant (paddlepaddle-gpu is the GPU package)
paddle2onnx
openvino # for ovc model converter

# Open Model Zoo tools
openvino-dev

# Custom elements with pixel access
numpy
opencv-python # or opencv-python-headless

# Common
PyGObject==3.50.0 # 3.50.2 depends on girepository-2.0 and breaks backward-compatibility
```