This sample demonstrates a Large Vision Model (CLIP) pipeline constructed with the `gst-launch-1.0` command-line utility. It extracts image embeddings (CLS tokens) for each frame using the Vision Transformer.
The sample utilizes the GStreamer command-line tool `gst-launch-1.0`, which can build and run a GStreamer pipeline described in a string format.
The string contains a list of GStreamer elements separated by an exclamation mark `!`; each element may have properties specified in the format `property=value`.
This sample builds a GStreamer pipeline of the following elements:
- `filesrc`, `urisourcebin`, or `v4l2src` for input from file/URL/web camera
- `decodebin3` for video decoding
- `videoconvert` for converting video frames into different color formats
- `videoscale` for scaling video frames
- `vapostproc` for post-processing (used in the GPU pipeline)
- `gvainference` for running inference using the CLIP Vision Transformer model
- `gvametaconvert` for converting metadata to JSON format
- `gvametapublish` for publishing metadata to a file
- `gvafpscounter` for measuring FPS (used in FPS mode)
- `fakesink` for discarding the output
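Chained together, the elements above form a pipeline string like the sketch below. This is illustrative only: the model path, input URI, and property values are assumptions, not the script's literal output.

```shell
# Sketch of a CPU pipeline string; the model XML path and URI are hypothetical.
MODELS_PATH="${MODELS_PATH:-$HOME/models}"
MODEL_XML="$MODELS_PATH/public/clip-vit-large-patch14/FP32/clip-vit-large-patch14.xml"

PIPELINE="urisourcebin uri=https://example.com/video.mp4 ! decodebin3 ! \
videoconvert ! videoscale ! \
gvainference model=$MODEL_XML device=CPU ! \
gvametaconvert format=json ! \
gvametapublish method=file file-path=output.json ! \
fakesink sync=false"

echo "gst-launch-1.0 $PIPELINE"
```

In the GPU pipeline, `vapostproc` would replace the software `videoconvert`/`videoscale` stages.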
The sample uses the `clip-vit-large-patch14`, `clip-vit-base-patch16`, or `clip-vit-base-patch32` model. The necessary conversion to the OpenVINO™ format is performed by the `download_public_models.sh` script located in the `samples` directory.
```sh
export MODELS_PATH="$HOME"/models
../../../download_public_models.sh clip-vit-large-patch14
```
```sh
./generate_frame_embeddings.sh [INPUT] [DEVICE] [OUTPUT] [MODEL]
```

The sample takes four optional command-line parameters:
- `[INPUT]` to specify the input source.
  The input could be:
  - a local video file
  - a web camera device (e.g., `/dev/video0`)
  - an RTSP camera (URL starting with `rtsp://`) or other streaming source (e.g., URL starting with `http://`)

  If the parameter is not specified, the sample by default streams a video example from an HTTPS link (utilizing the `urisourcebin` element), so it requires an internet connection.
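Mapping the `[INPUT]` value to the right source element can be sketched with a small helper. The `pick_source` function is hypothetical, not part of the sample script; it only illustrates the selection logic described above.

```shell
# Hypothetical helper: choose a GStreamer source element for the given input.
pick_source() {
  local input="$1"
  case "$input" in
    /dev/video*)                 echo "v4l2src device=$input" ;;   # web camera device
    rtsp://*|http://*|https://*) echo "urisourcebin uri=$input" ;; # network stream
    *)                           echo "filesrc location=$input" ;; # local video file
  esac
}

pick_source /dev/video0        # → v4l2src device=/dev/video0
pick_source rtsp://cam/stream  # → urisourcebin uri=rtsp://cam/stream
pick_source ./video.mp4        # → filesrc location=./video.mp4
```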
- `[DEVICE]` to specify the device for inference.
  You can choose either `CPU` or `GPU`.
- `[OUTPUT]` to choose between file output mode and FPS throughput mode:
  - `json` - output to a JSON file (default)
  - `fps` - FPS only
- `[MODEL]` to specify the model for inference:
  - `clip-vit-large-patch14` (default)
  - `clip-vit-base-patch16`
  - `clip-vit-base-patch32`
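Because all four parameters are positional and optional, defaults can be applied with standard shell parameter expansion. The sketch below illustrates the idea; the default `INPUT` URL and `DEVICE` value are assumptions, not the sample's exact code.

```shell
# Sketch: apply documented defaults to positional parameters.
apply_defaults() {
  INPUT="${1:-https://example.com/sample.mp4}"  # hypothetical default URL
  DEVICE="${2:-CPU}"                            # assumed default device
  OUTPUT="${3:-json}"                           # json output is the documented default
  MODEL="${4:-clip-vit-large-patch14}"          # documented default model
}

apply_defaults /dev/video0 GPU fps
echo "INPUT=$INPUT DEVICE=$DEVICE OUTPUT=$OUTPUT MODEL=$MODEL"
```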
The sample:

- prints the full `gst-launch-1.0` command line to the console
- starts the command and either publishes metadata to a file or prints out FPS if you set `OUTPUT=fps`
To run the sample with default values:

```sh
./generate_frame_embeddings.sh
```
To specify a source file, device, and output:

```sh
./generate_frame_embeddings.sh /path/to/video.mp4 GPU fps
```
To specify a URL, device, output, and model:

```sh
./generate_frame_embeddings.sh https://example.com/video.mp4 CPU json clip-vit-large-patch14
```
To specify a video device, device, output, and model:

```sh
./generate_frame_embeddings.sh /dev/video0 CPU fps clip-vit-base-patch16
```