The VDMS DataPrep microservice builds and stores frame-level and text embeddings in VDMS while preserving the raw assets in MinIO. This guide explains how to launch the service, configure runtime options, and exercise the primary APIs.
VDMS DataPrep ships with Docker Compose manifests (docker/compose*.yaml) that provision MinIO, VDMS Vector DB, and the DataPrep container. Always source the accompanying setup scripts so the exported environment variables remain in your shell.
Before you begin, ensure the following:
- System Requirements: Verify that your system meets the minimum requirements.
- Docker Installed: Install Docker. For installation instructions, see Get Docker.
This guide assumes basic familiarity with Docker commands and terminal usage. If you are new to Docker, see Docker Documentation for an introduction.
The table below lists the core configuration knobs. `setup.sh` seeds defaults, but you can override them before sourcing the script.
| Variable | Required | Default | Purpose |
|---|---|---|---|
| `MINIO_ROOT_USER`, `MINIO_ROOT_PASSWORD` | ✅ | (none) | Credentials used to bootstrap MinIO and authenticate API calls from DataPrep. |
| `MINIO_ENDPOINT` | ✅ | `minio-server:9000` | Host:port string DataPrep uses to communicate with MinIO from inside the container. |
| `DEFAULT_BUCKET_NAME` | ✅ | `vdms-bucket` (via `setup.sh`) | Destination bucket for uploaded videos and generated manifests. Override with `PM_MINIO_BUCKET` when running alongside pipeline-manager. |
| `VDMS_VDB_HOST` / `VDMS_VDB_PORT` | ✅ | `vdms-vector-db` / `55555` | Connection information for VDMS Vector DB. |
| `DB_COLLECTION` | ✅ | `video-rag` | VDMS collection that stores embeddings and metadata. |
| `MULTIMODAL_EMBEDDING_MODEL_NAME` | ✅ | (none) | Model identifier used by both SDK and API execution paths (for example `CLIP/clip-vit-b-32` for multimodal or `QwenText/qwen3-embedding-0.6b` for text-only embeddings). |
| `EMBEDDING_PROCESSING_MODE` | ✅ | `sdk` | Selects optimized in-process execution (`sdk`) or HTTP-based execution (`api`). |
| `SDK_USE_OPENVINO` | Optional | `true` | Enables OpenVINO acceleration in SDK mode. Set `false` to stay on PyTorch. |
| `VDMS_DATAPREP_DEVICE` | Optional | `CPU` | Processing device for embeddings and object detection (`CPU` or `GPU`). |
| `FRAME_INTERVAL` | Optional | `15` | Extract every Nth frame during video processing. |
| `ENABLE_OBJECT_DETECTION` | Optional | `true` | Toggles YOLOX-based crop extraction. |
| `DETECTION_CONFIDENCE` | Optional | `0.85` | Minimum confidence threshold for detections. |
| `ROI_CONSOLIDATION_ENABLED` | Optional | `false` | Enables ROI consolidation (merging overlapping detections). |
| `ROI_CONSOLIDATION_IOU_THRESHOLD` | Optional | `0.2` | IoU threshold used to group overlapping boxes into a single ROI. |
| `ROI_CONSOLIDATION_CLASS_AWARE` | Optional | `false` | Merge only boxes of the same class when `true`. |
| `ROI_CONSOLIDATION_CONTEXT_SCALE` | Optional | `0.2` | Expands merged ROIs by this fraction of their width/height. |
| `OV_MODELS_DIR` | Optional | `/app/ov_models` | Persistent mount that caches OpenVINO-optimized models. |
| `ALLOW_ORIGINS`, `ALLOW_METHODS`, `ALLOW_HEADERS` | Optional | `*` | CORS configuration applied by FastAPI. |
Additional environment variables are available for high-throughput scenarios:
- `ENABLE_PARALLEL_PIPELINE` (default `true`) — disable to force single-threaded embedding.
- `MAX_PARALLEL_WORKERS` — hard cap on SDK worker threads (auto-calculated when unset).
- `OV_PERFORMANCE_MODE`, `OV_PERFORMANCE_HINT_NUM_REQUESTS`, `OV_NUM_STREAMS` — forward performance hints to OpenVINO when running on CPU or GPU.
Export overrides before sourcing the setup script:
```bash
export MULTIMODAL_EMBEDDING_MODEL_NAME="CLIP/clip-vit-b-16"
export MINIO_ROOT_USER="minioadmin"
export MINIO_ROOT_PASSWORD="minioadmin"
export EMBEDDING_PROCESSING_MODE="sdk"
source ./setup.sh --nosetup
```

Tip: When you only need long-form text embeddings, such as the combined `--all` mode in the video search and summarization sample, set `EMBEDDING_MODEL_NAME="QwenText/qwen3-embedding-0.6b"` before sourcing `setup.sh`. The script forwards this value to the DataPrep container as `MULTIMODAL_EMBEDDING_MODEL_NAME`, enabling Qwen-backed text embeddings in SDK and API modes without any additional flags.
ROI consolidation merges overlapping detections into a single crop and optionally expands that crop for more context. This can reduce duplicate crops and improve embedding coverage when multiple detections overlap the same object.
Enable it via environment variable (recommended for quick toggles):
```bash
export ROI_CONSOLIDATION_ENABLED=true
```

Or configure it in `src/config.yaml` under `object_detection.roi_consolidation`:
- `enabled`: Master switch for ROI consolidation logic.
- `iou_threshold`: IoU threshold used to cluster overlapping boxes. IoU is $\frac{\text{intersection area}}{\text{union area}}$ for two boxes; higher values mean only tighter overlaps merge, lower values merge more aggressively.
- `class_aware`: When `true`, only boxes of the same class can be merged. When `false`, overlapping boxes across classes can merge (useful for mixed-class clusters).
- `context_scale`: Expand the merged ROI by this fraction of its size. Higher values include more surrounding context; lower values keep crops tighter to the merged box.
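To make `iou_threshold` and `context_scale` concrete, here is a minimal, self-contained Python sketch of greedy IoU-based merging with context expansion. It is an illustration under an assumed `(x1, y1, x2, y2)` box format, not the service's actual implementation, and it ignores `class_aware`:

```python
# Illustrative sketch of IoU-based ROI consolidation; NOT the service's
# actual implementation. Boxes are (x1, y1, x2, y2) tuples.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def consolidate(boxes, iou_threshold=0.2, context_scale=0.2):
    """Single-pass greedy merge of overlapping boxes, then ROI expansion."""
    merged = []
    for box in boxes:
        for i, m in enumerate(merged):
            if iou(box, m) >= iou_threshold:
                # Replace the cluster with the union of the two boxes
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    # Expand each merged ROI by context_scale of its width/height
    out = []
    for x1, y1, x2, y2 in merged:
        dx, dy = (x2 - x1) * context_scale, (y2 - y1) * context_scale
        out.append((x1 - dx, y1 - dy, x2 + dx, y2 + dy))
    return out
```

With the defaults above, two boxes whose IoU is at least 0.2 collapse into their union, and every resulting ROI grows by 20% of its size on each side; lowering `iou_threshold` merges more aggressively, and `context_scale=0` keeps crops tight to the merged box.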
Use `source ./setup.sh --conf` to print the resolved Docker Compose configuration with your overrides applied.
- Overview
- Architecture Overview
- Video Ingestion Flow - Detailed flow diagrams of the video processing pipeline
- API Reference
- System Requirements
Important: Do not run `docker build` directly against `docker/Dockerfile`. The build depends on a wheel generated from the multimodal embedding serving microservice. Always execute `./build.sh` in the `vdms` directory first so the wheel is created under `wheels/` before building the container image.
You can either build the Docker images yourself or use the prebuilt images, as documented below.
Configure the registry: The VDMS DataPrep microservice uses the registry URL and tag to pull the required image.
```bash
export REGISTRY_URL=intel
export TAG=latest
```
1. Clone the repository and enter the project.

   ```bash
   git clone https://github.com/open-edge-platform/edge-ai-libraries.git
   cd edge-ai-libraries/microservices/visual-data-preparation-for-retrieval/vdms
   ```

2. Export required secrets and model selection.

   ```bash
   export MINIO_ROOT_USER="minioadmin"
   export MINIO_ROOT_PASSWORD="minioadmin"
   export MULTIMODAL_EMBEDDING_MODEL_NAME="CLIP/clip-vit-b-32"
   ```

   For text-only scenarios replace the last line with:

   ```bash
   export MULTIMODAL_EMBEDDING_MODEL_NAME="QwenText/qwen3-embedding-0.6b"
   ```

3. Choose your execution mode.

   - SDK mode (default): No external embedding service required. Run `source ./setup.sh` to spin up MinIO, VDMS, and DataPrep using `docker/compose.yaml`.
   - API mode: Requires the multimodal embedding serving container. Set `export EMBEDDING_PROCESSING_MODE=api`, run `source ./setup-with-embedding.sh`, then launch with `docker compose -f docker/compose-with-embedding.yaml up -d --build`.

4. Confirm the stack is healthy.

   ```bash
   docker ps --filter "name=vdms" --format "table {{.Names}}\t{{.Status}}"
   ```

5. Open the interactive docs. Navigate to `http://localhost:6007/docs` (adjust if you changed `VDMS_DATAPREP_HOST_PORT`) to view the OpenAPI schema.

6. Shut everything down when finished. Use `source ./setup.sh --down` (or `docker compose ... down` for the API stack) to stop services.
The FastAPI application is mounted under `/v1/dataprep`.

```bash
curl http://localhost:6007/v1/dataprep/health
```

SDK mode responses include the preload status, model name, and device.
```bash
curl -X POST "http://localhost:6007/v1/dataprep/videos/upload" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/video.mp4" \
  -F "frame_interval=10" \
  -F "enable_object_detection=true" \
  -F "tags=intersection" -F "tags=night"
```

The service streams the asset to MinIO, extracts frames (and crops), generates embeddings, and persists metadata in VDMS. The JSON response reports the processing mode that was used.
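For programmatic ingestion, a small Python client can mirror the same multipart call. This is a hypothetical helper (the `requests` library and the helper names are assumptions, not part of the service); the field names follow the curl example above:

```python
# Hypothetical Python client for the /videos/upload endpoint, mirroring
# the curl example above. Helper names are illustrative.

DATAPREP_URL = "http://localhost:6007/v1/dataprep"

def build_upload_form(frame_interval=10, enable_object_detection=True, tags=()):
    """Build the repeated-field multipart form data, e.g. one ("tags", t) pair per tag."""
    data = [
        ("frame_interval", str(frame_interval)),
        ("enable_object_detection", "true" if enable_object_detection else "false"),
    ]
    data += [("tags", t) for t in tags]
    return data

def upload_video(path, **kwargs):
    """POST a local video file to /videos/upload and return the JSON response."""
    import requests  # deferred so build_upload_form works without requests installed
    with open(path, "rb") as f:
        resp = requests.post(
            f"{DATAPREP_URL}/videos/upload",
            files={"file": f},
            data=build_upload_form(**kwargs),
        )
    resp.raise_for_status()
    return resp.json()
```

Repeating the `tags` field, rather than sending a JSON array, matches how the curl example passes multiple tags to the multipart endpoint.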
```bash
curl -X POST "http://localhost:6007/v1/dataprep/videos/minio" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_name": "vdms-bucket",
    "video_id": "traffic_cam_2024_10_21",
    "frame_interval": 12,
    "enable_object_detection": true,
    "tags": ["traffic", "daytime"]
  }'
```

To attach a human-authored summary to a video, use this command:
```bash
curl -X POST "http://localhost:6007/v1/dataprep/summary" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_name": "vdms-bucket",
    "video_id": "traffic_cam_2024_10_21",
    "video_summary": "Vehicle stopped at intersection for 45 seconds",
    "video_start_time": 12.5,
    "video_end_time": 57.0,
    "tags": ["summary", "manual"]
  }'
```

You can use the following commands to discover, download, and delete content:
```bash
# List processed videos (video_id + filenames)
curl "http://localhost:6007/v1/dataprep/videos"

# Download a processed clip (stream or attachment)
curl -L "http://localhost:6007/v1/dataprep/videos/download?video_id=traffic_cam_2024_10_21&video_name=clip_0003.mp4" -o clip_0003.mp4

# Delete everything under a video_id (add video_name to remove a single file)
curl -X DELETE "http://localhost:6007/v1/dataprep/videos?video_id=traffic_cam_2024_10_21"
```

The telemetry endpoint captures per-request wall-clock timings, stage durations, throughput, and batch-level stats. Query the most recent entries directly from the DataPrep service (or via the pipeline-manager proxy) with:

```bash
curl --location 'http://localhost:6016/telemetry?limit=5'
```

See the Telemetry Metrics reference for a complete breakdown of every field and how each value is calculated.
- Call `GET /v1/dataprep/health` – expect `status: ok`, the active embedding mode, and the OpenVINO flag when SDK mode is selected.
- Upload a small MP4 via `/videos/upload` and confirm:
  - The response payload reports `success`.
  - `GET /v1/dataprep/videos` lists the generated `video_id` and manifests.
  - The MinIO console (`http://localhost:6011`) shows the raw asset, thumbnails, and crops.
- Inspect VDMS (via `vdms_cli` or a custom client) to verify entries in the `video-rag` collection.
- Startup fails with “model name must be provided”: Set `MULTIMODAL_EMBEDDING_MODEL_NAME` before launching Docker (required for both SDK and API modes).
- Object detection disabled unexpectedly: Check logs for YOLOX download failures. Ensure the `YOLOX_MODELS_VOLUME_NAME` volume exists and the host has outbound network access during first run.
- API mode returns 502: Verify the multimodal embedding service is healthy at `MULTIMODAL_EMBEDDING_ENDPOINT` (see `docker compose -f docker/compose-with-embedding.yaml ps`).
- Uploads rejected: Files larger than 500 MB are not accepted by the FastAPI upload endpoint. Stage the video directly in MinIO and use `/videos/minio` instead.
- GPU acceleration inactive: Confirm `/dev/dri/*` is mapped into the container, `VDMS_DATAPREP_DEVICE=GPU`, and `SDK_USE_OPENVINO=true`.