The Model Download Service is a microservice for downloading models from multiple hubs: Hugging Face, Ollama, Geti™, and Ultralytics. It also supports converting Hugging Face models to OpenVINO Model Server (OVMS) format. The service exposes a RESTful API for managing model downloads and conversions.
- Download models from Hugging Face, Ollama, Geti™ and Ultralytics model hubs
- Convert Hugging Face models to OVMS format
- Support for multiple model precisions (INT4, INT8, FP16, FP32)
- Support for multiple device targets (CPU, GPU, and NPU)
- Note: the OpenVINO plugin supports NPU model conversion in INT4 precision only
- Parallel download capability
- Configurable model caching
- REST API with OpenAPI documentation
- Docker and Docker Compose
- Hugging Face API token (only required for gated Hugging Face models or conversion)
- Sufficient disk space for model storage
Clone the repository:

```bash
# Clone the latest on mainline
git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries

# Alternatively, clone a specific release branch
git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries -b <release-tag>
```

Go to the model-download microservice directory:

```bash
cd edge-ai-libraries/microservices/model-download
```
Configure the environment variables:

```bash
export REGISTRY="intel/"
export TAG=latest
export HUGGINGFACEHUB_API_TOKEN=<your-huggingface-token>
```

To use the Geti™ plugin, also set these variables. Note: For Geti™ setup instructions, see the Geti™ documentation.

```bash
export GETI_WORKSPACE_ID=<YOUR_GETI_WORKSPACE_ID>
export GETI_HOST=<GETI_HOST_ADDRESS>
export GETI_TOKEN=<GETI_ACCESS_TOKEN>
export GETI_SERVER_API_VERSION=v1
export GETI_SERVER_SSL_VERIFY=False  # Default is False
```
Launch the service using the run script, which starts the service and enables the plugins:

```bash
source scripts/run_service.sh up --plugins all --model-path <host path>
```
NOTE: For public models, no token is needed. Set the Hugging Face token via the `HUGGINGFACEHUB_API_TOKEN` environment variable to download gated models and for conversion to OpenVINO IR format.

NOTE: Ensure the host path does not require privileged access for directory creation. It is recommended to use `$PWD/host_path` or a similar location within your working directory.

The `run_service.sh` script is a Docker Compose wrapper that builds and manages the model download service container with configurable plugins, model paths, and deployment options.

Usage:

```bash
source scripts/run_service.sh [options] [action]
```

Actions:

| Action | Description |
|--------|-------------|
| `up`   | Start the services (default) |
| `down` | Stop the services |

Options:

| Option | Description |
|--------|-------------|
| `--build` | Build the Docker image before running |
| `--rebuild` | Ignore any existing cached images and rebuild them from scratch using the Dockerfile definitions |
| `--model-path <path>` | Set a custom model path (default: `$HOME/models/`) |
| `--plugins <list>` | Comma-separated list of plugins to enable (e.g., `huggingface,ollama,openvino,ultralytics,geti`) or `all` to enable all available plugins |
| `--help` | Show this help message |

Examples:
- Start the service with default settings:

  ```bash
  source scripts/run_service.sh up
  ```

- Stop the service:

  ```bash
  source scripts/run_service.sh down
  ```

- Enable specific plugins:

  ```bash
  source scripts/run_service.sh up --plugins huggingface
  ```

- Enable multiple plugins:

  ```bash
  source scripts/run_service.sh up --plugins huggingface,ollama,ultralytics,geti
  ```

- Use a custom model storage location:

  ```bash
  source scripts/run_service.sh up --model-path /data/my-models
  ```

- Production deployment with all plugins:

  ```bash
  source scripts/run_service.sh up --plugins all --model-path tmp/models
  ```

- Display usage information:

  ```bash
  source scripts/run_service.sh --help
  ```
Access the service:

- The service is available at `http://<host-ip>:8200/api/v1/docs`, where you can view the Swagger documentation for all available APIs.
- Ensure that the application is running by checking the Docker container status:

  ```bash
  docker ps
  ```

- Access the application dashboard and verify that it is functioning as expected.
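The availability check above can also be scripted. Below is a minimal sketch using only the Python standard library; the function names are illustrative, and the host/port values come from the steps above:

```python
import urllib.request

def docs_url(host: str, port: int = 8200) -> str:
    """Build the Swagger docs URL exposed by the service."""
    return f"http://{host}:{port}/api/v1/docs"

def is_service_up(host: str, port: int = 8200, timeout: float = 5.0) -> bool:
    """Return True if the docs endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(docs_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, DNS failure, timeout, or HTTP error
        return False
```

A call such as `is_service_up("<host-ip>")` can be dropped into a readiness probe or a CI smoke test.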
Download a Hugging Face model:
```bash
curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=hf_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "microsoft/Phi-3.5-mini-instruct",
        "hub": "huggingface",
        "type": "llm"
      }
    ],
    "parallel_downloads": false
  }'
```

Download a model from Ollama:
```bash
curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=ollama_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "tinyllama",
        "hub": "ollama",
        "type": "llm"
      }
    ],
    "parallel_downloads": false
  }'
```

Download a YOLO vision model from Ultralytics:
```bash
curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=yolo_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "yolov8s",
        "hub": "ultralytics",
        "type": "vision"
      }
    ],
    "parallel_downloads": true
  }'
```

Note: YOLO vision models from Ultralytics are downloaded and converted to OpenVINO IR format with FP32 and FP16 precision by default.
Download a Hugging Face model and convert to OpenVINO IR format:
```bash
curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=ovms_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "BAAI/bge-reranker-base",
        "hub": "openvino",
        "type": "rerank",
        "is_ovms": true,
        "config": {
          "precision": "fp32",
          "device": "CPU",
          "cache_size": 10
        }
      }
    ],
    "parallel_downloads": false
  }'
```

Download models from Geti™, optimized with OpenVINO:
```bash
curl -X POST 'http://<host-ip>:8200/api/v1/models/download?download_path=geti_folder' \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "yolox-tiny",
        "hub": "geti",
        "revision": "1",
        "config": {
          "precision": "fp32"
        }
      }
    ],
    "parallel_downloads": true
  }'
```

Note: The default precision is FP16.
Query parameter:

- `download_path` (string): A local file system path where the downloaded model should be saved. If not provided, the model is saved to a default location.
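The curl requests above can also be issued from Python. Below is a minimal sketch using only the standard library; the helper names (`build_download_request`, `post_download`) are illustrative, while the endpoint, body fields, and `download_path` query parameter follow the examples above:

```python
import json
import urllib.request

def build_download_request(models, parallel_downloads=False):
    """Assemble the request body expected by the download endpoint."""
    return {"models": models, "parallel_downloads": parallel_downloads}

def post_download(host, body, download_path=None, port=8200):
    """POST a download request and return the parsed JSON response."""
    url = f"http://{host}:{port}/api/v1/models/download"
    if download_path:
        url += f"?download_path={download_path}"
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Equivalent of the Ollama example above:
body = build_download_request(
    [{"name": "tinyllama", "hub": "ollama", "type": "llm"}]
)
# post_download("<host-ip>", body, download_path="ollama_model")
```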
Response: Sample response (when a download request is started):

```json
{
  "message": "Started processing 1 model(s)",
  "job_ids": [
    "5f0d4eba-c79c-4d02-97a6-43c3d0168ca0"
  ],
  "status": "processing"
}
```

Each model download request returns a `job_id`. To check the status of a download, use the following curl command:
```bash
curl -X GET "http://<host-ip>:8200/api/v1/jobs/<job_id>"
```

Sample response (when the job is completed):
```json
{
  "id": "5f0d4eba-c79c-4d02-97a6-43c3d0168ca0",
  "operation_type": "download",
  "model_name": "yolov8s",
  "hub": "ultralytics",
  "output_dir": "/opt/models/ultra_folder",
  "status": "completed",
  "start_time": "2025-10-27T08:24:23.510870",
  "plugin_name": "ultralytics",
  "model_type": "vision",
  "plugin": "ultralytics",
  "completion_time": "2025-10-27T08:30:14.443898",
  "result": {
    "model_name": "yolov8s",
    "source": "ultralytics",
    "download_path": "model/download/path",
    "return_code": 0
  }
}
```

For more details, check out the API spec.
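Polling the jobs endpoint until a job finishes can be automated. Below is a minimal sketch where `fetch_job` stands in for the GET request above (it is injected as a callable so the loop can be exercised without a live service; all names are illustrative):

```python
import time

# Terminal statuses assumed from the sample responses ("completed");
# "failed" is a common counterpart and is an assumption here.
TERMINAL_STATUSES = {"completed", "failed"}

def wait_for_job(fetch_job, job_id, interval=2.0, max_polls=100):
    """Poll fetch_job(job_id) until the job reaches a terminal status.

    fetch_job is any callable returning the job JSON as a dict, e.g.
    a wrapper around GET /api/v1/jobs/<job_id>.
    """
    for _ in range(max_polls):
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

In practice `fetch_job` would issue the curl equivalent shown above and parse the JSON response.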
The service can be configured through environment variables and Docker volumes:
- `HF_HUB_ENABLE_HF_TRANSFER`: Enable Hugging Face transfer (default: `1`)
- `HUGGINGFACEHUB_API_TOKEN`: Hugging Face token (only required for gated models or conversion)
- `~/models:/app/models`: Volume mount to persist downloaded models
- If you encounter any issues during the build or run process, check the Docker logs for errors:

  ```bash
  docker logs <container-id>
  ```
- Use parallel downloads with caution, as they can consume significant resources.
- Configure cache sizes based on available memory.
- Select model precision according to your performance requirements.
- Use appropriate model types and configurations for OVMS conversion.
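The precision and device constraints can be checked client-side before submitting a request. Below is a minimal sketch; the function name is illustrative, and only the NPU/INT4 restriction comes from this document:

```python
# Documented constraints: supported precisions are INT4/INT8/FP16/FP32,
# device targets are CPU/GPU/NPU, and the OpenVINO plugin supports NPU
# model conversion in INT4 precision only.
PRECISIONS = {"int4", "int8", "fp16", "fp32"}
DEVICES = {"CPU", "GPU", "NPU"}

def validate_conversion_config(precision: str, device: str) -> None:
    """Raise ValueError if the precision/device pair is unsupported."""
    if precision.lower() not in PRECISIONS:
        raise ValueError(f"unsupported precision: {precision}")
    if device.upper() not in DEVICES:
        raise ValueError(f"unsupported device: {device}")
    if device.upper() == "NPU" and precision.lower() != "int4":
        raise ValueError("NPU conversion supports INT4 precision only")

validate_conversion_config("fp16", "CPU")  # accepted
validate_conversion_config("int4", "NPU")  # accepted
```

Rejecting an invalid combination locally avoids submitting a download job that the conversion step would later fail on.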
Refer to Deploy with Helm for the details. Ensure the prerequisites mentioned on this page are addressed before proceeding to deploy with Helm.
For alternative ways to set up the sample application, see: