Model Download Service

The Model Download Service is a microservice that downloads models from multiple hubs: Hugging Face, Ollama, Geti™, and Ultralytics. It can also convert Hugging Face models to OpenVINO Model Server (OVMS) format. The service exposes a RESTful API for managing model downloads and conversions.

Features

Download models from Hugging Face, Ollama, Geti™ and Ultralytics model hubs
Convert Hugging Face models to OVMS format
Support for multiple model precisions (INT4, INT8, FP16, FP32)
Support for various device targets (CPU, GPU, and NPU)
Note: The OpenVINO plugin supports NPU model conversion in INT4 precision only.
Parallel download capability
Configurable model caching
REST API with OpenAPI documentation
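
The precision and device constraints listed above can be checked on the client side before a conversion request is submitted. The following is a minimal Python sketch under that assumption; `check_conversion_target` is a hypothetical helper, not part of the service, and the service itself may apply additional rules.

```python
# Hypothetical client-side pre-check; not part of the service's API.
SUPPORTED_PRECISIONS = {"int4", "int8", "fp16", "fp32"}
SUPPORTED_DEVICES = {"CPU", "GPU", "NPU"}


def check_conversion_target(precision: str, device: str) -> None:
    """Raise ValueError if the precision/device pair is not listed as supported."""
    p = precision.lower()
    d = device.upper()
    if p not in SUPPORTED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    if d not in SUPPORTED_DEVICES:
        raise ValueError(f"unsupported device: {device!r}")
    # The feature list states that NPU conversion is INT4-only.
    if d == "NPU" and p != "int4":
        raise ValueError("NPU conversion requires INT4 precision")
```

Calling `check_conversion_target("fp16", "NPU")` raises `ValueError`, while `check_conversion_target("int4", "NPU")` returns without error.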

Prerequisites

Docker and Docker Compose
Hugging Face API token (only required for gated Hugging Face models or conversion)
Sufficient disk space for model storage

Quick Start

Clone the edge-ai-libraries repository, which contains the model-download microservice:

# Clone the latest from mainline
git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries
# Alternatively, clone a specific release
git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries -b <release-tag>

Navigate to the model-download microservice directory:
```
cd edge-ai-libraries/microservices/model-download
```

Configure the environment variables

Set the following environment variables:

export REGISTRY="intel/"
export TAG=latest
export HUGGINGFACEHUB_API_TOKEN=<your-huggingface-token>

To use the Geti™ plugin, set these variables:

export GETI_WORKSPACE_ID=<YOUR_GETI_WORKSPACE_ID>
export GETI_HOST=<GETI_HOST_ADDRESS>
export GETI_TOKEN=<GETI_ACCESS_TOKEN>
export GETI_SERVER_API_VERSION=v1
export GETI_SERVER_SSL_VERIFY=False  # Default is False

Note: For Geti™ setup instructions, see the documentation here.
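
A missing variable is easier to diagnose before the service is launched than after. The sketch below reports which of the variables named above are unset. The helper `missing_variables` is hypothetical, and the choice of which variables to treat as mandatory is an assumption of this example: `HUGGINGFACEHUB_API_TOKEN` is left out because public models do not need it, and the two Geti™ variables for API version and SSL verification are left out on the assumption that they have usable defaults.

```python
from typing import List, Mapping

# Names come from the export commands above. Which ones are mandatory is an
# assumption of this sketch, not something the service documents here.
BASE_VARS = ("REGISTRY", "TAG")
GETI_VARS = ("GETI_WORKSPACE_ID", "GETI_HOST", "GETI_TOKEN")


def missing_variables(env: Mapping[str, str], use_geti: bool = False) -> List[str]:
    """Return the names of expected variables that are unset or empty."""
    names = BASE_VARS + (GETI_VARS if use_geti else ())
    return [name for name in names if not env.get(name)]
```

For the current shell environment, pass `os.environ` as `env` (after `import os`), for example `missing_variables(os.environ, use_geti=True)`.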

Launch the service

Use the run script to start the service and enable plugins:

source scripts/run_service.sh up --plugins all --model-path <host path>

NOTE: Public models do not need a token. To download gated models or to convert models to OpenVINO IR format, set the Hugging Face token in the HUGGINGFACEHUB_API_TOKEN environment variable.

NOTE: Ensure the host path does not require privileged access for directory creation. It is recommended to use $PWD/host_path or a similar location within your working directory.

The run_service.sh script is a Docker Compose wrapper that builds and manages the model download service container with configurable plugins, model paths, and deployment options.

Options available with the script:

Usage:

  source scripts/run_service.sh [options] [action]

Actions:

    up                     Start the services (default)
    down                   Stop the services

Options:

| Option | Description |
|--------|-------------|
| `--build` | Build the Docker image before running |
| `--rebuild` | Ignore any cached images and rebuild them from scratch using the Dockerfile definitions |
| `--model-path <path>` | Set a custom model path (default: `$HOME/models/`) |
| `--plugins <list>` | Comma-separated list of plugins to enable (e.g., `huggingface,ollama,openvino,ultralytics,geti`), or `all` to enable all available plugins |
| `--help` | Show the help message |

Examples:

Start the service with default settings: `source scripts/run_service.sh up`
Stop the service: `source scripts/run_service.sh down`
Enable a specific plugin: `source scripts/run_service.sh up --plugins huggingface`
Enable multiple plugins: `source scripts/run_service.sh up --plugins huggingface,ollama,ultralytics,geti`
Use a custom model storage location: `source scripts/run_service.sh up --model-path /data/my-models`
Production deployment with all plugins: `source scripts/run_service.sh up --plugins all --model-path tmp/models`
Display usage information: `source scripts/run_service.sh --help`

Access the service
- Once the service is running, the Swagger documentation for all available APIs is served at http://<host-ip>:8200/api/v1/docs.

Verification

Ensure that the application is running by checking the Docker container status:
```
docker ps
```
Then open the Swagger documentation at http://<host-ip>:8200/api/v1/docs and confirm that the page loads.
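
The same check can be scripted. The sketch below requests the documentation page and reports whether it answers with HTTP 200. The helper names `docs_url` and `service_is_up` are hypothetical, and using the documentation page as a liveness probe is an assumption of this example; the service may offer a dedicated health endpoint that is not described here.

```python
import urllib.error
import urllib.request


def docs_url(host: str, port: int = 8200) -> str:
    """Build the Swagger documentation URL given in the Quick Start."""
    return f"http://{host}:{port}/api/v1/docs"


def service_is_up(host: str, port: int = 8200, timeout: float = 5.0) -> bool:
    """Return True if the documentation page answers with HTTP 200."""
    try:
        with urllib.request.urlopen(docs_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```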

Sample usage with CURL command

Download a Hugging Face model:

curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=hf_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "microsoft/Phi-3.5-mini-instruct",
        "hub": "huggingface",
        "type": "llm"
      }
    ],
    "parallel_downloads": false
  }'
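
For callers that prefer Python over curl, the same request can be assembled with the standard library. This is a minimal sketch, not an official client: `build_download_request` is a hypothetical helper, and it uses only the endpoint, query parameter, and body fields that appear in the curl samples.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional


def build_download_request(
    host: str,
    models: List[dict],
    download_path: Optional[str] = None,
    parallel: bool = False,
    port: int = 8200,
) -> urllib.request.Request:
    """Assemble the POST request for the model download endpoint."""
    url = f"http://{host}:{port}/api/v1/models/download"
    if download_path is not None:
        # urlencode escapes characters that are not safe in a query string.
        url += "?" + urllib.parse.urlencode({"download_path": download_path})
    body = json.dumps({"models": models, "parallel_downloads": parallel})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be sent with `urllib.request.urlopen(request)`. Building the request separately from sending it makes the URL and body easy to inspect first.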

Download a model from Ollama:

curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=ollama_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "tinyllama",
        "hub": "ollama",
        "type": "llm"
      }
    ],
    "parallel_downloads": false
  }'

Download a YOLO vision model from Ultralytics:

curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=yolo_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "yolov8s",
        "hub": "ultralytics",
        "type": "vision"
      }
    ],
    "parallel_downloads": true
  }'

Note: YOLO vision models from Ultralytics will be downloaded and converted to OpenVINO IR format with FP32 and FP16 precision by default.

Download a Hugging Face model and convert to OpenVINO IR format:

curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=ovms_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "BAAI/bge-reranker-base",
        "hub": "openvino",
        "type": "rerank",
        "is_ovms": true,
        "config": {
          "precision": "fp32",
          "device": "CPU",
          "cache_size": 10
        }
      }
    ],
    "parallel_downloads": false
  }'

Download OpenVINO-optimized models from Geti™:

curl -X POST 'http://<host-ip>:8200/api/v1/models/download?download_path=geti_folder' \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
        {
            "name": "yolox-tiny",
            "hub": "geti",
            "revision": "1",
            "config":{
                "precision": "fp32"
            }
        }
    ],
    "parallel_downloads": true
  }'

Note: When no precision is specified in the request, the default precision is FP16.

Query Parameter:

download_path (string): Specify a local file system path where the downloaded model should be saved. If not provided, the model will be saved to a default location.

Sample response (when a download request is started):

{
  "message": "Started processing 1 model(s)",
  "job_ids": [
    "5f0d4eba-c79c-4d02-97a6-43c3d0168ca0"
  ],
  "status": "processing"
}

Each model download request returns a job_id. To check the status of a download, use the following CURL command:

curl -X GET "http://<host-ip>:8200/api/v1/jobs/<job_id>"
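
When a request covers several models, the response carries one job ID per model. The sketch below maps each ID to its status URL. `job_status_urls` is a hypothetical helper, and it assumes the response has the `job_ids` field shown in the sample response.

```python
import json
from typing import Dict


def job_status_urls(response_text: str, host: str, port: int = 8200) -> Dict[str, str]:
    """Map each job ID in a download response to its status URL."""
    payload = json.loads(response_text)
    base = f"http://{host}:{port}/api/v1/jobs"
    # A response without "job_ids" yields an empty mapping.
    return {job_id: f"{base}/{job_id}" for job_id in payload.get("job_ids", [])}
```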

Sample Response (when the job is completed):

{
  "id": "5f0d4eba-c79c-4d02-97a6-43c3d0168ca0",
  "operation_type": "download",
  "model_name": "yolov8s",
  "hub": "ultralytics",
  "output_dir": "/opt/models/ultra_folder",
  "status": "completed",
  "start_time": "2025-10-27T08:24:23.510870",
  "plugin_name": "ultralytics",
  "model_type": "vision",
  "plugin": "ultralytics",
  "completion_time": "2025-10-27T08:30:14.443898",
  "result": {
    "model_name": "yolov8s",
    "source": "ultralytics",
    "download_path": "model/download/path",
    "return_code": 0
  }
}
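
Because downloads run asynchronously, a client usually polls the job endpoint until the status changes. The following is a sketch of such a loop. `fetch_job` and `wait_for_job` are hypothetical helpers. Only the `processing` and `completed` statuses appear in the samples, so the loop treats any status other than `processing` as final; that is an assumption, and the service may report other intermediate states.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_job(host: str, job_id: str, port: int = 8200) -> dict:
    """GET the job record from the status endpoint shown above."""
    url = f"http://{host}:{port}/api/v1/jobs/{job_id}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_job(
    fetch: Callable[[], dict], interval: float = 5.0, max_polls: int = 120
) -> dict:
    """Poll until the job leaves the "processing" state or polls run out."""
    for _ in range(max_polls):
        job = fetch()
        # Assumption: every status other than "processing" is terminal.
        if job.get("status") != "processing":
            return job
        time.sleep(interval)
    raise TimeoutError("job still processing after the polling limit")
```

A typical call is `wait_for_job(lambda: fetch_job("<host-ip>", "<job_id>"))`. Passing the fetch step as a callable keeps the loop independent of the HTTP layer.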

For more details, check out the API Spec.

Configuration

The service can be configured through environment variables and Docker volumes:

Environment Variables:

HF_HUB_ENABLE_HF_TRANSFER: Enable faster Hugging Face downloads through the hf_transfer backend (default: 1)
HUGGINGFACEHUB_API_TOKEN: Hugging Face token (only required for gated models or conversion)

Volumes:

~/models:/app/models: Mounts ~/models on the host at /app/models in the container so that downloaded models persist

Troubleshooting

If you encounter any issues during the build or run process, check the Docker logs for errors:
```
docker logs <container-id>
```

Best Practices

Use parallel downloads with caution, as they can consume significant resources.
Configure cache sizes based on available memory.
Select model precision according to your performance requirements.
Use appropriate model types and configurations for OVMS conversion.

Running in Kubernetes

Refer to Deploy with Helm for the details. Ensure the prerequisites mentioned on this page are addressed before proceeding to deploy with Helm.

Advanced Setup Options

For alternative ways to set up the service, see:

How to Build from Source
