---
description: Instructions for using machine learning models hosted on HuggingFace with Spice.
---
To use a model hosted on HuggingFace, specify the `huggingface.co` path in the `from` field and, when needed, the files to include.

The `from` key takes the form `huggingface:model_path`. Below are two common examples of `from` key configuration:

- `huggingface:username/modelname`: implies the latest version of `modelname` hosted by `username`.
- `huggingface:huggingface.co/username/modelname:revision`: specifies a particular `revision` of `modelname` by `username`, including the optional domain.
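For example, a minimal sketch using both forms (the model and the `main` revision are illustrative):

```yaml
models:
  # Latest version of the model
  - from: huggingface:meta-llama/Llama-3.2-1B
    name: llama
  # A pinned revision, including the optional domain
  - from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B:main
    name: llama_pinned
```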
The `from` key must match the following regex:

```
\A(huggingface:)(huggingface\.co\/)?(?<org>[\w\-]+)\/(?<model>[\w\-]+)(:(?<revision>[\w\d\-\.]+))?\z
```

The `from` key consists of five components:

- Prefix: The value must start with `huggingface:`.
- Domain (Optional): Optionally includes `huggingface.co/` immediately after the prefix. Currently no other HuggingFace-compatible services are supported.
- Organization/User: The HuggingFace organization (`org`).
- Model Name: After a `/`, the model name (`model`).
- Revision (Optional): A colon (`:`) followed by the git-like revision identifier (`revision`).
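For instance, a fully specified value decomposes into the named capture groups as follows:

```
huggingface:huggingface.co/username/modelname:revision

prefix   = huggingface:
domain   = huggingface.co/   (optional)
org      = username
model    = modelname
revision = revision          (optional)
```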
The `name` field sets the model name. This is used as the model ID within Spice and in Spice's endpoints (e.g. `https://data.spiceai.io/v1/models`), and can be set to the same value as the model ID in the `from` field.
| Param | Description | Default |
| --- | --- | --- |
| `hf_token` | The HuggingFace access token. | - |
| `model_type` | The architecture to load the model as. Supported values: `mistral`, `gemma`, `mixtral`, `llama`, `phi2`, `phi3`, `qwen2`, `gemma2`, `starcoder2`, `phi3.5moe`, `deepseekv2`, `deepseekv3`. | - |
| `tools` | Which tools should be made available to the model. Set to `auto` to use all available tools. | - |
| `system_prompt` | An additional system prompt used for all chat completions to this model. | - |
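These params combine in a model's `params` block. A sketch (the model choice, `model_type` value, and prompt are illustrative):

```yaml
models:
  - from: huggingface:huggingface.co/microsoft/Phi-3.5-mini-instruct
    name: phi
    params:
      hf_token: ${ secrets:HF_TOKEN }  # HuggingFace access token, if required
      model_type: phi3                 # illustrative: load as the phi3 architecture
      tools: auto                      # expose all available tools to the model
      system_prompt: You are a helpful assistant.
```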
The `files` field specifies the file path for a HuggingFace model. For example, GGUF model formats require a specific file path, while other varieties (e.g. `.safetensors`) are inferred.
```yaml
models:
  - from: huggingface:huggingface.co/lmstudio-community/Qwen2.5-Coder-3B-Instruct-GGUF
    name: sloth-gguf
    files:
      - path: Qwen2.5-Coder-3B-Instruct-Q3_K_L.gguf
```
Access tokens can be provided for HuggingFace models in two ways:

1. In the HuggingFace token cache (i.e. `~/.cache/huggingface/token`). This is the default.
2. Via model params:
```yaml
models:
  - name: llama_3.2_1B
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    params:
      hf_token: ${ secrets:HF_TOKEN }
```
Example: an ONNX ML model that uses the `taxi_trips` dataset:

```yaml
models:
  - from: huggingface:huggingface.co/spiceai/darts:latest
    name: hf_model
    files:
      - path: model.onnx
    datasets:
      - taxi_trips
```
Example: a language model loaded from HuggingFace:

```yaml
models:
  - from: huggingface:huggingface.co/microsoft/Phi-3.5-mini-instruct
    name: phi
```
Example: providing an access token via params (e.g. for a private or gated repository):

```yaml
models:
  - name: llama_3.2_1B
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    params:
      hf_token: ${ secrets:HF_TOKEN }
```
For more details on authentication, see access tokens.
{% hint style="warning" %}
Limitations

- The throughput, concurrency, and latency of a locally hosted model will vary based on the underlying hardware and model size. Spice supports Apple Metal and CUDA for accelerated inference.
- ML models currently only support the ONNX file format.
{% endhint %}