Commit e1673bd

Jeadie and Scott Lyons authored
Docs for embeddings (#554)
* docs for embeddings
* Update local.md
* Apply suggestions from code review

Co-authored-by: Scott Lyons <scott@spice.ai>
1 parent e2b5e87 commit e1673bd

8 files changed

Lines changed: 120 additions & 2 deletions


spiceaidocs/docs/components/catalogs/index.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 title: 'Catalog Connectors'
 sidebar_label: 'Catalog Connectors'
 description: ''
-sidebar_position: 5
+sidebar_position: 4
 pagination_prev: null
 pagination_next: null
 ---
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
---
title: 'HuggingFace Text Embedding Models'
sidebar_label: 'HuggingFace'
sidebar_position: 2
---

To run an embedding model from HuggingFace, specify the `huggingface` path in `from`. Spice downloads the model and runs it locally.

```yaml
embeddings:
  - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
    name: all_minilm_l6_v2
```

Supported models include:
- All models tagged as [text-embeddings-inference](https://huggingface.co/models?other=text-embeddings-inference) on HuggingFace.
- Any HuggingFace repository with the correct files to be loaded as a [local embedding model](/components/embeddings/local.md).

As with [language models](/components/models/huggingface#access-tokens), `spice` can run private HuggingFace embedding models when given an access token:

```yaml
embeddings:
  - from: huggingface:huggingface.co/secret-company/awesome-embedding-model
    name: top_secret
    params:
      hf_token: ${ secrets:HF_TOKEN }
```
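The `${ secrets:HF_TOKEN }` value is a placeholder that Spice fills in from a configured secret store. As a rough illustration of that substitution only, the sketch below resolves such placeholders from a dictionary of environment variables. The helper name `resolve_secrets` and the environment-variable lookup are assumptions for this example; they are not Spice's implementation.

```python
import os
import re

# Matches placeholders of the form `${ secrets:NAME }`, tolerating optional spaces.
SECRET_PATTERN = re.compile(r"\$\{\s*secrets:([A-Za-z_][A-Za-z0-9_]*)\s*\}")


def resolve_secrets(value: str, env=None) -> str:
    """Hypothetical helper: replace each `${ secrets:NAME }` with env[NAME].

    Assumption: secrets are read from environment variables. Spice itself
    resolves them through its configured secret stores.
    """
    env = os.environ if env is None else env

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"secret {name!r} is not set")
        return env[name]

    return SECRET_PATTERN.sub(lookup, value)


if __name__ == "__main__":
    params = {"hf_token": "${ secrets:HF_TOKEN }"}
    resolved = {k: resolve_secrets(v, {"HF_TOKEN": "hf_example"}) for k, v in params.items()}
    print(resolved)  # {'hf_token': 'hf_example'}
```

Raising on a missing secret, rather than substituting an empty string, surfaces misconfiguration before any download is attempted.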
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
---
title: 'Embedding Models'
sidebar_label: 'Embeddings'
description: ''
sidebar_position: 6
pagination_prev: null
pagination_next: null
---

Embedding models convert raw text into a numerical representation (a vector) that machine learning models can use.

Spice supports running embedding models locally or using remote services such as OpenAI or Mistral's [la Plateforme](https://console.mistral.ai/).

Embedding models are defined as top-level components under the `embeddings` key in the `spicepod.yaml` file.

```yaml
embeddings:
  - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
    name: all_minilm_l6_v2

  - from: openai:text-embedding-3-large
    name: xl_embed
    params:
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }

  - name: my_model
    from: file:model.safetensors
    files:
      - path: config.json
      - path: models/embed/tokenizer.json
```
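Each `from` value in the example combines a source prefix (`huggingface:`, `openai:`, or `file:`) with a model reference. The hypothetical `parse_from` helper below sketches how such a value splits on its first colon; it only illustrates the format and is not Spice's own parser.

```python
from typing import NamedTuple

# Source prefixes used by the `from` values in the example above.
KNOWN_SOURCES = {"huggingface", "openai", "file"}


class EmbeddingSource(NamedTuple):
    source: str     # e.g. "openai"
    reference: str  # e.g. "text-embedding-3-large"; may be empty


def parse_from(value: str) -> EmbeddingSource:
    """Hypothetical helper: split a `from` value into (source, reference)."""
    source, _, reference = value.partition(":")
    if source not in KNOWN_SOURCES:
        raise ValueError(f"unrecognized embedding source in {value!r}")
    return EmbeddingSource(source, reference)


if __name__ == "__main__":
    for value in [
        "huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2",
        "openai:text-embedding-3-large",
        "file:model.safetensors",
    ]:
        print(parse_from(value))
```

Splitting on the first colon only keeps references that themselves contain colons intact.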

Embedding models can be used in two ways:
- through an OpenAI-compatible [endpoint](/api/http/embeddings.md), or
- by augmenting a dataset with column-level [embeddings](/reference/spicepod/datasets.md#embeddings) to provide vector-based [search functionality](/features/search/index.md#vector-search).
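At its core, vector search compares a query's embedding with each row's embedding and returns the closest rows. The sketch below ranks rows by cosine similarity using hand-written toy vectors; real vectors come from the configured model (all-MiniLM-L6-v2, for example, produces 384-dimensional embeddings), and Spice's search may use a different distance measure or index structure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], rows: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k (row_id, score) pairs most similar to the query."""
    scored = [(row_id, cosine_similarity(query, vec)) for row_id, vec in rows.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; a real model produces much longer vectors.
    rows = {
        "doc_a": [0.9, 0.1, 0.0],
        "doc_b": [0.0, 1.0, 0.0],
        "doc_c": [0.7, 0.3, 0.1],
    }
    print(top_k([1.0, 0.0, 0.0], rows))
```

A linear scan like this is fine for illustration; production systems typically use an approximate nearest-neighbor index for large tables.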
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
---
title: 'Local Filesystem Embedding Models'
sidebar_label: 'Local'
sidebar_position: 3
---

Embedding models can be run from files stored on the local filesystem.

```yaml
embeddings:
  - name: all_minilm_l6_v2
    from: file:model.safetensors
    files:
      - path: models/embed/config.json
      - path: models/embed/tokenizer.json
```

## Required Files
- Model file, one of: `model.safetensors`, `pytorch_model.bin`.
- A tokenizer file with the filename `tokenizer.json`.
- A config file with the filename `config.json`.
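Before pointing Spice at a set of local files, it can help to confirm that the required files listed above are present. The sketch below is a hypothetical pre-flight check (the name `missing_files` is an assumption, not part of Spice) and assumes all files sit in a single directory; adapt it if, as in the example above, the files live under different paths.

```python
from pathlib import Path
from typing import List

# Required files, per the list above.
MODEL_FILES = ("model.safetensors", "pytorch_model.bin")  # exactly one is needed
REQUIRED_FILES = ("tokenizer.json", "config.json")


def missing_files(model_dir: str) -> List[str]:
    """Hypothetical check: list required files absent from model_dir."""
    directory = Path(model_dir)
    missing = []
    if not any((directory / name).is_file() for name in MODEL_FILES):
        missing.append(" or ".join(MODEL_FILES))
    for name in REQUIRED_FILES:
        if not (directory / name).is_file():
            missing.append(name)
    return missing


if __name__ == "__main__":
    import sys

    problems = missing_files(sys.argv[1] if len(sys.argv) > 1 else "models/embed")
    print("ok" if not problems else f"missing: {', '.join(problems)}")
```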
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
---
title: 'OpenAI (or Compatible) Embedding Models'
sidebar_label: 'OpenAI'
sidebar_position: 1
---

To use a hosted OpenAI (or compatible) embedding model, specify the `openai` path in `from`.

To use a specific model, include its model ID in `from` (see the example below). If no model ID is given, `text-embedding-3-small` is used.

These parameters are specific to OpenAI models:

| Parameter | Description | Default |
| ----- | ----------- | ------- |
| `openai_api_key` | The OpenAI API key. | - |
| `openai_org_id` | The OpenAI organization ID. | - |
| `openai_project_id` | The OpenAI project ID. | - |
| `endpoint` | The OpenAI API base endpoint. | `https://api.openai.com/v1` |

Example:

```yaml
embeddings:
  - from: openai:text-embedding-3-large
    name: xl_embed
    params:
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }

  - name: mistral
    from: openai:mistral-embed
    params:
      endpoint: https://api.mistral.ai/v1
      openai_api_key: ${ secrets:SPICE_MISTRAL_API_KEY }
```
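To make the parameter table concrete, the sketch below shows how those parameters plausibly map onto an OpenAI-compatible `POST {endpoint}/embeddings` request. The `/embeddings` path, the `Authorization: Bearer`, `OpenAI-Organization`, and `OpenAI-Project` headers, and the `model`/`input` body fields follow OpenAI's public API; how Spice builds the request internally is an assumption, and `build_embeddings_request` is a hypothetical name. Secrets are assumed to be resolved already.

```python
import json
from typing import Dict, Tuple

DEFAULT_ENDPOINT = "https://api.openai.com/v1"  # default from the table above
DEFAULT_MODEL = "text-embedding-3-small"        # used when `from` has no model ID


def build_embeddings_request(
    from_value: str, params: Dict[str, str], text: str
) -> Tuple[str, Dict[str, str], str]:
    """Hypothetical helper: map a spicepod entry to (url, headers, json_body).

    Nothing is sent over the network; `openai_api_key` must be present.
    """
    prefix, _, model = from_value.partition(":")
    if prefix != "openai":
        raise ValueError("expected an `openai:` model reference")
    endpoint = params.get("endpoint", DEFAULT_ENDPOINT).rstrip("/")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {params['openai_api_key']}",
    }
    # Optional headers recognized by the OpenAI API.
    if "openai_org_id" in params:
        headers["OpenAI-Organization"] = params["openai_org_id"]
    if "openai_project_id" in params:
        headers["OpenAI-Project"] = params["openai_project_id"]
    body = json.dumps({"model": model or DEFAULT_MODEL, "input": text})
    return f"{endpoint}/embeddings", headers, body


if __name__ == "__main__":
    url, headers, body = build_embeddings_request(
        "openai:mistral-embed",
        {"endpoint": "https://api.mistral.ai/v1", "openai_api_key": "<resolved key>"},
        "hello world",
    )
    print(url)   # https://api.mistral.ai/v1/embeddings
    print(body)  # {"model": "mistral-embed", "input": "hello world"}
```

Overriding only `endpoint` is what lets the same `openai:` source target other OpenAI-compatible providers such as Mistral.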

spiceaidocs/docs/components/models/index.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: 'AI/ML Models'
 sidebar_label: 'AI/ML Models'
 description: ''
+sidebar_position: 5
 ---

 Spice supports traditional machine learning (ML) models and language models (LLMs).

spiceaidocs/docs/components/secret-stores/index.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 title: 'Secret Stores'
 sidebar_label: 'Secret Stores'
 description: ''
-sidebar_position: 4
+sidebar_position: 3
 pagination_prev: null
 pagination_next: null
 ---

spiceaidocs/docs/components/views/index.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: 'Views'
 sidebar_label: 'Views'
 description: 'Documentation for defining Views'
+sidebar_position: 7
 ---

 Views in Spice are virtual tables defined by SQL queries. They simplify complex queries and support reuse across applications.
