docs: Add DuckDB HNSW vector index support for accelerated views
Previously, DuckDB HNSW vector indexes were only created for datasets.
Accelerated views now also support HNSW indexes when configured with
embedding columns and the DuckDB vector engine.
Source: spiceai/spiceai#10695
website/docs/components/vectors/duckdb.md: 34 additions & 4 deletions
--- a/website/docs/components/vectors/duckdb.md
+++ b/website/docs/components/vectors/duckdb.md
@@ -6,9 +6,9 @@ sidebar_position: 3
 pagination_next: null
 ---
 
-DuckDB can be used as a vector engine in Spice to store embeddings and execute vector similarity search using HNSW indexes via the [DuckDB VSS](https://duckdb.org/docs/extensions/vss) extension. This is useful when a dataset is already accelerated with DuckDB and a fully embedded, single-process vector store is preferred over an external service.
+DuckDB can be used as a vector engine in Spice to store embeddings and execute vector similarity search using HNSW indexes via the [DuckDB VSS](https://duckdb.org/docs/extensions/vss) extension. This is useful when a dataset or view is already accelerated with DuckDB and a fully embedded, single-process vector store is preferred over an external service.
 
-The DuckDB vector engine requires the dataset to be accelerated with the [DuckDB accelerator](../data-accelerators/duckdb). Spice computes embeddings on the configured columns during refresh and write, stores them in the DuckDB accelerator alongside the source data, and creates an HNSW index that is used to answer `vector_search` and `/v1/search` queries.
+The DuckDB vector engine requires the dataset or view to be accelerated with the [DuckDB accelerator](../data-accelerators/duckdb). Spice computes embeddings on the configured columns during refresh and write, stores them in the DuckDB accelerator alongside the source data, and creates an HNSW index that is used to answer `vector_search` and `/v1/search` queries.
 
 ```yaml
 datasets:
@@ -35,6 +35,36 @@ embeddings:
     name: local_embedding_model
 ```
 
+### View example
+
+Accelerated views also support DuckDB HNSW vector indexes. Configure `columns[].embeddings` and `vectors` on the view:
+
+```yaml
+views:
+  - name: review_title_view
+    sql: select review_date, review_id, product_title, review_body from amazon_reviews
+    columns:
+      - name: product_title
+        embeddings:
+          - from: local_embedding_model
+    acceleration:
+      enabled: true
+      engine: duckdb
+      primary_key: review_id
+      mode: memory
+    vectors:
+      enabled: true
+      engine: duckdb
+      params:
+        duckdb_distance_metric: cosine
+```
+
+```sql
+SELECT product_title
+FROM vector_search(review_title_view, 'wireless headphones')
+LIMIT 10;
+```
+
 ## Parameters
 
 | Parameter | Description | Default |
@@ -90,8 +120,8 @@ The DuckDB VSS extension is installed and loaded automatically by the runtime; n
 
 :::warning[Limitations]
 
-- A dataset or view must be accelerated with the DuckDB accelerator (`datasets[].acceleration.engine: duckdb`) for the DuckDB vector engine to be used.
-- The dataset must have a resolvable primary key, either via the underlying schema or an explicit [`row_id`](../../reference/spicepod/datasets#columnsembeddingsrow_id).
+- The dataset or view must be accelerated with the DuckDB accelerator (`acceleration.engine: duckdb`) for the DuckDB vector engine to be used.
+- The dataset or view must have a resolvable primary key, either via the underlying schema or an explicit [`row_id`](../../reference/spicepod/datasets#columnsembeddingsrow_id).
 - [Chunking](../../reference/spicepod/datasets#columns-embeddings-chunking) is not yet supported for the DuckDB vector engine.
 - `partition_by`is not yet supported for the DuckDB vector engine.
 - `spill_writes`is not supported for the DuckDB vector engine.
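
Note on `duckdb_distance_metric: cosine` in the view example: the sketch below is a minimal, hypothetical illustration of exact cosine-distance ranking, which is the ordering an HNSW index approximates. It is not Spice's or DuckDB's implementation. The function names, the `review_id`-style keys, and the three-dimensional vectors are assumptions made purely for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k):
    """Exact top-k: rank every (key, embedding) pair by distance to the query."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [key for key, _ in ranked[:k]]


# Hypothetical toy embeddings keyed by a primary key (assumption: real
# embeddings come from the configured model and have many more dimensions).
rows = [
    ("r1", [1.0, 0.0, 0.0]),
    ("r2", [0.9, 0.1, 0.0]),
    ("r3", [0.0, 1.0, 0.0]),
]

# Hypothetical query embedding standing in for an embedded search string.
print(top_k([0.8, 0.2, 0.0], rows, k=2))  # prints ['r2', 'r1']
```

An exact scan like this compares the query against every row. An HNSW index generally trades a small amount of recall for visiting far fewer rows, which is why results can differ slightly from an exhaustive ranking.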