Commit 95f2ca7

claudespicelukekim
authored and committed
docs: Add DuckDB HNSW vector index support for accelerated views
Previously, DuckDB HNSW vector indexes were only created for datasets. Accelerated views now also support HNSW indexes when configured with embedding columns and the DuckDB vector engine. Source: spiceai/spiceai#10695
1 parent d779dc8 commit 95f2ca7

1 file changed

Lines changed: 34 additions & 4 deletions

website/docs/components/vectors/duckdb.md

@@ -6,9 +6,9 @@ sidebar_position: 3
 pagination_next: null
 ---
 
-DuckDB can be used as a vector engine in Spice to store embeddings and execute vector similarity search using HNSW indexes via the [DuckDB VSS](https://duckdb.org/docs/extensions/vss) extension. This is useful when a dataset is already accelerated with DuckDB and a fully embedded, single-process vector store is preferred over an external service.
+DuckDB can be used as a vector engine in Spice to store embeddings and execute vector similarity search using HNSW indexes via the [DuckDB VSS](https://duckdb.org/docs/extensions/vss) extension. This is useful when a dataset or view is already accelerated with DuckDB and a fully embedded, single-process vector store is preferred over an external service.
 
-The DuckDB vector engine requires the dataset to be accelerated with the [DuckDB accelerator](../data-accelerators/duckdb). Spice computes embeddings on the configured columns during refresh and write, stores them in the DuckDB accelerator alongside the source data, and creates an HNSW index that is used to answer `vector_search` and `/v1/search` queries.
+The DuckDB vector engine requires the dataset or view to be accelerated with the [DuckDB accelerator](../data-accelerators/duckdb). Spice computes embeddings on the configured columns during refresh and write, stores them in the DuckDB accelerator alongside the source data, and creates an HNSW index that is used to answer `vector_search` and `/v1/search` queries.
 
 ```yaml
 datasets:
@@ -35,6 +35,36 @@ embeddings:
     name: local_embedding_model
 ```
 
+### View example
+
+Accelerated views also support DuckDB HNSW vector indexes. Configure `columns[].embeddings` and `vectors` on the view:
+
+```yaml
+views:
+  - name: review_title_view
+    sql: select review_date, review_id, product_title, review_body from amazon_reviews
+    columns:
+      - name: product_title
+        embeddings:
+          - from: local_embedding_model
+    acceleration:
+      enabled: true
+      engine: duckdb
+      primary_key: review_id
+      mode: memory
+    vectors:
+      enabled: true
+      engine: duckdb
+      params:
+        duckdb_distance_metric: cosine
+```
+
+```sql
+SELECT product_title
+FROM vector_search(review_title_view, 'wireless headphones')
+LIMIT 10;
+```
+
 ## Parameters
 
 | Parameter | Description | Default |
@@ -90,8 +120,8 @@ The DuckDB VSS extension is installed and loaded automatically by the runtime; n
 
 :::warning[Limitations]
 
-- A dataset or view must be accelerated with the DuckDB accelerator (`datasets[].acceleration.engine: duckdb`) for the DuckDB vector engine to be used.
-- The dataset must have a resolvable primary key, either via the underlying schema or an explicit [`row_id`](../../reference/spicepod/datasets#columnsembeddingsrow_id).
+- The dataset or view must be accelerated with the DuckDB accelerator (`acceleration.engine: duckdb`) for the DuckDB vector engine to be used.
+- The dataset or view must have a resolvable primary key, either via the underlying schema or an explicit [`row_id`](../../reference/spicepod/datasets#columnsembeddingsrow_id).
 - [Chunking](../../reference/spicepod/datasets#columns-embeddings-chunking) is not yet supported for the DuckDB vector engine.
 - `partition_by` is not yet supported for the DuckDB vector engine.
 - `spill_writes` is not supported for the DuckDB vector engine.
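
The first three limitations in the updated warning block can be read as a pre-flight check on a dataset or view definition. The following is a minimal sketch only: `check_duckdb_vector_config` is a hypothetical helper and not part of Spice, the definition is assumed to be already parsed into a plain dictionary, and the `chunking.enabled` key is an assumption about where chunking is configured. A primary key that is resolvable only from the underlying schema is not visible in the config, so this check may be stricter than the runtime.

```python
def check_duckdb_vector_config(component: dict) -> list[str]:
    """Return problems found in a parsed dataset or view definition (hypothetical helper)."""
    problems = []
    acceleration = component.get("acceleration") or {}
    vectors = component.get("vectors") or {}

    # Only relevant when the DuckDB vector engine is requested.
    if not (vectors.get("enabled") and vectors.get("engine") == "duckdb"):
        return problems

    # Limitation 1: the component must be accelerated with DuckDB.
    if not acceleration.get("enabled") or acceleration.get("engine") != "duckdb":
        problems.append("acceleration.engine must be duckdb")

    embeddings = [
        emb
        for col in component.get("columns", [])
        for emb in col.get("embeddings", [])
    ]

    # Limitation 2: a primary key or an explicit row_id must be declared.
    has_key = bool(acceleration.get("primary_key")) or any(
        emb.get("row_id") for emb in embeddings
    )
    if not has_key:
        problems.append("no primary key or row_id declared in the config")

    # Limitation 3: chunking is not yet supported (key location is an assumption).
    if any((emb.get("chunking") or {}).get("enabled") for emb in embeddings):
        problems.append("chunking is not yet supported")

    return problems


# The view from the example in this commit, written as a dictionary.
view = {
    "name": "review_title_view",
    "columns": [
        {"name": "product_title", "embeddings": [{"from": "local_embedding_model"}]}
    ],
    "acceleration": {
        "enabled": True,
        "engine": "duckdb",
        "primary_key": "review_id",
        "mode": "memory",
    },
    "vectors": {
        "enabled": True,
        "engine": "duckdb",
        "params": {"duckdb_distance_metric": "cosine"},
    },
}

print(check_duckdb_vector_config(view))
```

For the view shown in the diff the helper returns an empty list. `partition_by` and `spill_writes` are left out because the diff does not show where they are configured.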
