website/docs/components/vectors/index.md
Lines changed: 5 additions & 4 deletions
@@ -7,6 +7,8 @@ pagination_prev: null
 pagination_next: null
 ---

+> 🎓 Learn how it works with the [Amazon S3 Vectors with Spice](https://spiceai.org/blog/2025/amazon-s3-vectors-with-spice) engineering blog post.
+
 Data sourced by Data Connectors with vector embedding columns can be indexed and efficiently searched using a vector engine.

 A vector engine will store all vector embeddings associated to columns in a dataset, provide efficient vector search operations and avoid unnecessary recomputation of embeddings.
@@ -18,16 +20,15 @@ datasets:
   - name: dataset_with_embeddings
     vectors:
       enabled: true
-
 ```

 For the complete reference specification see [datasets](/docs/reference/spicepod/datasets.md).
|`s3_vectors_arn`| The S3 vectors index to use. Incompatible with `s3_vectors_bucket` and `s3_vectors_index`. |`arn:aws:s3vectors:123456654321/bucket/a-bucket/index/index-of-important-embeddings`|
-|`s3_vectors_aws_access_key_id`| The access key ID for the S3 vectors index | - |
-|`s3_vectors_aws_region`| The AWS region for the S3 vectors index. |`us-east-1`|
-|`s3_vectors_aws_secret_access_key`| The secret access key for the S3 vectors index | - |
-|`s3_vectors_aws_session_token`| Session token for the S3 vectors index. | - |
-| `s3_vectors_bucket` | The S3 vectors bucket to use. If `s3_vectors_index` is not specified, an index will be created based on the underlying embedding column. Incompatible with `s3_vectors_arn` | `a-bucket`
-|`s3_vectors_endpoint`| The endpoint for the S3 vectors index |`s3vectors.us-east-2.api.aws`|
-|`s3_vectors_index`| The name of the s3 vectors index to use or create. Incompatible with `s3_vectors_arn`. |`index-of-important-embeddings`|
|`s3_vectors_arn`| The S3 vectors index to use. Incompatible with `s3_vectors_bucket` and `s3_vectors_index`. |`arn:aws:s3vectors:123456654321/bucket/a-bucket/index/index-of-important-embeddings`|
+|`s3_vectors_aws_access_key_id`| The access key ID for the S3 vectors index | - |
+|`s3_vectors_aws_region`| The AWS region for the S3 vectors index. |`us-east-1`|
+|`s3_vectors_aws_secret_access_key`| The secret access key for the S3 vectors index | - |
+|`s3_vectors_aws_session_token`| Session token for the S3 vectors index. | - |
+|`s3_vectors_bucket`| The S3 vectors bucket to use. If `s3_vectors_index` is not specified, an index will be created based on the underlying embedding column. Incompatible with `s3_vectors_arn`|`a-bucket`|
+|`s3_vectors_index`| The name of the s3 vectors index to use or create. Incompatible with `s3_vectors_arn`. |`index-of-important-embeddings`|

 :::warning[Limitations]
-- `s3_vectors_index` and `s3_vectors_arn` specify a single index for the dataset and therefore should not be used with a dataset containing more than one embedding column.
-:::

-<!-- ## Cookbook
+- `s3_vectors_index` and `s3_vectors_arn` specify a single index for the dataset and therefore should not be used with a dataset containing more than one embedding column.
+:::
+
+## Cookbook

 - A cookbook recipe to configure a dataset with an S3 vectors engine in Spice. [S3 Vectors engine](https://github.com/spiceai/cookbook/tree/trunk/vectors/s3#readme)
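
The parameter table and the Limitations note in this file describe two rules: `s3_vectors_arn` is incompatible with `s3_vectors_bucket` and `s3_vectors_index`, and a single named index should not be used with more than one embedding column. A minimal sketch of those rules as a pre-flight check follows. The function name, the `params` dictionary shape, and the `embedding_columns` argument are hypothetical and are not part of Spice.

```python
def check_s3_vectors_params(params: dict, embedding_columns: int = 1) -> list:
    """Return human-readable problems; an empty list means none were found.

    Hypothetical helper: `params` maps parameter names from the table above
    to their configured values.
    """
    problems = []

    # Rule from the parameter table: an ARN already identifies one index,
    # so it cannot be combined with a bucket or an index name.
    if "s3_vectors_arn" in params:
        for key in ("s3_vectors_bucket", "s3_vectors_index"):
            if key in params:
                problems.append(f"s3_vectors_arn is incompatible with {key}")

    # Rule from the Limitations note: a single named index should not be
    # used for a dataset with more than one embedding column.
    if embedding_columns > 1:
        for key in ("s3_vectors_arn", "s3_vectors_index"):
            if key in params:
                problems.append(
                    f"{key} names a single index but the dataset has "
                    f"{embedding_columns} embedding columns"
                )

    return problems
```

Passing only `s3_vectors_bucket` returns an empty list even with several embedding columns, which matches the table's statement that an index is then created based on the underlying embedding column.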
website/docs/features/search/index.md
Lines changed: 5 additions & 0 deletions
@@ -13,6 +13,8 @@ tags:

 import DocCardList from '@theme/DocCardList';

+> 🎓 Learn how it works with the [Amazon S3 Vectors with Spice](https://spiceai.org/blog/2025/amazon-s3-vectors-with-spice) engineering blog post.
+
 Spice provides advanced search capabilities that go beyond standard SQL queries, offering both traditional SQL search patterns, semantic (vector-based) search, and full text search functionality.

 ## Vector Search
@@ -51,9 +53,11 @@ WHERE
 ```

 ### SQL UDTFs
+
 Similar to the above mentioned [vector search](#vector-search) and [full text search](#full-text-search), Spice supports SQL equivalent user-defined table functions (UDTF).

 To perform a vector search
+
 ```sql
 SELECT id, extra_column, score
 FROM vector_search(my_table, 'search query')
@@ -65,6 +69,7 @@ LIMIT 5
 For an entire specification of the `vector_search` UDTF, see [Vector-Based Search](/docs/features/search/vector-search#sql-udtf).

 Similarly, for full text search use the `text_search` UDTF
website/docs/features/search/vector-search.md
Lines changed: 13 additions & 8 deletions
@@ -9,6 +9,8 @@ tags:
   - embeddings
 ---

+> 🎓 Learn how it works with the [Amazon S3 Vectors with Spice](https://spiceai.org/blog/2025/amazon-s3-vectors-with-spice) engineering blog post.
+
 Spice provides advanced vector-based search capabilities, enabling more nuanced and intelligent searches. The runtime supports both:

 1. Local embedding models, e.g. [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
@@ -112,10 +114,12 @@ Response:
 ### Pre-Existing Embeddings

 Datasets that already include embeddings can utilize the same functionalities (e.g., vector search) as those augmented with embeddings using Spice. To ensure compatibility, the dataset must:
-1. Adhere to naming and type constraints for the underlying and embeddings columns.
-2. Define the embedding model to use for the column in the `spicepod.yaml` file. This isn't used to compute embedding on data in the table, but to embed the query text for similarity search operations. Like above, this can be done in the dataset component:
-```yaml
-datasets:
+
+1. Adhere to naming and type constraints for the underlying and embeddings columns.
+2. Define the embedding model to use for the column in the `spicepod.yaml` file. This isn't used to compute embedding on data in the table, but to embed the query text for similarity search operations. Like above, this can be done in the dataset component:
+
+```yaml
+datasets:
   - from: github:github.com/spiceai/spiceai/issues
     name: spiceai.issues
     acceleration:
@@ -124,19 +128,17 @@ Datasets that already include embeddings can utilize the same functionalities (e
       - name: body
         embeddings:
           - from: local_embedding_model # defined in `embeddings` section
-```
+```

 #### Constraints
-1. **Underlying Column Presence:**

+1. **Underlying Column Presence:**
    - The underlying column must exist in the table, and be of `string` [Arrow data type](../../reference/datatypes/accelerators.md).

 2. **Embeddings Column Naming Convention:**
-
    - For each underlying column, the corresponding embeddings column must be named as `<column_name>_embedding`. For example, a `customer_reviews` table with a `review` column must have a `review_embedding` column.

 3. **Embeddings Column Data Type:**
-
    - The embeddings column must have the following [Arrow data type](../../reference/datatypes/accelerators.md) when loaded into Spice:
      1. `FixedSizeList[Float32 or Float64, N]`, where `N` is the dimension (size) of the embedding vector. `FixedSizeList` is used for efficient storage and processing of fixed-size vectors.
      2. If the column is [**chunked**](/docs/components/embeddings#chunking), use `List[FixedSizeList[Float32 or Float64, N]]`.
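
The constraints in this hunk can be checked before a dataset with pre-existing embeddings is loaded. The sketch below is plain Python over column names and lists of floats, not Arrow types, and every function name in it is hypothetical. It covers the `<column_name>_embedding` naming rule and the requirement that every vector has the same dimension `N`.

```python
def expected_embedding_column(column_name: str) -> str:
    """Name the embeddings column must have for a given underlying column."""
    return f"{column_name}_embedding"


def check_embedding_columns(table_columns, embedded_columns):
    """Return the required column names that are missing from the table.

    For each underlying column, both the column itself and its
    `<column_name>_embedding` counterpart must be present.
    """
    present = set(table_columns)
    missing = []
    for name in embedded_columns:
        for required in (name, expected_embedding_column(name)):
            if required not in present:
                missing.append(required)
    return missing


def has_fixed_size(vectors, n: int) -> bool:
    """True when every embedding vector has exactly n elements."""
    return all(len(vector) == n for vector in vectors)
```

For the `customer_reviews` example in the text, `check_embedding_columns(["review"], ["review"])` reports `review_embedding` as missing.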
@@ -212,7 +214,9 @@ sql> describe sales;
 ```

 ### SQL UDTF
+
 The embedding index can also be used to perform search in SQL, via a user-defined table function(UDTF).
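
As an illustration of calling the UDTF from client code, the sketch below builds a `vector_search` statement as a string, following the `FROM vector_search(my_table, 'search query')` form shown in the `features/search/index.md` hunk. The helper name and the `id` and `score` column selection are assumptions, and the lines between `FROM` and `LIMIT` that this diff does not show are left out. How the statement is sent to Spice is outside the scope of the sketch.

```python
import re

# Accept simple or dotted identifiers such as `my_table` or `spiceai.issues`.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def vector_search_sql(table: str, query: str, limit: int = 5) -> str:
    """Build a SELECT over the vector_search UDTF (hypothetical helper)."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    # Standard SQL escaping: double any single quote inside the literal.
    escaped = query.replace("'", "''")
    return (
        f"SELECT id, score FROM vector_search({table}, '{escaped}') "
        f"LIMIT {int(limit)}"
    )
```

Rejecting anything that is not a plain identifier keeps the table name from carrying extra SQL, and the search text is only ever placed inside a quoted literal.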