Commit 27785a3

lukekim and Jeadie authored
Amazon S3 Vectors with Spice Blog Post (#1087)
* Amazon S3 Vectors with Spice Blog Post
* Add links
* Fix author
* Fix backslashes
* Add tags
* More tweaks
* Fix links
* Fix spacing
* edits; remove 's3_vectors_endpoint' too
* remove <<<<
* Update website/blog/2025/amazon-s3-vectors-with-spice.mdx

---------

Co-authored-by: jeadie <jack@spice.ai>
1 parent 712ce53 commit 27785a3

8 files changed

Lines changed: 461 additions & 33 deletions

website/blog/2025/amazon-s3-vectors-with-spice.mdx

Lines changed: 389 additions & 0 deletions
Large diffs are not rendered by default.

website/blog/tags.yml

Lines changed: 29 additions & 1 deletion
@@ -81,7 +81,7 @@ datafusion:
 deepseek:
   label: 'deepseek'
   permalink: '/deepseek'
-  description: 'Deepseek AI firm related topics and usage'
+  description: 'Deepseek AI firm related topics and usage'
 delta:
   label: 'delta'
   permalink: '/delta'
@@ -110,6 +110,10 @@ embeddings:
   label: 'embeddings'
   permalink: '/embeddings'
   description: 'Embeddings related topics and usage'
+engineering:
+  label: 'engineering'
+  permalink: '/engineering'
+  description: 'Spice AI engineering team posts'
 evaluation:
   label: 'evaluation'
   permalink: '/evaluations'
@@ -146,6 +150,10 @@ huggingface:
   label: 'huggingface'
   permalink: '/huggingface'
   description: 'Hugging Face open-source machine learning tool related topics and usage'
+hybrid-search:
+  label: 'hybrid-search'
+  permalink: '/hybrid-search'
+  description: 'Hybrid search combining multiple search methodologies'
 iceberg:
   label: 'iceberg'
   permalink: '/iceberg'
@@ -222,6 +230,10 @@ python:
   label: 'python'
   permalink: '/python'
   description: 'Python Software related topics and usage'
+rag:
+  label: 'rag'
+  permalink: '/rag'
+  description: 'Retrieval-Augmented Generation (RAG) related topics and usage'
 release:
   label: 'release'
   permalink: '/releases'
@@ -238,10 +250,18 @@ sdk:
   label: 'sdk'
   permalink: '/sdk'
   description: 'Software Development Kit related topics and usage'
+search:
+  label: 'search'
+  permalink: '/search'
+  description: 'Search functionality and implementations'
 security:
   label: 'security'
   permalink: '/security'
   description: 'Security practices and tools'
+semantic-search:
+  label: 'semantic-search'
+  permalink: '/semantic-search'
+  description: 'Semantic search methods and implementations'
 snowflake:
   label: 'snowflake'
   permalink: '/snowflake'
@@ -250,6 +270,10 @@ spice.js:
   label: 'spice.js'
   permalink: '/spice.js'
   description: 'spice.js (Node.js SDK) related topics and usage'
+spiceai:
+  label: 'spiceai'
+  permalink: '/spiceai'
+  description: 'Spice.ai platform related topics and usage'
 spicepod:
   label: 'spicepod'
   permalink: '/spicepod'
@@ -270,6 +294,10 @@ tableau:
   label: 'tableau'
   permalink: '/tableau'
   description: 'Tableau Connector related topics and usage'
+vector-search:
+  label: 'vector-search'
+  permalink: '/vector-search'
+  description: 'Vector search techniques and applications'
 vectors:
   label: 'vectors'
   permalink: '/vectors'

website/docs/components/vectors/index.md

Lines changed: 5 additions & 4 deletions
@@ -7,6 +7,8 @@ pagination_prev: null
 pagination_next: null
 ---

+> 🎓 Learn how it works with the [Amazon S3 Vectors with Spice](https://spiceai.org/blog/2025/amazon-s3-vectors-with-spice) engineering blog post.
+
 Data sourced by Data Connectors with vector embedding columns can be indexed and efficiently searched using a vector engine.

 A vector engine will store all vector embeddings associated to columns in a dataset, provide efficient vector search operations and avoid unnecessary recomputation of embeddings.
@@ -18,16 +20,15 @@ datasets:
   - name: dataset_with_embeddings
     vectors:
       enabled: true
-
 ```

 For the complete reference specification see [datasets](/docs/reference/spicepod/datasets.md).

 Supported Vector engines:

-| Name | Description |
-| ------------ | ------------------------------- |
-| [`s3_vectors`][s3vectors] | AWS S3 vectors |
+| Name | Description |
+| ------------------------- | -------------- |
+| [`s3_vectors`][s3vectors] | AWS S3 vectors |

 [s3vectors]: /docs/components/vectors/s3_vectors.md

website/docs/components/vectors/s3_vectors.md

Lines changed: 18 additions & 19 deletions
@@ -6,7 +6,9 @@ sidebar_position: 1
 pagination_next: null
 ---

-To use S3 Vectors as a Vector Engine, specify `s3_vectors` as the `engine`, and configure the associated location and AWS credentials.
+> 🎓 Learn how it works with the [Amazon S3 Vectors with Spice](https://spiceai.org/blog/2025/amazon-s3-vectors-with-spice) engineering blog post.
+
+To use Amazon S3 Vectors as a Vector Engine, specify `s3_vectors` as the `engine`, and configure the associated location and AWS credentials.

 ```yaml
 datasets:
@@ -18,35 +20,32 @@ datasets:
       params:
         s3_vectors_bucket: my-s3-vector-bucket
     columns:
-      - name: "body"
+      - name: 'body'
         embeddings:
           - from: bedrock_titan

 embeddings:
   - name: bedrock_titan
-    # ... Define an embedding model to use.
-
+    # ... Define an embedding model to use.
 ```

 ## Parameters

-| Parameter | Description | Example Value |
-| ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- | --- |
-| `s3_vectors_arn` | The S3 vectors index to use. Incompatible with `s3_vectors_bucket` and `s3_vectors_index`. | `arn:aws:s3vectors:123456654321/bucket/a-bucket/index/index-of-important-embeddings` |
-| `s3_vectors_aws_access_key_id` | The access key ID for the S3 vectors index | - |
-| `s3_vectors_aws_region` | The AWS region for the S3 vectors index. | `us-east-1` |
-| `s3_vectors_aws_secret_access_key` | The secret access key for the S3 vectors index | - |
-| `s3_vectors_aws_session_token` | Session token for the S3 vectors index. | - |
-| `s3_vectors_bucket` | The S3 vectors bucket to use. If `s3_vectors_index` is not specified, an index will be created based on the underlying embedding column. Incompatible with `s3_vectors_arn` | `a-bucket`
-| `s3_vectors_endpoint` | The endpoint for the S3 vectors index | `s3vectors.us-east-2.api.aws` |
-| `s3_vectors_index` | The name of the s3 vectors index to use or create. Incompatible with `s3_vectors_arn`. | `index-of-important-embeddings` |
-
+| Parameter | Description | Example Value |
+| ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
+| `s3_vectors_arn` | The S3 vectors index to use. Incompatible with `s3_vectors_bucket` and `s3_vectors_index`. | `arn:aws:s3vectors:123456654321/bucket/a-bucket/index/index-of-important-embeddings` |
+| `s3_vectors_aws_access_key_id` | The access key ID for the S3 vectors index | - |
+| `s3_vectors_aws_region` | The AWS region for the S3 vectors index. | `us-east-1` |
+| `s3_vectors_aws_secret_access_key` | The secret access key for the S3 vectors index | - |
+| `s3_vectors_aws_session_token` | Session token for the S3 vectors index. | - |
+| `s3_vectors_bucket` | The S3 vectors bucket to use. If `s3_vectors_index` is not specified, an index will be created based on the underlying embedding column. Incompatible with `s3_vectors_arn` | `a-bucket` |
+| `s3_vectors_index` | The name of the s3 vectors index to use or create. Incompatible with `s3_vectors_arn`. | `index-of-important-embeddings` |

 :::warning[Limitations]
-- `s3_vectors_index` and `s3_vectors_arn` specify a single index for the dataset and therefore should not be used with a dataset containing more than one embedding column.
-:::

-<!-- ## Cookbook
+- `s3_vectors_index` and `s3_vectors_arn` specify a single index for the dataset and therefore should not be used with a dataset containing more than one embedding column.
+:::
+
+## Cookbook

 - A cookbook recipe to configure a dataset with an S3 vectors engine in Spice. [S3 Vectors engine](https://github.com/spiceai/cookbook/tree/trunk/vectors/s3#readme)
--->
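
Reviewer note: the parameter table and the Limitations admonition in this file describe two rules that are easy to get wrong in a spicepod: `s3_vectors_arn` cannot be combined with `s3_vectors_bucket` or `s3_vectors_index`, and a pinned index (by name or ARN) should not be used when the dataset has more than one embedding column. The sketch below is one possible reading of those rules as a pre-flight check. The function name `check_s3_vectors_params` and the `embedding_columns` argument are hypothetical, and nothing here reflects how the Spice runtime actually validates its configuration.

```python
from typing import Mapping

# Parameter names come from the documented table; the checks are an
# illustrative interpretation of the docs, not Spice's own validation logic.
ARN = "s3_vectors_arn"
BUCKET = "s3_vectors_bucket"
INDEX = "s3_vectors_index"


def check_s3_vectors_params(params: Mapping[str, str], embedding_columns: int = 1) -> list[str]:
    """Return readable problems; an empty list means no documented rule is violated."""
    problems: list[str] = []

    # `s3_vectors_arn` is documented as incompatible with bucket and index.
    if ARN in params:
        for other in (BUCKET, INDEX):
            if other in params:
                problems.append(f"{ARN} cannot be combined with {other}")

    # A pinned index (by name or ARN) covers a single embedding column only.
    if embedding_columns > 1:
        for pinned in (INDEX, ARN):
            if pinned in params:
                problems.append(
                    f"{pinned} pins one index, but the dataset has "
                    f"{embedding_columns} embedding columns"
                )

    return problems


if __name__ == "__main__":
    # Bucket-only configuration, as in the YAML example in this file.
    print(check_s3_vectors_params({BUCKET: "my-s3-vector-bucket"}))
```

Returning a list of messages instead of raising on the first problem lets a caller report every conflicting parameter at once.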

website/docs/features/search/index.md

Lines changed: 5 additions & 0 deletions
@@ -13,6 +13,8 @@ tags:

 import DocCardList from '@theme/DocCardList';

+> 🎓 Learn how it works with the [Amazon S3 Vectors with Spice](https://spiceai.org/blog/2025/amazon-s3-vectors-with-spice) engineering blog post.
+
 Spice provides advanced search capabilities that go beyond standard SQL queries, offering both traditional SQL search patterns, semantic (vector-based) search, and full text search functionality.

 ## Vector Search
@@ -51,9 +53,11 @@ WHERE
 ```

 ### SQL UDTFs
+
 Similar to the above mentioned [vector search](#vector-search) and [full text search](#full-text-search), Spice supports SQL equivalent user-defined table functions (UDTF).

 To perform a vector search
+
 ```sql
 SELECT id, extra_column, score
 FROM vector_search(my_table, 'search query')
@@ -65,6 +69,7 @@ LIMIT 5
 For an entire specification of the `vector_search` UDTF, see [Vector-Based Search](/docs/features/search/vector-search#sql-udtf).

 Similarly, for full text search use the `text_search` UDTF
+
 ```sql
 SELECT id, extra_column, score
 FROM text_search(my_table, 'search terms')

website/docs/features/search/vector-search.md

Lines changed: 13 additions & 8 deletions
@@ -9,6 +9,8 @@ tags:
   - embeddings
 ---

+> 🎓 Learn how it works with the [Amazon S3 Vectors with Spice](https://spiceai.org/blog/2025/amazon-s3-vectors-with-spice) engineering blog post.
+
 Spice provides advanced vector-based search capabilities, enabling more nuanced and intelligent searches. The runtime supports both:

 1. Local embedding models, e.g. [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
@@ -112,10 +114,12 @@ Response:
 ### Pre-Existing Embeddings

 Datasets that already include embeddings can utilize the same functionalities (e.g., vector search) as those augmented with embeddings using Spice. To ensure compatibility, the dataset must:
-1. Adhere to naming and type constraints for the underlying and embeddings columns.
-2. Define the embedding model to use for the column in the `spicepod.yaml` file. This isn't used to compute embedding on data in the table, but to embed the query text for similarity search operations. Like above, this can be done in the dataset component:
-```yaml
-datasets:
+
+1. Adhere to naming and type constraints for the underlying and embeddings columns.
+2. Define the embedding model to use for the column in the `spicepod.yaml` file. This isn't used to compute embedding on data in the table, but to embed the query text for similarity search operations. Like above, this can be done in the dataset component:
+
+```yaml
+datasets:
   - from: github:github.com/spiceai/spiceai/issues
     name: spiceai.issues
     acceleration:
@@ -124,19 +128,17 @@ Datasets that already include embeddings can utilize the same functionalities (e
       - name: body
         embeddings:
           - from: local_embedding_model # defined in `embeddings` section
-```
+```

 #### Constraints
-1. **Underlying Column Presence:**

+1. **Underlying Column Presence:**
    - The underlying column must exist in the table, and be of `string` [Arrow data type](../../reference/datatypes/accelerators.md) .

 2. **Embeddings Column Naming Convention:**
-
    - For each underlying column, the corresponding embeddings column must be named as `<column_name>_embedding`. For example, a `customer_reviews` table with a `review` column must have a `review_embedding` column.

 3. **Embeddings Column Data Type:**
-
    - The embeddings column must have the following [Arrow data type](../../reference/datatypes/accelerators.md) when loaded into Spice:
      1. `FixedSizeList[Float32 or Float64, N]`, where `N` is the dimension (size) of the embedding vector. `FixedSizeList` is used for efficient storage and processing of fixed-size vectors.
      2. If the column is [**chunked**](/docs/components/embeddings#chunking), use `List[FixedSizeList[Float32 or Float64, N]]`.
@@ -212,7 +214,9 @@ sql> describe sales;
 ```

 ### SQL UDTF
+
 The embedding index can also be used to perform search in SQL, via a user-defined table function (UDTF).
+
 ```sql
 SELECT id, extra_column, score
 FROM vector_search(my_table, 'search query')
@@ -222,6 +226,7 @@ LIMIT 5
 ```

 The function signature of `vector_search` is
+
 ```sql
 vector_search(
   table STRING, -- Dataset name (required)
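
Reviewer note: the Constraints list in this file states that each underlying column must exist and that its embeddings column must be named `<column_name>_embedding`. A minimal sketch of that naming check follows, assuming the table's column names are already available as a plain list. The helper names `embedding_column_for` and `check_embedding_columns` are hypothetical, and the sketch checks names only; it does not inspect the `string` or `FixedSizeList` Arrow types that the constraints also require.

```python
def embedding_column_for(column_name: str) -> str:
    """Name of the embeddings column that the documented convention pairs with `column_name`."""
    return f"{column_name}_embedding"


def check_embedding_columns(table_columns: list[str], underlying_columns: list[str]) -> list[str]:
    """Return readable problems for missing underlying or embeddings columns (names only)."""
    present = set(table_columns)
    problems: list[str] = []
    for column in underlying_columns:
        # Constraint 1: the underlying column must exist in the table.
        if column not in present:
            problems.append(f"underlying column '{column}' is missing")
        # Constraint 2: its embeddings column must follow `<column_name>_embedding`.
        expected = embedding_column_for(column)
        if expected not in present:
            problems.append(f"embeddings column '{expected}' is missing")
    return problems


if __name__ == "__main__":
    # The `customer_reviews` example from the docs: `review` pairs with `review_embedding`.
    print(check_embedding_columns(["id", "review", "review_embedding"], ["review"]))
```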

website/docusaurus.config.ts

Lines changed: 2 additions & 1 deletion
@@ -97,7 +97,8 @@ const config: Config = {
   themes: ['docusaurus-theme-openapi-docs'],
   themeConfig: {
     announcementBar: {
-      content: '<a href="/blog/releases/v1.5.1">Spice.ai OSS v1.5.1</a> is now available! 🚀',
+      content:
+        '🎓 Learn about <a href="/blog/amazon-s3-vectors-with-spice">Amazon S3 Vectors with Spice</a> in the latest engineering blog post!',
       backgroundColor: 'var(--announcement-bar-bg)',
       textColor: 'var(--announcement-bar-text)',
       isCloseable: true
