Commit 74e113b

Add Elasticsearch connector and vector engine recipes (#380)
1 parent 94b534c commit 74e113b

11 files changed

Lines changed: 1996 additions & 0 deletions

elasticsearch/connector/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
*.parquet

elasticsearch/connector/README.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# Elasticsearch Data Connector

Works with `v2.0+`

This recipe demonstrates how to query Elasticsearch indices from Spice using federated SQL. It includes:

- `articles`: a federated dataset queried directly from Elasticsearch
- `all_types`: a federated dataset covering supported Elasticsearch field types

The Elasticsearch connector can also power `vector_search`, `text_search`, and `rrf` for indices that contain the required search fields.

## Prerequisites

- [Spice CLI](https://docs.spiceai.org/getting-started) installed
- Docker installed

## Getting Started

### Step 1: Prepare the recipe directory

Change into the recipe directory:

```bash
cd cookbook/elasticsearch/connector
```

### Step 2: Start Elasticsearch and seed the sample indices

Start the local Elasticsearch service and seed the sample data:

```bash
docker compose up
```

Wait for the `es-init` container to finish seeding the indices and exit, then keep Elasticsearch running while you use the recipe.
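
Before starting Spice, it can help to confirm that Elasticsearch is reachable and healthy. The following is a minimal sketch and not part of the recipe. It mirrors the health check defined in this recipe's Compose file (HTTP on `localhost:9200`, user `elastic`, password `spiceai`, status `green` or `yellow`). The function names `is_ready` and `fetch_health` are hypothetical.

```python
import base64
import json
import urllib.request

# Values taken from this recipe's Compose file; adjust them if yours differ.
ES_URL = "http://localhost:9200/_cluster/health"
ES_USER = "elastic"
ES_PASS = "spiceai"


def is_ready(health: dict) -> bool:
    """Treat green or yellow as ready, matching the Compose health check."""
    return health.get("status") in ("green", "yellow")


def fetch_health(url: str = ES_URL, user: str = ES_USER, password: str = ES_PASS) -> dict:
    """Fetch the cluster health document using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the containers running, `is_ready(fetch_health())` should return `True` once the cluster accepts requests. Note that this only checks cluster health; it does not tell you whether `es-init` has finished loading the sample data.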

### Step 3: Start the Spice runtime

In a new terminal, start the Spice runtime:

```bash
spice run
```

### Step 4: Open the Spice SQL REPL

In another terminal, open the Spice SQL REPL:

```bash
spice sql
```

### Step 5: Run a few example queries

Run a few basic federated SQL queries to verify the Elasticsearch datasets are available.

Query the `articles` index:

```sql
SELECT id, title, category, author
FROM articles
WHERE category = 'machine_learning'
LIMIT 10;
```

```
+----+----------------------------------------------------------------------------------+------------------+------------------+
| id | title | category | author |
|int32| varchar | varchar | varchar |
+----+----------------------------------------------------------------------------------+------------------+------------------+
| 6 | Cost-Aware AutoML on Kubernetes | machine_learning | Bob Martinez |
| 14 | Contrastive Learning Explained: From Theory to Production | machine_learning | Priya Sharma |
| 27 | How Federated Learning Is Transforming AI Applications | machine_learning | Quinn Walker |
| 32 | Scaling Few-Shot Learning to Billions of Parameters | machine_learning | Tom Brennan |
| 38 | Why understanding Generative Adversarial Networks Through Mathematical Intuition | machine_learning | Alice Chen |
| 40 | Few-Shot Learning: State of the Art in 2025 | machine_learning | Luca Ferrari |
| 45 | Attention Mechanisms: State of the Art in 2025 | machine_learning | Carol Okonkwo |
| 47 | Why understanding Fine-Tuning Through Mathematical Intuition | machine_learning | Ravi Subramaniam |
| 63 | Scaling Self-Supervised Learning to Billions of Parameters | machine_learning | Priya Sharma |
| 68 | A Practical Guide to Diffusion Models | machine_learning | Luca Ferrari |
+----+----------------------------------------------------------------------------------+------------------+------------------+

Time: 0.036074709 seconds. 10 rows.
```
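
The same query can also be sent from a script instead of the REPL. The sketch below assumes the Spice runtime exposes its HTTP SQL endpoint at `http://localhost:8090/v1/sql` and accepts the SQL text as a plain-text POST body; check the Spice documentation for your version before relying on it. The helper names `build_sql_request` and `run_query` are hypothetical.

```python
import json
import urllib.request

# Assumption: the Spice runtime serves its HTTP API on the default local port.
SPICE_SQL_URL = "http://localhost:8090/v1/sql"

ARTICLES_QUERY = """
SELECT id, title, category, author
FROM articles
WHERE category = 'machine_learning'
LIMIT 10
"""


def build_sql_request(sql: str, url: str = SPICE_SQL_URL) -> urllib.request.Request:
    """Build a POST request that carries the SQL text as the request body."""
    return urllib.request.Request(
        url,
        data=sql.strip().encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def run_query(sql: str) -> list:
    """Send the query and decode the response, assumed here to be a JSON array of rows."""
    with urllib.request.urlopen(build_sql_request(sql), timeout=30) as response:
        return json.load(response)
```

With `spice run` active, `run_query(ARTICLES_QUERY)` would be expected to return the same rows the REPL prints, one object per row.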

Filter the `all_types` index on a keyword field:

```sql
SELECT *
FROM all_types
WHERE field_keyword = 'category_0';
```

```
+--------------------------+---------------+------------+-----------------------------------------------+----------------------+--------------------------------+-------------------------------------------------------------+--------------------------------------------+--------------------+--------------------------+-------------------------------------------------------+-------------+-------------------------+--------------------------------------+--------------------------------------------------------+------------------+---------------+-----------------------+---------------+---------------+---------------+---------------------------+---------------------------------------------------------------+-------------------+--------------------+--------------------+----------------------------------------+-------------+-----------------------------------------------------------+-------------------------------------+---------------------+---------------+----+
| field_binary | field_boolean | field_byte | field_completion | field_date | field_date_nanos | field_date_range | field_dense_vector | field_double | field_double_range | field_flattened | field_float | field_float_range | field_geo_point | field_geo_shape | field_half_float | field_integer | field_integer_range | field_ip | field_keyword | field_long | field_long_range | field_nested | field_object.name | field_object.value | field_scaled_float | field_search_as_you_type | field_short | field_text | field_token_count | field_unsigned_long | field_version | id |
| varchar | boolean | int8 | varchar | varchar | varchar | varchar | float32[4] | float64 | varchar | varchar | float32 | varchar | varchar | varchar | float32 | int32 | varchar | varchar | varchar | int64 | varchar | varchar | varchar | int32 | float32 | varchar | int16 | varchar | varchar | varchar | varchar |int32|
+--------------------------+---------------+------------+-----------------------------------------------+----------------------+--------------------------------+-------------------------------------------------------------+--------------------------------------------+--------------------+--------------------------+-------------------------------------------------------+-------------+-------------------------+--------------------------------------+--------------------------------------------------------+------------------+---------------+-----------------------+---------------+---------------+---------------+---------------------------+---------------------------------------------------------------+-------------------+--------------------+--------------------+----------------------------------------+-------------+-----------------------------------------------------------+-------------------------------------+---------------------+---------------+----+
| YmluYXJ5X3BheWxvYWRfNQ== | true | -114 | {"input":["suggest_5","doc_5"],"weight":6} | 2024-06-06T05:00:00Z | 2024-06-06T05:00:00.000000000Z | {"gte":"2024-01-06T00:00:00Z","lte":"2024-12-06T23:59:59Z"} | [-0.958924, -0.279415, 0.656987, 0.989358] | 680696.2410453358 | {"gte":5.5,"lte":5.51} | {"arbitrary_key":"value_5","nested_key":{"deep":5}} | 551.9171 | {"gte":5.0,"lte":5.5} | {"lat":-21.463575,"lon":-143.289215} | {"type":"Point","coordinates":[-90.240943,41.613069]} | -50.19 | 94455 | {"gte":50,"lte":55} | 192.168.5.35 | category_0 | 24150178885 | {"gte":5000,"lte":5100} | [{"tag":"tag_1","score":0.372},{"tag":"tag_2","score":0.868}] | obj_5 | 35 | 51.85 | searchable text for document number 5 | 14225 | The quick brown fox jumps over the lazy dog — document 5 | token count source text document 5 | 261035185072990349 | 1.5.0 | 5 |
| YmluYXJ5X3BheWxvYWRfMTA= | false | -121 | {"input":["suggest_10","doc_10"],"weight":11} | 2024-02-11T10:00:00Z | 2024-02-11T10:00:00.000000000Z | {"gte":"2024-01-11T00:00:00Z","lte":"2024-12-11T23:59:59Z"} | [-0.544021, -0.99999, -0.536573, 0.420167] | -587803.5357209966 | {"gte":11.0,"lte":11.01} | {"arbitrary_key":"value_10","nested_key":{"deep":10}} | 626.6425 | {"gte":10.0,"lte":10.5} | {"lat":-45.000598,"lon":163.014087} | {"type":"Point","coordinates":[178.760517,-81.979851]} | 64.72 | 12430 | {"gte":100,"lte":105} | 192.168.10.70 | category_0 | -955323551533 | {"gte":10000,"lte":10100} | [{"tag":"tag_1","score":0.521},{"tag":"tag_2","score":0.328}] | obj_10 | 70 | 653.47 | searchable text for document number 10 | 30482 | The quick brown fox jumps over the lazy dog — document 10 | token count source text document 10 | 79329244941176303 | 1.10.0 | 10 |
+--------------------------+---------------+------------+-----------------------------------------------+----------------------+--------------------------------+-------------------------------------------------------------+--------------------------------------------+--------------------+--------------------------+-------------------------------------------------------+-------------+-------------------------+--------------------------------------+--------------------------------------------------------+------------------+---------------+-----------------------+---------------+---------------+---------------+---------------------------+---------------------------------------------------------------+-------------------+--------------------+--------------------+----------------------------------------+-------------+-----------------------------------------------------------+-------------------------------------+---------------------+---------------+----+

Time: 0.024743022 seconds. 2 rows.
```
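
In this output several Elasticsearch types arrive as `varchar`: range, geo, and nested values appear as JSON text, and `field_binary` appears as Base64. A minimal sketch of decoding them on the client side follows, using values copied from the first result row above. The helper `decode_row` and the `JSON_COLUMNS` tuple are hypothetical, and only a few of the columns are shown.

```python
import base64
import json

# Values copied from the first result row above.
row = {
    "field_binary": "YmluYXJ5X3BheWxvYWRfNQ==",
    "field_double_range": '{"gte":5.5,"lte":5.51}',
    "field_geo_point": '{"lat":-21.463575,"lon":-143.289215}',
    "field_nested": '[{"tag":"tag_1","score":0.372},{"tag":"tag_2","score":0.868}]',
}

# Columns that hold JSON text in this sample output.
JSON_COLUMNS = ("field_double_range", "field_geo_point", "field_nested")


def decode_row(raw: dict) -> dict:
    """Return a copy with Base64 and JSON text columns decoded to Python values."""
    decoded = dict(raw)
    decoded["field_binary"] = base64.b64decode(raw["field_binary"])
    for column in JSON_COLUMNS:
        decoded[column] = json.loads(raw[column])
    return decoded


decoded = decode_row(row)
print(decoded["field_binary"])  # b'binary_payload_5'
print(decoded["field_double_range"]["lte"])  # 5.51
print([item["tag"] for item in decoded["field_nested"]])  # ['tag_1', 'tag_2']
```

Decoding on the client keeps the SQL simple; whether the same columns can be unpacked inside a query depends on the functions your Spice version provides.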

## Learn more

- [Elasticsearch Data Connector Documentation](https://spiceai.org/docs/components/data-connectors/elasticsearch)
- [Search Functionality Documentation](https://spiceai.org/docs/features/search)
- [Datasets Reference](https://docs.spiceai.org/reference/spicepod/datasets)
- [Spice SQL CLI Reference](https://docs.spiceai.org/cli/reference/sql)
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.4
    container_name: es01
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - ELASTIC_PASSWORD=spiceai
    ports:
      - "9200:9200"
    volumes:
      - esdata:/usr/share/elasticsearch/data
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -sf -u elastic:spiceai http://localhost:9200/_cluster/health | grep -qE '\"status\":\"(green|yellow)\"'",
        ]
      interval: 10s
      timeout: 10s
      retries: 15
      start_period: 40s

  es-init:
    image: python:3.12-slim
    container_name: es-init
    depends_on:
      elasticsearch:
        condition: service_healthy
    volumes:
      - ./generate_data.py:/app/generate_data.py:ro
      - ./load_data.py:/app/load_data.py:ro
    working_dir: /app
    environment:
      - ES_HOST=http://elasticsearch:9200
      - ES_USER=elastic
      - ES_PASS=spiceai
    command: >
      bash -c "
        pip install --quiet pandas pyarrow faker requests &&
        python generate_data.py &&
        python load_data.py --all-types
      "

volumes:
  esdata:
    driver: local
