# Elasticsearch Data Connector

Works with `v2.0+`

This recipe demonstrates how to query Elasticsearch indices from Spice using federated SQL. It includes:

- `articles`: a federated dataset queried directly from Elasticsearch
- `all_types`: a federated dataset covering the supported Elasticsearch field types

The Elasticsearch connector can also power the `vector_search`, `text_search`, and `rrf` search functions for indices that contain the required search fields.

## Prerequisites

- [Spice CLI](https://docs.spiceai.org/getting-started) installed
- Docker with Docker Compose installed

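You can confirm both tools are on your `PATH` before you start with a small shell check. This is a sketch and is not part of the recipe. `check_prereqs` is a hypothetical helper name, and it only checks that each command exists. It does not check versions.

```bash
# Hypothetical helper: report every command from the argument list that is not on PATH.
# Returns 0 if all commands are found, 1 if any are missing.
check_prereqs() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_prereqs spice docker && docker compose version` also confirms that the Compose plugin used in Step 2 is available.
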
## Getting Started

### Step 1: Prepare the recipe directory

After cloning the Spice cookbook repository, change into the recipe directory:

```bash
cd cookbook/elasticsearch/connector
```

### Step 2: Start Elasticsearch and seed the sample indices

Start the local Elasticsearch service and seed the sample data:

```bash
docker compose up
```

Keep this running while you use the recipe.

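Elasticsearch can take a little while to accept requests after `docker compose up`. The minimal shell sketch below polls an HTTP endpoint until it responds. `wait_for_http` is a hypothetical helper. The usage note assumes the recipe's Compose file publishes Elasticsearch on `localhost:9200` with security disabled, so adjust the URL if your setup differs.

```bash
# Hypothetical helper: poll a URL until it answers with a non-error HTTP status.
# Returns 0 once the endpoint responds, or 1 after $2 retries (default 60), one per second.
wait_for_http() {
  local url="$1" retries="${2:-60}" attempt=0
  # curl -f exits non-zero on HTTP 4xx/5xx, -s hides progress output,
  # and --max-time keeps a single attempt from hanging.
  until curl -fs --max-time 2 -o /dev/null "$url"; do
    if [ "$attempt" -ge "$retries" ]; then
      echo "gave up waiting for $url" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  return 0
}
```

For example, `wait_for_http http://localhost:9200 120 && curl -s 'http://localhost:9200/_cat/indices?v'` waits for Elasticsearch and then prints its indices. The `articles` and `all_types` indices should appear once seeding has finished.
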
### Step 3: Start the Spice runtime

In a new terminal, start the Spice runtime:

```bash
spice run
```

### Step 4: Open the Spice SQL REPL

In another terminal, open the Spice SQL REPL:

```bash
spice sql
```

### Step 5: Run a few example queries

Run a few basic federated SQL queries to verify the Elasticsearch datasets are available.

Query the `articles` index:

```sql
SELECT id, title, category, author
FROM articles
WHERE category = 'machine_learning'
LIMIT 10;
```

```
+----+----------------------------------------------------------------------------------+------------------+------------------+
| id | title                                                                            | category         | author           |
|int32| varchar                                                                          | varchar          | varchar          |
+----+----------------------------------------------------------------------------------+------------------+------------------+
| 6  | Cost-Aware AutoML on Kubernetes                                                  | machine_learning | Bob Martinez     |
| 14 | Contrastive Learning Explained: From Theory to Production                        | machine_learning | Priya Sharma     |
| 27 | How Federated Learning Is Transforming AI Applications                           | machine_learning | Quinn Walker     |
| 32 | Scaling Few-Shot Learning to Billions of Parameters                              | machine_learning | Tom Brennan      |
| 38 | Why understanding Generative Adversarial Networks Through Mathematical Intuition | machine_learning | Alice Chen       |
| 40 | Few-Shot Learning: State of the Art in 2025                                      | machine_learning | Luca Ferrari     |
| 45 | Attention Mechanisms: State of the Art in 2025                                   | machine_learning | Carol Okonkwo    |
| 47 | Why understanding Fine-Tuning Through Mathematical Intuition                     | machine_learning | Ravi Subramaniam |
| 63 | Scaling Self-Supervised Learning to Billions of Parameters                       | machine_learning | Priya Sharma     |
| 68 | A Practical Guide to Diffusion Models                                            | machine_learning | Luca Ferrari     |
+----+----------------------------------------------------------------------------------+------------------+------------------+

Time: 0.036074709 seconds. 10 rows.
```

Filter the `all_types` index on a keyword field:

```sql
SELECT *
FROM all_types
WHERE field_keyword = 'category_0';
```

```
+--------------------------+---------------+------------+-----------------------------------------------+----------------------+--------------------------------+-------------------------------------------------------------+--------------------------------------------+--------------------+--------------------------+-------------------------------------------------------+-------------+-------------------------+--------------------------------------+--------------------------------------------------------+------------------+---------------+-----------------------+---------------+---------------+---------------+---------------------------+---------------------------------------------------------------+-------------------+--------------------+--------------------+----------------------------------------+-------------+-----------------------------------------------------------+-------------------------------------+---------------------+---------------+----+
| field_binary             | field_boolean | field_byte | field_completion                              | field_date           | field_date_nanos               | field_date_range                                            | field_dense_vector                         | field_double       | field_double_range       | field_flattened                                       | field_float | field_float_range       | field_geo_point                      | field_geo_shape                                        | field_half_float | field_integer | field_integer_range   | field_ip      | field_keyword | field_long    | field_long_range          | field_nested                                                  | field_object.name | field_object.value | field_scaled_float | field_search_as_you_type               | field_short | field_text                                                | field_token_count                   | field_unsigned_long | field_version | id |
| varchar                  | boolean       | int8       | varchar                                       | varchar              | varchar                        | varchar                                                     | float32[4]                                 | float64            | varchar                  | varchar                                               | float32     | varchar                 | varchar                              | varchar                                                | float32          | int32         | varchar               | varchar       | varchar       | int64         | varchar                   | varchar                                                       | varchar           | int32              | float32            | varchar                                | int16       | varchar                                                   | varchar                             | varchar             | varchar       |int32|
+--------------------------+---------------+------------+-----------------------------------------------+----------------------+--------------------------------+-------------------------------------------------------------+--------------------------------------------+--------------------+--------------------------+-------------------------------------------------------+-------------+-------------------------+--------------------------------------+--------------------------------------------------------+------------------+---------------+-----------------------+---------------+---------------+---------------+---------------------------+---------------------------------------------------------------+-------------------+--------------------+--------------------+----------------------------------------+-------------+-----------------------------------------------------------+-------------------------------------+---------------------+---------------+----+
| YmluYXJ5X3BheWxvYWRfNQ== | true          | -114       | {"input":["suggest_5","doc_5"],"weight":6}    | 2024-06-06T05:00:00Z | 2024-06-06T05:00:00.000000000Z | {"gte":"2024-01-06T00:00:00Z","lte":"2024-12-06T23:59:59Z"} | [-0.958924, -0.279415, 0.656987, 0.989358] | 680696.2410453358  | {"gte":5.5,"lte":5.51}   | {"arbitrary_key":"value_5","nested_key":{"deep":5}}   | 551.9171    | {"gte":5.0,"lte":5.5}   | {"lat":-21.463575,"lon":-143.289215} | {"type":"Point","coordinates":[-90.240943,41.613069]}  | -50.19           | 94455         | {"gte":50,"lte":55}   | 192.168.5.35  | category_0    | 24150178885   | {"gte":5000,"lte":5100}   | [{"tag":"tag_1","score":0.372},{"tag":"tag_2","score":0.868}] | obj_5             | 35                 | 51.85              | searchable text for document number 5  | 14225       | The quick brown fox jumps over the lazy dog — document 5  | token count source text document 5  | 261035185072990349  | 1.5.0         | 5  |
| YmluYXJ5X3BheWxvYWRfMTA= | false         | -121       | {"input":["suggest_10","doc_10"],"weight":11} | 2024-02-11T10:00:00Z | 2024-02-11T10:00:00.000000000Z | {"gte":"2024-01-11T00:00:00Z","lte":"2024-12-11T23:59:59Z"} | [-0.544021, -0.99999, -0.536573, 0.420167] | -587803.5357209966 | {"gte":11.0,"lte":11.01} | {"arbitrary_key":"value_10","nested_key":{"deep":10}} | 626.6425    | {"gte":10.0,"lte":10.5} | {"lat":-45.000598,"lon":163.014087}  | {"type":"Point","coordinates":[178.760517,-81.979851]} | 64.72            | 12430         | {"gte":100,"lte":105} | 192.168.10.70 | category_0    | -955323551533 | {"gte":10000,"lte":10100} | [{"tag":"tag_1","score":0.521},{"tag":"tag_2","score":0.328}] | obj_10            | 70                 | 653.47             | searchable text for document number 10 | 30482       | The quick brown fox jumps over the lazy dog — document 10 | token count source text document 10 | 79329244941176303   | 1.10.0        | 10 |
+--------------------------+---------------+------------+-----------------------------------------------+----------------------+--------------------------------+-------------------------------------------------------------+--------------------------------------------+--------------------+--------------------------+-------------------------------------------------------+-------------+-------------------------+--------------------------------------+--------------------------------------------------------+------------------+---------------+-----------------------+---------------+---------------+---------------+---------------------------+---------------------------------------------------------------+-------------------+--------------------+--------------------+----------------------------------------+-------------+-----------------------------------------------------------+-------------------------------------+---------------------+---------------+----+

Time: 0.024743022 seconds. 2 rows.
```

## Learn More

- [Elasticsearch Data Connector Documentation](https://spiceai.org/docs/components/data-connectors/elasticsearch)
- [Search Functionality Documentation](https://spiceai.org/docs/features/search)
- [Datasets Reference](https://docs.spiceai.org/reference/spicepod/datasets)
- [Spice SQL CLI Reference](https://docs.spiceai.org/cli/reference/sql)