In-Memory Arrow Data Accelerator

Works with v1.0+

Create a dataset from sample data in S3 and accelerate it using the In-Memory Arrow Data Accelerator.

Requirements

Follow these steps

Step 1. Initialize a new Spice app.

spice init arrow-acceleration-qs
cd arrow-acceleration-qs

Step 2. Configure the S3 dataset. Copy and paste the YAML below into spicepod.yaml in the Spice app.

version: v1
kind: Spicepod
name: arrow-acceleration-qs
datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    description: taxi trips in s3
    params:
      file_format: parquet

Step 3. Start the Spice runtime.

spice run

Confirm in the terminal output that the taxi_trips dataset has been loaded:

2025/07/14 08:50:13 INFO Checking for latest Spice runtime release...
2025/07/14 08:50:15 INFO Spice.ai runtime starting...
2025-07-14T15:50:15.370061Z  INFO runtime::init::caching: Initialized results cache; max size: 128.00 MiB, item ttl: 1s
2025-07-14T15:50:15.370199Z  INFO runtime::init::caching: Initialized search results cache;
2025-07-14T15:50:15.732242Z  INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2025-07-14T15:50:15.732235Z  INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
2025-07-14T15:50:15.734062Z  INFO runtime::init::dataset: Initializing dataset taxi_trips
2025-07-14T15:50:15.738931Z  INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2025-07-14T15:50:16.608896Z  INFO runtime::init::dataset: Dataset taxi_trips registered (s3://spiceai-demo-datasets/taxi_trips/2024/), acceleration (arrow), results cache enabled.
2025-07-14T15:50:16.610030Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips
2025-07-14T15:50:29.423673Z  INFO runtime::accelerated_table::refresh_task: Loaded 2,964,624 rows (399.41 MiB) for dataset taxi_trips in 12s 813ms.
2025-07-14T15:50:29.494757Z  INFO runtime: All components are loaded. Spice runtime is ready!
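To check the load programmatically rather than by eye, the "Loaded ... rows" line can be parsed. The snippet below is a minimal sketch: the regular expression and the helper name `parse_load_line` are illustrative, and the pattern assumes the line keeps the exact shape shown in the output above (duration as `<N>s <N>ms`, size in MiB), which may differ between runtime versions.

```python
import re

# Pattern modeled on the "Loaded ..." line in the log output above.
# Assumption: the duration is always printed as "<N>s <N>ms".
LOAD_LINE = re.compile(
    r"Loaded (?P<rows>[\d,]+) rows \((?P<size>[\d.]+) MiB\) "
    r"for dataset (?P<name>\S+) in (?P<secs>\d+)s (?P<ms>\d+)ms"
)


def parse_load_line(line):
    """Return load statistics from a log line, or None if it does not match."""
    match = LOAD_LINE.search(line)
    if match is None:
        return None
    return {
        "dataset": match["name"],
        "rows": int(match["rows"].replace(",", "")),
        "size_mib": float(match["size"]),
        "seconds": int(match["secs"]) + int(match["ms"]) / 1000,
    }


line = (
    "2025-07-14T15:50:29.423673Z  INFO runtime::accelerated_table::refresh_task: "
    "Loaded 2,964,624 rows (399.41 MiB) for dataset taxi_trips in 12s 813ms."
)
info = parse_load_line(line)
print(info)
```

Lines that do not match return `None`, so the helper can be applied to every line of the runtime output without extra filtering.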

Step 4. Run queries against the dataset using the Spice SQL REPL.

In a new terminal, start the Spice SQL REPL:

spice sql

Query the taxi_trips dataset, observing the long query time.

select "VendorID", tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count from taxi_trips limit 10;
+----------+----------------------+-----------------------+-----------------+
| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count |
+----------+----------------------+-----------------------+-----------------+
| 2        | 2024-01-29T19:28:41  | 2024-01-29T19:36:46   | 2               |
| 1        | 2024-01-29T19:22:21  | 2024-01-29T19:28:45   | 2               |
| 1        | 2024-01-29T19:50:24  | 2024-01-29T20:09:21   | 2               |
| 1        | 2024-01-29T19:43:52  | 2024-01-29T20:01:40   | 2               |
| 1        | 2024-01-29T19:09:57  | 2024-01-29T19:55:36   | 2               |
| 1        | 2024-01-29T19:51:28  | 2024-01-29T20:09:16   | 2               |
| 1        | 2024-01-29T19:23:46  | 2024-01-29T19:31:06   | 2               |
| 2        | 2024-01-29T19:01:27  | 2024-01-29T19:09:07   | 2               |
| 1        | 2024-01-29T19:13:53  | 2024-01-29T19:23:09   | 2               |
| 1        | 2024-01-29T19:53:55  | 2024-01-29T20:06:56   | 2               |
+----------+----------------------+-----------------------+-----------------+

Time: 1.081530375 seconds. 10 rows.
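The query can also be timed from a script instead of the REPL. The sketch below uses only the Python standard library and targets the HTTP address printed in the startup log (127.0.0.1:8090). It assumes the runtime accepts plain-text SQL via `POST /v1/sql`; check the Spice HTTP API documentation for your version before relying on it. The helper names `build_sql_request` and `timed_query` are hypothetical.

```python
import time
import urllib.request

SPICE_HTTP = "http://127.0.0.1:8090"  # HTTP address from the startup log


def build_sql_request(sql, base_url=SPICE_HTTP):
    """Build (but do not send) an HTTP request carrying a SQL query.

    Assumption: the runtime exposes POST /v1/sql with a plain-text SQL body.
    """
    return urllib.request.Request(
        url=f"{base_url}/v1/sql",
        data=sql.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def timed_query(sql):
    """Send the query to a running runtime; return (response body, seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(build_sql_request(sql)) as response:
        body = response.read().decode("utf-8")
    return body, time.perf_counter() - start


# Building the request does not need a running runtime.
request = build_sql_request("select count(*) from taxi_trips")
print(request.get_method(), request.full_url)
```

With `spice run` active, `timed_query("select count(*) from taxi_trips")` returns the response body and the elapsed wall-clock time. That time includes HTTP overhead, so it will not match the REPL's reported time exactly.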

Step 5. Update the spicepod.yaml to enable In-Memory Arrow acceleration.

version: v1
kind: Spicepod
name: arrow-acceleration-qs
datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    description: taxi trips in s3
    params:
      file_format: parquet
    acceleration:
      enabled: true

Step 6. Save the changes to spicepod.yaml and observe the runtime updating and accelerating the dataset in the terminal output.

2024-10-22T19:28:24.204608Z  INFO runtime: Updating accelerated dataset taxi_trips...
2024-10-22T19:28:25.202828Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips
2024-10-22T19:29:07.729346Z  INFO runtime::accelerated_table::refresh_task: Loaded 2,964,624 rows (398.86 MiB) for dataset taxi_trips in 42s 525ms.
2024-10-22T19:29:09.217425Z  INFO runtime: Dataset taxi_trips registered (s3://spiceai-demo-datasets/taxi_trips/2024/), acceleration (arrow), results cache enabled.

Step 7. Run a query against the taxi_trips dataset again, observing the fast query time.

select "VendorID", tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count from taxi_trips limit 10;
+----------+----------------------+-----------------------+-----------------+
| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count |
+----------+----------------------+-----------------------+-----------------+
| 1        | 2024-01-12T19:14:53  | 2024-01-12T19:28:34   | 2               |
| 2        | 2024-01-12T19:03:12  | 2024-01-12T19:17:19   | 2               |
| 2        | 2024-01-12T19:34:22  | 2024-01-12T19:38:11   | 2               |
| 2        | 2024-01-12T19:44:51  | 2024-01-12T19:49:40   | 2               |
| 1        | 2024-01-12T19:31:54  | 2024-01-12T19:38:43   | 2               |
| 2        | 2024-01-12T19:54:37  | 2024-01-12T19:59:55   | 2               |
| 1        | 2024-01-12T19:02:32  | 2024-01-12T19:12:25   | 2               |
| 1        | 2024-01-12T19:22:38  | 2024-01-12T19:37:30   | 2               |
| 2        | 2024-01-12T19:34:30  | 2024-01-12T20:31:18   | 2               |
| 2        | 2024-01-12T19:54:06  | 2024-01-12T20:07:29   | 2               |
+----------+----------------------+-----------------------+-----------------+

Time: 0.004575584 seconds. 10 rows.
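As a quick check on the difference, the two query times reported in this quickstart can be compared directly. The values below are copied from the REPL output in Step 4 and Step 7; actual timings depend on the machine and network.

```python
# Query times reported by the Spice SQL REPL in this quickstart.
unaccelerated_s = 1.081530375  # Step 4, before acceleration
accelerated_s = 0.004575584    # Step 7, after acceleration

speedup = unaccelerated_s / accelerated_s
print(f"Speedup: {speedup:.0f}x")  # prints "Speedup: 236x"
```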

Learn more