In-Memory Arrow Data Accelerator

Works with v1.0+

Create a data connector instance using sample data and accelerate it with the In-Memory Arrow Data Accelerator.

Requirements

Spice CLI installed (see Getting Started).

Follow these steps

Step 1. Initialize a new Spice app.

spice init arrow-acceleration-qs
cd arrow-acceleration-qs

Step 2. Configure an S3 dataset: copy and paste the YAML below into spicepod.yaml in the Spice app.

version: v1
kind: Spicepod
name: arrow-acceleration-qs
datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    description: taxi trips in s3
    params:
      file_format: parquet

Step 3. Start the Spice runtime.

spice run

Confirm in the terminal output that the taxi_trips dataset has been loaded:

2025/07/14 08:50:13 INFO Checking for latest Spice runtime release...
2025/07/14 08:50:15 INFO Spice.ai runtime starting...
2025-07-14T15:50:15.370061Z  INFO runtime::init::caching: Initialized results cache; max size: 128.00 MiB, item ttl: 1s
2025-07-14T15:50:15.370199Z  INFO runtime::init::caching: Initialized search results cache;
2025-07-14T15:50:15.732242Z  INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2025-07-14T15:50:15.732235Z  INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
2025-07-14T15:50:15.734062Z  INFO runtime::init::dataset: Initializing dataset taxi_trips
2025-07-14T15:50:15.738931Z  INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2025-07-14T15:50:16.608896Z  INFO runtime::init::dataset: Dataset taxi_trips registered (s3://spiceai-demo-datasets/taxi_trips/2024/), acceleration (arrow), results cache enabled.
2025-07-14T15:50:16.610030Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips
2025-07-14T15:50:29.423673Z  INFO runtime::accelerated_table::refresh_task: Loaded 2,964,624 rows (399.41 MiB) for dataset taxi_trips in 12s 813ms.
2025-07-14T15:50:29.494757Z  INFO runtime: All components are loaded. Spice runtime is ready!
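To check the load step programmatically rather than by eye, the "Loaded ..." line can be parsed. The following is a minimal sketch, assuming the log line keeps the format shown in the sample output above (it may differ between Spice versions); `parse_loaded_line` is a hypothetical helper, not part of Spice.

```python
import re

# Pattern for the refresh_task "Loaded ..." line, taken from the sample
# output in this quickstart. Assumption: the format is stable across versions.
LOADED_RE = re.compile(
    r"Loaded (?P<rows>[\d,]+) rows \((?P<size>[\d.]+) MiB\) "
    r"for dataset (?P<dataset>\S+) in (?P<secs>\d+)s (?P<ms>\d+)ms"
)


def parse_loaded_line(line):
    """Return (dataset, rows, size_mib, seconds), or None if the line does not match."""
    m = LOADED_RE.search(line)
    if m is None:
        return None
    rows = int(m.group("rows").replace(",", ""))
    seconds = int(m.group("secs")) + int(m.group("ms")) / 1000
    return m.group("dataset"), rows, float(m.group("size")), seconds


sample = (
    "2025-07-14T15:50:29.423673Z  INFO runtime::accelerated_table::refresh_task: "
    "Loaded 2,964,624 rows (399.41 MiB) for dataset taxi_trips in 12s 813ms."
)
dataset, rows, size_mib, seconds = parse_loaded_line(sample)
print(f"{dataset}: {rows} rows, {size_mib} MiB in {seconds:.3f}s")
```

A line that does not match the pattern returns None, so the helper can be run over the whole terminal output to pick out the load summary.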

Step 4. Run queries against the dataset using the Spice SQL REPL.

In a new terminal, start the Spice SQL REPL:

spice sql

Query the taxi_trips dataset and note the relatively long query time:

select "VendorID", tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count from taxi_trips limit 10;

+----------+----------------------+-----------------------+-----------------+
| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count |
+----------+----------------------+-----------------------+-----------------+
| 2        | 2024-01-29T19:28:41  | 2024-01-29T19:36:46   | 2               |
| 1        | 2024-01-29T19:22:21  | 2024-01-29T19:28:45   | 2               |
| 1        | 2024-01-29T19:50:24  | 2024-01-29T20:09:21   | 2               |
| 1        | 2024-01-29T19:43:52  | 2024-01-29T20:01:40   | 2               |
| 1        | 2024-01-29T19:09:57  | 2024-01-29T19:55:36   | 2               |
| 1        | 2024-01-29T19:51:28  | 2024-01-29T20:09:16   | 2               |
| 1        | 2024-01-29T19:23:46  | 2024-01-29T19:31:06   | 2               |
| 2        | 2024-01-29T19:01:27  | 2024-01-29T19:09:07   | 2               |
| 1        | 2024-01-29T19:13:53  | 2024-01-29T19:23:09   | 2               |
| 1        | 2024-01-29T19:53:55  | 2024-01-29T20:06:56   | 2               |
+----------+----------------------+-----------------------+-----------------+

Time: 1.081530375 seconds. 10 rows.
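The same query can also be timed from a script instead of the REPL. This is a rough sketch under one assumption not stated in this quickstart: that the HTTP listener shown in the startup log (127.0.0.1:8090) accepts SQL as a plain-text POST body at `/v1/sql` and returns JSON rows. `build_request` and `timed_query` are hypothetical helper names.

```python
import json
import time
import urllib.request

# Assumption: SQL-over-HTTP endpoint on the listener address from the log.
SQL_URL = "http://127.0.0.1:8090/v1/sql"

QUERY = (
    'select "VendorID", tpep_pickup_datetime, tpep_dropoff_datetime, '
    "passenger_count from taxi_trips limit 10"
)


def build_request(sql, url=SQL_URL):
    """Build the POST request carrying the SQL text, without sending it."""
    return urllib.request.Request(
        url,
        data=sql.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def timed_query(sql, url=SQL_URL):
    """Send the query and return (decoded JSON response, elapsed seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(build_request(sql, url), timeout=60) as resp:
        result = json.loads(resp.read())
    return result, time.perf_counter() - start
```

With the runtime from Step 3 still running, calling `timed_query(QUERY)` before and after enabling acceleration should show a difference similar to the REPL timings, although the numbers include HTTP overhead and will not match the REPL exactly.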

Step 5. Update the spicepod.yaml to enable In-Memory Arrow acceleration.

version: v1
kind: Spicepod
name: arrow-acceleration-qs
datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    description: taxi trips in s3
    params:
      file_format: parquet
    acceleration:
      enabled: true

Step 6. Save the changes to spicepod.yaml in the Spice app and observe the dataset updating and accelerating in the runtime terminal.

2024-10-22T19:28:24.204608Z  INFO runtime: Updating accelerated dataset taxi_trips...
2024-10-22T19:28:25.202828Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips
2024-10-22T19:29:07.729346Z  INFO runtime::accelerated_table::refresh_task: Loaded 2,964,624 rows (398.86 MiB) for dataset taxi_trips in 42s 525ms.
2024-10-22T19:29:09.217425Z  INFO runtime: Dataset taxi_trips registered (s3://spiceai-demo-datasets/taxi_trips/2024/), acceleration (arrow), results cache enabled.

Step 7. Run the same query against the taxi_trips dataset again and note the much faster query time.

select "VendorID", tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count from taxi_trips limit 10;

+----------+----------------------+-----------------------+-----------------+
| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count |
+----------+----------------------+-----------------------+-----------------+
| 1        | 2024-01-12T19:14:53  | 2024-01-12T19:28:34   | 2               |
| 2        | 2024-01-12T19:03:12  | 2024-01-12T19:17:19   | 2               |
| 2        | 2024-01-12T19:34:22  | 2024-01-12T19:38:11   | 2               |
| 2        | 2024-01-12T19:44:51  | 2024-01-12T19:49:40   | 2               |
| 1        | 2024-01-12T19:31:54  | 2024-01-12T19:38:43   | 2               |
| 2        | 2024-01-12T19:54:37  | 2024-01-12T19:59:55   | 2               |
| 1        | 2024-01-12T19:02:32  | 2024-01-12T19:12:25   | 2               |
| 1        | 2024-01-12T19:22:38  | 2024-01-12T19:37:30   | 2               |
| 2        | 2024-01-12T19:34:30  | 2024-01-12T20:31:18   | 2               |
| 2        | 2024-01-12T19:54:06  | 2024-01-12T20:07:29   | 2               |
+----------+----------------------+-----------------------+-----------------+

Time: 0.004575584 seconds. 10 rows.
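To put the two timings side by side, the REPL's "Time:" lines can be parsed and compared. This is a small sketch using the two sample timings from this quickstart; `parse_time_line` is a hypothetical helper, and actual timings will vary by machine and network.

```python
import re

# Matches the summary line printed by the Spice SQL REPL in this quickstart.
TIME_RE = re.compile(r"Time: ([\d.]+) seconds\. (\d+) rows\.")


def parse_time_line(line):
    """Return (seconds, row_count) parsed from a REPL 'Time:' line."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError(f"not a REPL timing line: {line!r}")
    return float(m.group(1)), int(m.group(2))


# Sample timings from Step 4 (unaccelerated) and Step 7 (accelerated).
before, _ = parse_time_line("Time: 1.081530375 seconds. 10 rows.")
after, _ = parse_time_line("Time: 0.004575584 seconds. 10 rows.")

speedup = before / after
print(f"Accelerated query is roughly {speedup:.0f}x faster")
```

For the sample output shown here, the accelerated query is roughly 236 times faster, since it is served from memory instead of reading Parquet files from S3.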

Learn more

In-Memory Arrow Data Accelerator Documentation.
For using spice sql, see the CLI reference.
See the datasets reference for additional dataset configuration options.
