Works with v1.11+
This recipe demonstrates how to configure a Spice dataset to connect to a ScyllaDB cluster and query data using federated SQL queries.
- A ScyllaDB cluster (self-hosted or ScyllaDB Cloud)
- ScyllaDB CQL native transport accessible (default port 9042)
- User credentials with read access to the target keyspace/tables
- Spice.ai runtime (Getting Started)
git clone https://github.com/spiceai/cookbook.git
cd cookbook/scylladbIf you don't have an existing ScyllaDB cluster, you can start one locally using Docker:
docker run --name scylladb -d \
-p 9042:9042 \
scylladb/scylla:latest \
--smp 1 --memory 750MWait for ScyllaDB to be ready (about 30 seconds):
docker exec -it scylladb nodetool statusConnect to the ScyllaDB CQL shell:
docker exec -it scylladb cqlshCreate a keyspace and table with sample data:
-- Create keyspace
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
-- Use the keyspace
USE demo;
-- Create a sample users table
CREATE TABLE IF NOT EXISTS users (
id UUID PRIMARY KEY,
name TEXT,
email TEXT,
age INT,
created_at TIMESTAMP
);
-- Insert sample data
INSERT INTO users (id, name, email, age, created_at)
VALUES (uuid(), 'Alice Smith', 'alice@example.com', 30, toTimestamp(now()));
INSERT INTO users (id, name, email, age, created_at)
VALUES (uuid(), 'Bob Johnson', 'bob@example.com', 25, toTimestamp(now()));
INSERT INTO users (id, name, email, age, created_at)
VALUES (uuid(), 'Charlie Brown', 'charlie@example.com', 35, toTimestamp(now()));
-- Verify data
SELECT * FROM users;Exit the CQL shell:
EXIT;If your ScyllaDB cluster requires authentication, update the .env file:
SCYLLADB_USER=<your_username>
SCYLLADB_PASS=<your_password>spice runIf the configuration is correct, you should see output similar to:
INFO runtime::init::dataset: Dataset users initializing...
INFO runtime::init::dataset: Dataset users registered (scylladb:users), acceleration (none), results cache enabled.spice sqlTo show tables:
SHOW TABLES;To query your sample data:
SELECT * FROM users;Sample output:
+--------------------------------------+---------------+---------------------+-----+-------------------------+
| id | name | email | age | created_at |
+--------------------------------------+---------------+---------------------+-----+-------------------------+
| a1b2c3d4-e5f6-7890-abcd-ef1234567890 | Alice Smith | alice@example.com | 30 | 2025-01-19T12:00:00.000 |
| b2c3d4e5-f6a7-8901-bcde-f12345678901 | Bob Johnson | bob@example.com | 25 | 2025-01-19T12:00:01.000 |
| c3d4e5f6-a7b8-9012-cdef-123456789012 | Charlie Brown | charlie@example.com | 35 | 2025-01-19T12:00:02.000 |
+--------------------------------------+---------------+---------------------+-----+-------------------------+Since ScyllaDB queries fetch all data for processing (filter pushdown is not supported), enabling acceleration is recommended for frequently queried tables:
datasets:
- from: scylladb:users
name: users
params:
scylladb_host: localhost
scylladb_keyspace: demo
acceleration:
enabled: true
engine: duckdb
refresh_check_interval: 1h| Parameter | Description | Required | Default |
|---|---|---|---|
scylladb_host |
Hostname(s) of ScyllaDB nodes. Comma-separated for multiple nodes. | Yes | - |
scylladb_hosts |
Alternative to scylladb_host. Comma-separated list of hostnames. |
No | - |
scylladb_port |
ScyllaDB CQL native transport port. | No | 9042 |
scylladb_keyspace |
The keyspace to use for queries. | Yes | - |
scylladb_user |
Username for authentication. | No | - |
scylladb_pass |
Password for authentication. | No | - |
scylladb_datacenter |
Preferred datacenter for connection routing. | No | - |
scylladb_ssl |
Enable SSL/TLS for connections. | No | false |
connection_timeout |
Connection timeout in milliseconds. | No | 10000 |
datasets:
- from: scylladb:events
name: events
params:
scylladb_hosts: node1.scylla.local,node2.scylla.local,node3.scylla.local
scylladb_keyspace: analytics
scylladb_datacenter: us-west-2| CQL Type | Arrow Type | Notes |
|---|---|---|
boolean |
Boolean |
|
tinyint |
Int8 |
|
smallint |
Int16 |
|
int |
Int32 |
|
bigint |
Int64 |
|
counter |
Int64 |
Cassandra counter type |
float |
Float32 |
|
double |
Float64 |
|
decimal |
Decimal128(38, 2) |
Arbitrary precision → fixed precision |
blob |
Binary |
|
date |
Date32 |
Days since epoch |
text |
Utf8 |
|
varchar |
Utf8 |
|
uuid |
Utf8 |
|
timeuuid |
Utf8 |
|
timestamp |
Timestamp |
The following SQL operations cannot be pushed down to ScyllaDB and are performed locally:
- JOINs: All joins are performed locally by DataFusion
- Aggregations: COUNT, SUM, AVG, etc. are computed locally
- Subqueries: Nested queries are not supported in CQL
- Window functions: RANK, ROW_NUMBER, etc. not supported
- Complex WHERE clauses: CQL requires partition key in WHERE; Spice fetches all data
- ORDER BY: Sorting is done locally
- Read-only: The connector does not support INSERT, UPDATE, or DELETE operations
- Decimal precision: Fixed at precision=38, scale=2; may not suit all use cases
- Collection types: Lists, sets, and maps are converted to JSON string representation
- Large tables: Without acceleration, large tables cause significant data transfer
To stop and remove the ScyllaDB container:
docker stop scylladb
docker rm scylladb