Commit 2864a55

Merge branch 'trunk' into peasee/260218-etl-in-spicebench
2 parents 76df416 + b291871 commit 2864a55

47 files changed

Lines changed: 4973 additions & 134 deletions

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
---
name: system-adapter-builder
description: Build or update a SpiceBench system adapter with JSON-RPC over stdio and HTTP, including setup/query_method/teardown/metrics support and template validation.
---

# SpiceBench System Adapter Builder

Use this skill when implementing a new system adapter or customizing one of the templates in `system-adapters/templates`.

## What this skill builds

A JSON-RPC 2.0 adapter that supports both transports:

- stdio transport (line-delimited JSON-RPC messages)
- HTTP transport (`POST /jsonrpc` by default)

Required methods:

- `setup(run_id, datasets)`
- `query_method(run_id)`
- `teardown(run_id)`
- `metrics(run_id)`
- `rpc.methods`

## Inputs to collect first

- Target language (`python`, `nodejs`, `rust`, `go`, or `java`)
- SUT provisioning flow for `setup` and `teardown`
- How to resolve query endpoint and credentials for `query_method`
- Where to source metrics (cloud APIs, DB telemetry, host exporters)
- Runtime/toolchain target (latest/LTS channel used by repository workflows)

## Build steps

1. Copy the nearest template from `system-adapters/templates/<language>`.
2. Keep request/response envelopes JSON-RPC 2.0 compliant (`jsonrpc`, `id`, `method`, `params`).
3. Implement `setup` and `teardown` with run-scoped resources keyed by `run_id`.
4. Implement `query_method` to return:
   - `driver`: typically `flightsql` or `databricks`
   - `db_kwargs`: real endpoint + auth kwargs for the SUT
5. Implement `metrics` to return both objects:
   - `resource`: CPU, memory, disk bytes, disk IOPS
   - `ingestion`: rows, bytes, rows/s, active connections
6. Keep stdio and HTTP using the same dispatcher so behavior is identical.
7. Return JSON-RPC errors with standard codes:
   - `-32700` parse error
   - `-32600` invalid request
   - `-32601` method not found
   - `-32602` invalid params
   - `-32603` internal error
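
The build steps above can be sketched as a single transport-agnostic dispatcher. This is an illustrative Python sketch, not code from the commit: the names `make_dispatcher` and `serve_stdio` are hypothetical, the `rpc.methods` result is assumed to be a plain list of method names, and notifications and batch requests are not handled. An HTTP handler would call the same `dispatch` function on the request body, which is one way to keep both transports behaving identically.

```python
import json

# Standard JSON-RPC 2.0 error codes, as listed in step 7.
PARSE_ERROR = -32700
INVALID_REQUEST = -32600
METHOD_NOT_FOUND = -32601
INVALID_PARAMS = -32602
INTERNAL_ERROR = -32603


def _error(req_id, code, message):
    """Build a JSON-RPC 2.0 error response envelope."""
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def make_dispatcher(handlers):
    """Return a function mapping one raw request string to a response dict.

    `handlers` maps method names (e.g. "setup") to callables.
    """
    # Assumption: rpc.methods returns a sorted list of method names.
    methods = sorted(list(handlers) + ["rpc.methods"])

    def dispatch(raw):
        try:
            request = json.loads(raw)
        except (TypeError, ValueError):
            return _error(None, PARSE_ERROR, "Parse error")
        if (not isinstance(request, dict)
                or request.get("jsonrpc") != "2.0"
                or not isinstance(request.get("method"), str)):
            return _error(None, INVALID_REQUEST, "Invalid Request")

        req_id = request.get("id")
        method = request["method"]
        params = request.get("params", {})

        if method == "rpc.methods":
            return {"jsonrpc": "2.0", "id": req_id, "result": methods}
        handler = handlers.get(method)
        if handler is None:
            return _error(req_id, METHOD_NOT_FOUND, "Method not found")
        try:
            if isinstance(params, dict):
                result = handler(**params)   # by-name parameters
            elif isinstance(params, list):
                result = handler(*params)    # by-position parameters
            else:
                return _error(req_id, INVALID_PARAMS, "Invalid params")
        except TypeError:
            # Simplification: any TypeError (usually a signature mismatch)
            # is reported as invalid params.
            return _error(req_id, INVALID_PARAMS, "Invalid params")
        except Exception as exc:
            return _error(req_id, INTERNAL_ERROR, str(exc))
        return {"jsonrpc": "2.0", "id": req_id, "result": result}

    return dispatch


def serve_stdio(dispatch, stdin, stdout):
    """Line-delimited stdio transport: one request per line, one response per line."""
    for line in stdin:
        if line.strip():
            stdout.write(json.dumps(dispatch(line)) + "\n")
            stdout.flush()
```

A real adapter would pass `sys.stdin` and `sys.stdout` to `serve_stdio` and register handlers for `setup`, `query_method`, `teardown`, and `metrics`.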

## Metrics mapping checklist

Map live SUT metrics into these fields:

- `resource.cpu_usage_percent`
- `resource.memory_usage_bytes`
- `resource.disk_read_bytes`
- `resource.disk_write_bytes`
- `resource.disk_read_iops`
- `resource.disk_write_iops`
- `ingestion.rows_ingested`
- `ingestion.bytes_ingested`
- `ingestion.rows_per_sec`
- `ingestion.active_connections`

If any metric is unavailable, return `0`/`0.0` and document why.
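
One way to apply the zero-default rule is to merge observed values over a table of defaults and report which fields fell back, so the gaps can be documented. The Python sketch below is illustrative and not taken from the commit: `build_metrics` is a hypothetical helper, and the choice of integer versus float default for each field is an assumption, since the checklist only says `0`/`0.0`.

```python
# Assumed zero defaults per field; int vs. float per field is a guess.
RESOURCE_DEFAULTS = {
    "cpu_usage_percent": 0.0,
    "memory_usage_bytes": 0,
    "disk_read_bytes": 0,
    "disk_write_bytes": 0,
    "disk_read_iops": 0.0,
    "disk_write_iops": 0.0,
}
INGESTION_DEFAULTS = {
    "rows_ingested": 0,
    "bytes_ingested": 0,
    "rows_per_sec": 0.0,
    "active_connections": 0,
}


def _merge(defaults, observed):
    """Overlay known observed fields on the defaults; ignore unknown keys."""
    merged = dict(defaults)
    merged.update({k: v for k, v in observed.items() if k in defaults})
    return merged


def build_metrics(resource=None, ingestion=None):
    """Return (payload, missing), where missing lists fields left at their default."""
    resource = resource or {}
    ingestion = ingestion or {}
    payload = {
        "resource": _merge(RESOURCE_DEFAULTS, resource),
        "ingestion": _merge(INGESTION_DEFAULTS, ingestion),
    }
    missing = [f"resource.{k}" for k in RESOURCE_DEFAULTS if k not in resource]
    missing += [f"ingestion.{k}" for k in INGESTION_DEFAULTS if k not in ingestion]
    return payload, missing
```

The `missing` list can be logged or written into the adapter's notes to satisfy the "document why" part of the rule.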

## Validation checklist

- Adapter responds to all required methods over stdio and HTTP.
- `rpc.methods` includes every exposed method.
- `query_method` returns a valid `driver` and complete `db_kwargs`.
- `metrics` returns both `resource` and `ingestion` objects.
- Language build/syntax checks pass:
  - Python: `python -m py_compile`
  - Node.js: `node --check`
  - Rust: `cargo build`
  - Go: `go build ./...`
  - Java: `mvn compile`
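
The response-shape items in this checklist can be checked mechanically once the adapter's results are in hand. The Python sketch below is illustrative and not part of the commit: the helper names are hypothetical, and because the checklist does not define a "valid" driver or "complete" `db_kwargs`, the sketch only assumes both must be present and non-empty.

```python
# Method names taken from the "Required methods" list in the skill file.
REQUIRED_METHODS = ("setup", "query_method", "teardown", "metrics", "rpc.methods")


def missing_methods(listed, required=REQUIRED_METHODS):
    """Return required method names absent from an rpc.methods result."""
    return sorted(set(required) - set(listed))


def check_query_method(result):
    """Return a list of problems with a query_method result (empty if none)."""
    if not isinstance(result, dict):
        return ["result is not an object"]
    problems = []
    driver = result.get("driver")
    if not isinstance(driver, str) or not driver:
        problems.append("driver must be a non-empty string")
    db_kwargs = result.get("db_kwargs")
    if not isinstance(db_kwargs, dict) or not db_kwargs:
        problems.append("db_kwargs must be a non-empty object")
    return problems


def check_metrics(result):
    """Return the names of top-level metrics objects that are missing."""
    if not isinstance(result, dict):
        return ["resource", "ingestion"]
    return [k for k in ("resource", "ingestion") if not isinstance(result.get(k), dict)]
```

Each helper returns an empty list on success, so a test harness can assert on emptiness and print the list on failure.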

## Done criteria

The adapter is considered complete when:

- both transports work,
- required methods are implemented,
- metric fields are populated from real telemetry or documented stubs,
- and template validation workflow passes in CI.

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
name: Build Databricks System Adapter

on:
  workflow_dispatch:
  push:
    paths:
      - system-adapters/databricks/**
      - crates/system-adapter-protocol/**
      - .github/workflows/databricks_system_adapter_build.yml
  pull_request:
    paths:
      - system-adapters/databricks/**
      - crates/system-adapter-protocol/**
      - .github/workflows/databricks_system_adapter_build.yml

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v6

      - name: Setup Rust toolchain
        uses: actions-rust-lang/setup-rust-toolchain@v1
        with:
          toolchain: 1.91
          cache: true

      - name: Build databricks system adapter
        run: cargo build --manifest-path system-adapters/databricks/Cargo.toml

      - name: Upload binary artifact
        uses: actions/upload-artifact@v4
        with:
          name: databricks-system-adapter-linux
          path: system-adapters/databricks/target/debug/databricks-system-adapter
          if-no-files-found: error

.github/workflows/pr.yml

Lines changed: 18 additions & 0 deletions
@@ -55,6 +55,24 @@ jobs:
       - name: cargo test
         run: make test
 
+  fmt:
+    name: cargo fmt check
+    runs-on: spiceai-macos
+    timeout-minutes: 60
+    needs: changes
+    if: needs.changes.outputs.rust == 'true'
+    steps:
+      - uses: actions/checkout@v6
+
+      - name: Setup Rust toolchain
+        uses: actions-rust-lang/setup-rust-toolchain@v1
+        with:
+          toolchain: 1.91
+          cache: false # Using GHA cache is slower than re-installing
+
+      - name: cargo fmt check
+        run: make fmt-check
+
   clippy:
     name: cargo clippy
     runs-on: spiceai-macos

.github/workflows/run_spicebench.yml

Lines changed: 145 additions & 6 deletions
@@ -5,10 +5,18 @@ on:
   workflow_dispatch:
     inputs:
       scenario:
-        description: "Scenario/query set to run (e.g. tpch)"
+        description: 'Scenario/query set to run (e.g. tpch)'
         required: true
-        default: "tpch"
+        default: 'tpch'
         type: string
+      system_adapter:
+        description: 'System adapter to run (docker spidapter or local databricks adapter)'
+        required: true
+        default: spidapter
+        type: choice
+        options:
+          - spidapter
+          - databricks
       etl_bucket:
         description: 'S3 bucket for ETL source and target data'
         required: true
@@ -48,33 +56,145 @@ jobs:
       - uses: actions/checkout@v6
 
       - uses: ./.github/actions/management-login
+        if: ${{ github.event.inputs.system_adapter == 'spidapter' }}
         with:
           client-id: ${{ secrets.SPICE_MANAGEMENT_CLIENT_ID }}
           client-secret: ${{ secrets.SPICE_MANAGEMENT_CLIENT_SECRET }}
 
       - name: Log in to GHCR
+        if: ${{ github.event.inputs.system_adapter == 'spidapter' }}
         uses: docker/login-action@v3
         with:
           registry: ghcr.io
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
 
       - name: pull spidapter image
+        if: ${{ github.event.inputs.system_adapter == 'spidapter' }}
         run: docker pull ghcr.io/spiceai/spidapter:latest
 
       - uses: ./.github/actions/build-spicebench
 
-      - name: Install ADBC FlightSQL driver
+      - name: Restore databricks adapter cache
+        if: ${{ github.event.inputs.system_adapter == 'databricks' }}
+        id: cache-databricks-adapter
+        uses: actions/cache/restore@v4
+        with:
+          path: ~/.spice/bin/databricks-system-adapter
+          key: databricks-system-adapter-${{ runner.os }}-${{ hashFiles('system-adapters/databricks/Cargo.toml', 'system-adapters/databricks/Cargo.lock', 'system-adapters/databricks/src/**/*.rs', 'crates/system-adapter-protocol/Cargo.toml', 'crates/system-adapter-protocol/src/**/*.rs') }}
+          restore-keys: |
+            databricks-system-adapter-${{ runner.os }}-
+
+      - name: Build databricks adapter
+        if: ${{ github.event.inputs.system_adapter == 'databricks' && steps.cache-databricks-adapter.outputs.cache-hit != 'true' }}
+        id: build-databricks-adapter
         run: |
+          mkdir -p ~/.spice/bin
+          cargo build --manifest-path system-adapters/databricks/Cargo.toml
+          install -m 755 system-adapters/databricks/target/debug/databricks-system-adapter ~/.spice/bin/databricks-system-adapter
+
+      - name: Save databricks adapter cache
+        if: ${{ github.event.inputs.system_adapter == 'databricks' && steps.build-databricks-adapter.outcome == 'success' }}
+        uses: actions/cache/save@v4
+        with:
+          path: ~/.spice/bin/databricks-system-adapter
+          key: databricks-system-adapter-${{ runner.os }}-${{ hashFiles('system-adapters/databricks/Cargo.toml', 'system-adapters/databricks/Cargo.lock', 'system-adapters/databricks/src/**/*.rs', 'crates/system-adapter-protocol/Cargo.toml', 'crates/system-adapter-protocol/src/**/*.rs') }}
+
+      - name: Validate adapter configuration
+        env:
+          SYSTEM_ADAPTER: ${{ github.event.inputs.system_adapter || 'spidapter' }}
+          SCENARIO: ${{ github.event.inputs.scenario || 'tpch' }}
+          SPICEAI_API_KEY: ${{ env.SPICEAI_API_KEY }}
+          SPICE_CLOUD_API_URL: https://dev-api.spice.ai
+          DATABRICKS_ENDPOINT: ${{ secrets.DATABRICKS_ENDPOINT }}
+          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
+          DATABRICKS_HTTP_PATH: ${{ secrets.DATABRICKS_HTTP_PATH }}
+          DATABRICKS_SQL_WAREHOUSE_ID: ${{ secrets.DATABRICKS_SQL_WAREHOUSE_ID }}
+          DATABRICKS_CATALOG: ${{ secrets.DATABRICKS_CATALOG }}
+          DATABRICKS_SCHEMA: ${{ secrets.DATABRICKS_SCHEMA }}
+        run: |
+          set -euo pipefail
+
+          if [ -z "${SCENARIO}" ]; then
+            echo "SCENARIO must not be empty"
+            exit 1
+          fi
+
+          case "${SYSTEM_ADAPTER}" in
+            spidapter)
+              if [ -z "${SPICEAI_API_KEY:-}" ]; then
+                echo "SPICEAI_API_KEY must be set for spidapter"
+                exit 1
+              fi
+
+              if ! command -v docker >/dev/null 2>&1; then
+                echo "docker is required for spidapter mode"
+                exit 1
+              fi
+
+              docker image inspect ghcr.io/spiceai/spidapter:latest >/dev/null 2>&1 || {
+                echo "spidapter docker image not found locally; pull step may have failed"
+                exit 1
+              }
+              ;;
+
+            databricks)
+              for required_var in DATABRICKS_ENDPOINT DATABRICKS_TOKEN DATABRICKS_HTTP_PATH DATABRICKS_SQL_WAREHOUSE_ID; do
+                if [ -z "${!required_var:-}" ]; then
+                  echo "${required_var} must be set for databricks adapter mode"
+                  exit 1
+                fi
+              done
+
+              if echo "${DATABRICKS_ENDPOINT}" | grep -qE '^https?://'; then
+                echo "DATABRICKS_ENDPOINT should be a hostname only (no http/https scheme)"
+                exit 1
+              fi
+
+              if echo "${DATABRICKS_HTTP_PATH}" | grep -qE '^/'; then
+                echo "DATABRICKS_HTTP_PATH should not start with '/'"
+                exit 1
+              fi
+
+              if [ ! -x "${HOME}/.spice/bin/databricks-system-adapter" ]; then
+                echo "Local databricks adapter binary is missing or not executable at ${HOME}/.spice/bin/databricks-system-adapter"
+                exit 1
+              fi
+
+              "${HOME}/.spice/bin/databricks-system-adapter" --help >/dev/null
+              ;;
+
+            *)
+              echo "Unsupported system_adapter value: ${SYSTEM_ADAPTER}"
+              exit 1
+              ;;
+          esac
+      - name: Install ADBC driver
+        env:
+          SYSTEM_ADAPTER: ${{ github.event.inputs.system_adapter || 'spidapter' }}
+        run: |
+          set -euo pipefail
           curl -LsSf https://dbc.columnar.tech/install.sh | sh
-          dbc install flightsql
+
+          if [ "${SYSTEM_ADAPTER}" = "databricks" ]; then
+            dbc install databricks
+          else
+            dbc install flightsql
+          fi
 
       - name: Run spicebench
         env:
           SPICEAI_API_KEY: ${{ env.SPICEAI_API_KEY }}
           SPICE_CLOUD_API_URL: https://dev-api.spice.ai
+          DATABRICKS_ENDPOINT: ${{ secrets.DATABRICKS_ENDPOINT }}
+          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
+          DATABRICKS_HTTP_PATH: ${{ secrets.DATABRICKS_HTTP_PATH }}
+          DATABRICKS_SQL_WAREHOUSE_ID: ${{ secrets.DATABRICKS_SQL_WAREHOUSE_ID }}
+          DATABRICKS_CATALOG: ${{ secrets.DATABRICKS_CATALOG }}
+          DATABRICKS_SCHEMA: ${{ secrets.DATABRICKS_SCHEMA }}
           SPICEAI_BENCHMARK_METRICS_KEY: ${{ secrets.SPICEAI_BENCHMARK_METRICS_KEY }}
           SCENARIO: ${{ github.event.inputs.scenario || 'tpch' }}
+          SYSTEM_ADAPTER: ${{ github.event.inputs.system_adapter || 'spidapter' }}
           ETL_BUCKET: ${{ github.event.inputs.etl_bucket }}
           ETL_SOURCE_PREFIX: ${{ github.event.inputs.etl_source_prefix }}
           ETL_TARGET_PREFIX: ${{ github.event.inputs.etl_target_prefix }}
@@ -98,10 +218,29 @@ jobs:
           if [ -n "${ETL_ENDPOINT}" ]; then
             ETL_ARGS="${ETL_ARGS} --etl-endpoint ${ETL_ENDPOINT}"
           fi
+
+          if [ "${SYSTEM_ADAPTER}" = "databricks" ]; then
+            ADAPTER_CMD="${HOME}/.spice/bin/databricks-system-adapter"
+            ADAPTER_ARGS="stdio"
+            ADAPTER_ENVS="--system-adapter-env DATABRICKS_ENDPOINT=${DATABRICKS_ENDPOINT} --system-adapter-env DATABRICKS_TOKEN=${DATABRICKS_TOKEN} --system-adapter-env DATABRICKS_HTTP_PATH=${DATABRICKS_HTTP_PATH} --system-adapter-env DATABRICKS_SQL_WAREHOUSE_ID=${DATABRICKS_SQL_WAREHOUSE_ID}"
+
+            if [ -n "${DATABRICKS_CATALOG}" ]; then
+              ADAPTER_ENVS="${ADAPTER_ENVS} --system-adapter-env DATABRICKS_CATALOG=${DATABRICKS_CATALOG}"
+            fi
+
+            if [ -n "${DATABRICKS_SCHEMA}" ]; then
+              ADAPTER_ENVS="${ADAPTER_ENVS} --system-adapter-env DATABRICKS_SCHEMA=${DATABRICKS_SCHEMA}"
+            fi
+          else
+            ADAPTER_CMD="docker"
+            ADAPTER_ARGS="run -i -e SPICEAI_API_KEY -e SPICE_CLOUD_API_URL ghcr.io/spiceai/spidapter:latest stdio --verbose"
+            ADAPTER_ENVS=""
+          fi
 
           ~/.spice/bin/spicebench \
             --concurrency 2 \
             --scenario "${SCENARIO}" \
             ${ETL_ARGS} \
-            --system-adapter-stdio-cmd docker \
-            --system-adapter-stdio-args "run -i -e SPICEAI_API_KEY -e SPICE_CLOUD_API_URL ghcr.io/spiceai/spidapter:latest stdio --verbose"
+            --system-adapter-stdio-cmd "${ADAPTER_CMD}" \
+            --system-adapter-stdio-args "${ADAPTER_ARGS}" \
+            ${ADAPTER_ENVS}
