Commit d23cde0

lukekim and claude authored
Add missing cayenne/duckdb[file] benchmark coverage (spiceai#10808)
* Add missing cayenne/duckdb[file] benchmark coverage

  Fill gaps in testoperator benchmark coverage for cayenne and duckdb[file] accelerators at SF1 and SF100:

  - Add TPCDS SF100 s3[parquet]-duckdb[file] spicepod and dispatch (TPCH SF100 had both; TPCDS SF100 only had cayenne)
  - Wire TPCDS SF100 into the scheduled and workflow_dispatch testoperator dispatch jobs (validation + bench dispatch)
  - Add ClickBench SF1 on_zero_results cayenne[file] spicepod and dispatch (TPCH and TPCDS had both engines; ClickBench only had duckdb)

  https://claude.ai/code/session_01NtBHhPp31i5hhWWj5GiqeA

* Address review: query_overrides + globstar in spicepod validation

  - Add `query_overrides: duckdb` to the new ClickBench cayenne[file] on_zero_results dispatch, matching the other ClickBench accelerated dispatch configs (ClickBench queries use DuckDB SQL dialect).
  - Enable `shopt -s globstar nullglob` before each spicepod validation loop. Default GitHub Actions bash does not enable globstar, so `**/*.yaml` was not recursive and nested directories (append/, indexes/, on_zero_results/) were skipped.

  https://claude.ai/code/session_01NtBHhPp31i5hhWWj5GiqeA

* Remove duckdb query_overrides from ClickBench cayenne on_zero_results

  The `duckdb` override casts binary columns to TEXT as a workaround for the DataFusion unparser and was originally intended for the DuckDB accelerator with on_zero_results: use_source. It is not appropriate for the Cayenne accelerator, which has its own storage format.

  https://claude.ai/code/session_01NtBHhPp31i5hhWWj5GiqeA

---------

Co-authored-by: Claude <noreply@anthropic.com>
1 parent b38219f commit d23cde0

5 files changed

Lines changed: 177 additions & 0 deletions
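
The globstar behavior described in the commit message can be reproduced in a scratch directory. The following is a minimal sketch, assuming bash 4 or later; the directory layout and file names are hypothetical and only loosely mirror the nested spicepod directories named in the message.

```shell
#!/usr/bin/env bash
# Scratch layout (hypothetical): one file at the top level, two nested deeper.
tmp="$(mktemp -d)"
mkdir -p "$tmp/sf1/accelerated/on_zero_results"
touch "$tmp/sf1/top.yaml" \
      "$tmp/sf1/accelerated/a.yaml" \
      "$tmp/sf1/accelerated/on_zero_results/b.yaml"

# Without globstar, ** behaves like a single *, so the pattern only
# matches files exactly one directory below sf1/.
shopt -u globstar
without=( "$tmp"/sf1/**/*.yaml )

# With globstar, **/ matches zero or more directory levels.
# nullglob makes an unmatched pattern expand to nothing.
shopt -s globstar nullglob
with=( "$tmp"/sf1/**/*.yaml )

echo "without globstar: ${#without[@]} file(s)"
echo "with globstar:    ${#with[@]} file(s)"
rm -rf "$tmp"
```

In this layout the first expansion should find 1 file and the second 3, which is the gap the `shopt` line in each validation step closes.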

.github/workflows/testoperator_dispatch.yml

Lines changed: 42 additions & 0 deletions
@@ -93,32 +93,44 @@ jobs:
 
       - name: Validate spicepods - TPCH - SF1
         run: |
+          shopt -s globstar nullglob
           for file in ./test/spicepods/tpch/sf1/**/*.yaml; do
             echo "Validating $file"
             "$SPICEPOD_VALIDATOR" "$file"
           done
 
       - name: Validate spicepods - TPCDS - SF1
         run: |
+          shopt -s globstar nullglob
           for file in ./test/spicepods/tpcds/sf1/**/*.yaml; do
             echo "Validating $file"
             "$SPICEPOD_VALIDATOR" "$file"
           done
 
       - name: Validate spicepods - ClickBench - SF1
         run: |
+          shopt -s globstar nullglob
           for file in ./test/spicepods/clickbench/sf1/**/*.yaml; do
             echo "Validating $file"
             "$SPICEPOD_VALIDATOR" "$file"
           done
 
       - name: Validate spicepods - TPCH - SF100
         run: |
+          shopt -s globstar nullglob
           for file in ./test/spicepods/tpch/sf100/**/*.yaml; do
             echo "Validating $file"
             "$SPICEPOD_VALIDATOR" "$file"
           done
 
+      - name: Validate spicepods - TPCDS - SF100
+        run: |
+          shopt -s globstar nullglob
+          for file in ./test/spicepods/tpcds/sf100/**/*.yaml; do
+            echo "Validating $file"
+            "$SPICEPOD_VALIDATOR" "$file"
+          done
+
       - name: Dispatch Testoperator - Scheduled Bench - TPCH - SF1
         run: |
           testoperator dispatch ./tools/testoperator/dispatch/tpch/sf1 --workflow bench
@@ -159,6 +171,14 @@ jobs:
           SPICED_COMMIT: ${{ steps.setup-spiced.outputs.SPICED_COMMIT }}
           WORKFLOW_COMMIT: ${{ matrix.branch }}
 
+      - name: Dispatch Testoperator - Scheduled Bench - TPCDS - SF100
+        run: |
+          testoperator dispatch ./tools/testoperator/dispatch/tpcds/sf100 --workflow bench
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          SPICED_COMMIT: ${{ steps.setup-spiced.outputs.SPICED_COMMIT }}
+          WORKFLOW_COMMIT: ${{ matrix.branch }}
+
       - name: Dispatch Testoperator - Scheduled Streaming - Bench
         run: |
           testoperator dispatch ./tools/testoperator/dispatch/streaming --workflow streaming-bench
@@ -224,32 +244,44 @@ jobs:
 
       - name: Validate spicepods - TPCH - SF1
         run: |
+          shopt -s globstar nullglob
           for file in ./test/spicepods/tpch/sf1/**/*.yaml; do
             echo "Validating $file"
             "$SPICEPOD_VALIDATOR" "$file"
           done
 
       - name: Validate spicepods - TPCDS - SF1
         run: |
+          shopt -s globstar nullglob
           for file in ./test/spicepods/tpcds/sf1/**/*.yaml; do
             echo "Validating $file"
             "$SPICEPOD_VALIDATOR" "$file"
           done
 
       - name: Validate spicepods - ClickBench - SF1
         run: |
+          shopt -s globstar nullglob
           for file in ./test/spicepods/clickbench/sf1/**/*.yaml; do
             echo "Validating $file"
             "$SPICEPOD_VALIDATOR" "$file"
           done
 
       - name: Validate spicepods - TPCH - SF100
         run: |
+          shopt -s globstar nullglob
           for file in ./test/spicepods/tpch/sf100/**/*.yaml; do
             echo "Validating $file"
             "$SPICEPOD_VALIDATOR" "$file"
           done
 
+      - name: Validate spicepods - TPCDS - SF100
+        run: |
+          shopt -s globstar nullglob
+          for file in ./test/spicepods/tpcds/sf100/**/*.yaml; do
+            echo "Validating $file"
+            "$SPICEPOD_VALIDATOR" "$file"
+          done
+
       - name: Dispatch Testoperator - ${{ github.event.inputs.workflow_type }} - TPCH - SF1
         run: |
           testoperator dispatch ./tools/testoperator/dispatch/tpch/sf1 \
@@ -290,6 +322,16 @@ jobs:
           SPICED_COMMIT: ${{ steps.setup-spiced.outputs.SPICED_COMMIT }}
           WORKFLOW_COMMIT: ${{ github.event.ref }}
 
+      - name: Dispatch Testoperator - ${{ github.event.inputs.workflow_type }} - TPCDS - SF100
+        run: |
+          testoperator dispatch ./tools/testoperator/dispatch/tpcds/sf100 \
+            --workflow ${{ github.event.inputs.workflow_type }} \
+            --update-snapshots ${{ github.event.inputs.update_snapshots }}
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          SPICED_COMMIT: ${{ steps.setup-spiced.outputs.SPICED_COMMIT }}
+          WORKFLOW_COMMIT: ${{ github.event.ref }}
+
       - name: Dispatch Testoperator - ${{ github.event.inputs.workflow_type }} - Text to SQL
         run: |
           testoperator dispatch ./tools/testoperator/dispatch/text_to_sql \
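
One side effect of `nullglob` in the validation steps above is that a pattern matching nothing expands to an empty list, so a mistyped directory would let the loop finish without validating anything. The sketch below shows one possible guard. Only `SPICEPOD_VALIDATOR` comes from the workflow; the `validate_dir` function, the empty-match check, and the scratch paths are hypothetical additions, and `true` stands in for the real validator in the demo.

```shell
#!/usr/bin/env bash
# validate_dir DIR: run the validator on every *.yaml under DIR, recursively.
# Hypothetical helper; not part of the workflow in this commit.
validate_dir() {
  local dir="$1" file
  local -a files
  # Note: shopt changes apply to the whole shell, not just this function.
  shopt -s globstar nullglob
  files=( "$dir"/**/*.yaml )
  # With nullglob, an unmatched pattern yields zero files, so fail loudly
  # instead of silently validating nothing.
  if (( ${#files[@]} == 0 )); then
    echo "No spicepods found under $dir" >&2
    return 1
  fi
  for file in "${files[@]}"; do
    echo "Validating $file"
    "$SPICEPOD_VALIDATOR" "$file"
  done
}

# Demo in a scratch directory, with `true` standing in for the validator.
SPICEPOD_VALIDATOR="${SPICEPOD_VALIDATOR:-true}"
demo="$(mktemp -d)"
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/tpcds/sf100/accelerated"
touch "$demo/tpcds/sf100/accelerated/example.yaml"
validate_dir "$demo/tpcds/sf100"
```

Whether an empty match should fail the job is a design choice: it catches path typos, but it would also fail for a suite directory that is intentionally empty.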
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+version: v1
+kind: Spicepod
+name: file[parquet]-cayenne[file]-on_zero_results_small
+datasets:
+  - from: file:data/hits_0.parquet
+    name: hits
+    acceleration:
+      enabled: true
+      engine: cayenne
+      mode: file
+      refresh_sql: "SELECT * FROM hits LIMIT 0"
+      on_zero_results: use_source
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
+version: v1
+kind: Spicepod
+name: s3[parquet]-duckdb[file]
+datasets:
+  - from: s3://benchmarks/tpcds_sf100/catalog_sales/
+    name: catalog_sales
+    params: &s3_params
+      file_format: parquet
+      allow_http: true
+      s3_auth: key
+      s3_endpoint: ${secrets:S3_ENDPOINT}
+      s3_key: ${secrets:S3_KEY}
+      s3_secret: ${secrets:S3_SECRET}
+    acceleration: &acceleration
+      enabled: true
+      engine: duckdb
+      mode: file
+  - from: s3://benchmarks/tpcds_sf100/catalog_returns/
+    name: catalog_returns
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/inventory/
+    name: inventory
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/store_sales/
+    name: store_sales
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/store_returns/
+    name: store_returns
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/web_sales/
+    name: web_sales
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/web_returns/
+    name: web_returns
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/customer/
+    name: customer
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/customer_address/
+    name: customer_address
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/customer_demographics/
+    name: customer_demographics
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/date_dim/
+    name: date_dim
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/household_demographics/
+    name: household_demographics
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/item/
+    name: item
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/promotion/
+    name: promotion
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/ship_mode/
+    name: ship_mode
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/store/
+    name: store
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/time_dim/
+    name: time_dim
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/warehouse/
+    name: warehouse
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/web_page/
+    name: web_page
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/web_site/
+    name: web_site
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/reason/
+    name: reason
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/call_center/
+    name: call_center
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/income_band/
+    name: income_band
+    params: *s3_params
+    acceleration: *acceleration
+  - from: s3://benchmarks/tpcds_sf100/catalog_page/
+    name: catalog_page
+    params: *s3_params
+    acceleration: *acceleration
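
The spicepod above defines its S3 parameters and acceleration settings once, as the YAML anchors `&s3_params` and `&acceleration` on the first dataset, and every later dataset reuses them through aliases. The repetitive alias-only entries could be generated with a loop like the one below. This is only an illustration of the file's structure, not how the file in this commit was produced; the function name `gen_datasets` is hypothetical, and the output is a fragment that still needs the header and the anchor-defining first entry to be a complete spicepod.

```shell
#!/usr/bin/env bash
# Table names copied from the spicepod above (24 TPC-DS tables).
tables=(
  catalog_sales catalog_returns inventory store_sales store_returns
  web_sales web_returns customer customer_address customer_demographics
  date_dim household_demographics item promotion ship_mode store time_dim
  warehouse web_page web_site reason call_center income_band catalog_page
)

# Print one alias-only dataset entry per table. The aliases only resolve
# when an earlier entry in the same file defines the matching anchors.
gen_datasets() {
  local t
  for t in "${tables[@]}"; do
    printf '  - from: s3://benchmarks/tpcds_sf100/%s/\n' "$t"
    printf '    name: %s\n' "$t"
    printf '    params: *s3_params\n'
    printf '    acceleration: *acceleration\n'
  done
}

gen_datasets
```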
@@ -0,0 +1,6 @@
+tests:
+  bench:
+    spicepod_path: accelerated/on_zero_results/file[parquet]-cayenne[file]-on_zero_results_small.yaml
+    query_set: clickbench
+    runner_type: spiceai-dev-large-runners
+    ready_wait: 1800
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+tests:
+  bench:
+    spicepod_path: accelerated/s3[parquet]-duckdb[file].yaml
+    query_set: tpcds
+    query_overrides: duckdb
+    runner_type: spiceai-dev-large-runners
+    ready_wait: 6000
+    scale_factor: 100
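
Each dispatch config points at a spicepod through `spicepod_path`, so a quick consistency check is to confirm that the referenced file exists. The sketch below assumes the path is relative to a spicepod root directory passed in by the caller; that resolution rule, the `resolve_spicepod` function, and the scratch layout are assumptions for illustration, not documented testoperator behavior.

```shell
#!/usr/bin/env bash
# resolve_spicepod DISPATCH_FILE SPICEPOD_ROOT
# Prints the resolved spicepod path; succeeds only if that file exists.
# Assumption: spicepod_path is relative to SPICEPOD_ROOT.
resolve_spicepod() {
  local dispatch_file="$1" root="$2" rel
  # Take the first spicepod_path value; sed avoids needing a YAML parser.
  rel="$(sed -n 's/^[[:space:]]*spicepod_path:[[:space:]]*//p' "$dispatch_file" | head -n 1)"
  if [ -z "$rel" ]; then
    echo "no spicepod_path in $dispatch_file" >&2
    return 1
  fi
  # Keep the path quoted: [parquet] and [file] would otherwise be treated
  # as glob bracket expressions.
  if [ -f "$root/$rel" ]; then
    printf '%s\n' "$root/$rel"
  else
    echo "missing spicepod: $root/$rel" >&2
    return 1
  fi
}

# Demo with scratch files mirroring the dispatch config above.
work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/spicepods/accelerated"
touch "$work/spicepods/accelerated/s3[parquet]-duckdb[file].yaml"
cat > "$work/dispatch.yaml" <<'EOF'
tests:
  bench:
    spicepod_path: accelerated/s3[parquet]-duckdb[file].yaml
    query_set: tpcds
EOF
resolve_spicepod "$work/dispatch.yaml" "$work/spicepods"
```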
