Commit 6eda558

feat: dsdgen compatibility
1 parent 31d515c commit 6eda558

24 files changed

Lines changed: 1026 additions & 40 deletions

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -3,4 +3,5 @@ target/
 __old/
 Cargo.lock
 .idea
-.venv/
+.venv/
+tpcds/

.gitmodules

Whitespace-only changes.

tpcdsgen/.gitignore

Lines changed: 3 additions & 0 deletions
@@ -8,6 +8,9 @@
 # Test fixtures (generated).
 #/tests/fixtures/
 
+# Python cache.
+scripts/__pycache__/
+
 # Stuff I need to remember
 NEXT_STEPS.md
 ISSUES.md

tpcdsgen/.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+3.14

tpcdsgen/data/return_reasons_c.dst

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+------
+-- return_reasons
+------
+-- values weights
+-- -----------------------
+-- 1. reason 1-6. not sure... none are ever used
+------
+Package was damaged: 1, 0, 0, 0, 0, 0
+Stopped working: 1, 0, 0, 0, 0, 0
+Did not get it on time: 1, 0, 0, 0, 0, 0
+Not the product that was ordred: 1, 0, 0, 0, 0, 0
+Parts missing: 1, 0, 0, 0, 0, 0
+Does not work with a product that I have: 1, 0, 0, 0, 0, 0
+Gift exchange: 1, 0, 0, 0, 0, 0
+Did not like the color: 1, 0, 0, 0, 0, 0
+Did not like the model: 1, 0, 0, 0, 0, 0
+Did not like the make: 1, 0, 0, 0, 0, 0
+Did not like the warranty: 1, 0, 0, 0, 0, 0
+No service location in my area: 1, 0, 0, 0, 0, 0
+Found a better price in a store: 1, 0, 0, 0, 0, 0
+Found a better extended warranty in a store: 1, 0, 0, 0, 0, 0
+Not working any more: 1, 0, 0, 0, 0, 0
+Did not fit: 1, 0, 0, 0, 0, 0
+Wrong size: 1, 0, 0, 0, 0, 0
+Lost my job: 1, 0, 0, 0, 0, 0
+unauthoized purchase: 1, 0, 0, 0, 0, 0
+duplicate purchase: 1, 0, 0, 0, 0, 0
+its is a boy: 1, 0, 0, 0, 0, 0
+it is a girl: 1, 0, 0, 0, 0, 0
+reason 23: 1, 0, 0, 0, 0, 0
+reason 24: 1, 0, 0, 0, 0, 0
+reason 25: 1, 0, 0, 0, 0, 0
+reason 26: 1, 0, 0, 0, 0, 0
+reason 27: 1, 0, 0, 0, 0, 0
+reason 28: 1, 0, 0, 0, 0, 0
+reason 29: 1, 0, 0, 0, 0, 0
+reason 30: 1, 0, 0, 0, 0, 0
+reason 31: 1, 0, 0, 0, 0, 0
+reason 32: 1, 0, 0, 0, 0, 0
+reason 33: 1, 0, 0, 0, 0, 0
+reason 34: 1, 0, 0, 0, 0, 0
+reason 35: 1, 0, 0, 0, 0, 0
+reason 36: 1, 1, 0, 0, 0, 0
+reason 37: 1, 1, 0, 0, 0, 0
+reason 38: 1, 1, 0, 0, 0, 0
+reason 39: 1, 1, 0, 0, 0, 0
+reason 40: 1, 1, 0, 0, 0, 0
+reason 41: 1, 1, 0, 0, 0, 0
+reason 42: 1, 1, 0, 0, 0, 0
+reason 43: 1, 1, 0, 0, 0, 0
+reason 44: 1, 1, 0, 0, 0, 0
+reason 45: 1, 1, 0, 0, 0, 0
+reason 46: 1, 1, 1, 0, 0, 0
+reason 47: 1, 1, 1, 0, 0, 0
+reason 48: 1, 1, 1, 0, 0, 0
+reason 49: 1, 1, 1, 0, 0, 0
+reason 50: 1, 1, 1, 0, 0, 0
+reason 51: 1, 1, 1, 0, 0, 0
+reason 52: 1, 1, 1, 0, 0, 0
+reason 53: 1, 1, 1, 0, 0, 0
+reason 54: 1, 1, 1, 0, 0, 0
+reason 55: 1, 1, 1, 0, 0, 0
+reason 56: 1, 1, 1, 1, 0, 0
+reason 57: 1, 1, 1, 1, 0, 0
+reason 58: 1, 1, 1, 1, 0, 0
+reason 59: 1, 1, 1, 1, 0, 0
+reason 60: 1, 1, 1, 1, 0, 0
+reason 61: 1, 1, 1, 1, 0, 0
+reason 62: 1, 1, 1, 1, 0, 0
+reason 63: 1, 1, 1, 1, 0, 0
+reason 64: 1, 1, 1, 1, 0, 0
+reason 65: 1, 1, 1, 1, 0, 0
+reason 66: 1, 1, 1, 1, 1, 0
+reason 67: 1, 1, 1, 1, 1, 0
+reason 68: 1, 1, 1, 1, 1, 0
+reason 69: 1, 1, 1, 1, 1, 0
+reason 70: 1, 1, 1, 1, 1, 0
+reason 71: 1, 1, 1, 1, 1, 1
+reason 72: 1, 1, 1, 1, 1, 1
+reason 73: 1, 1, 1, 1, 1, 1
+reason 74: 1, 1, 1, 1, 1, 1
+reason 75: 1, 1, 1, 1, 1, 1
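
Note on the layout above: each data line in return_reasons_c.dst reads as `value: w1, w2, ...` with six weight columns, and lines starting with `--` are comments or separators. The following is a minimal Python sketch of reading that layout, assuming only what is visible in this hunk. The names `parse_dst` and `active_values` are hypothetical and are not taken from the repository, and how the generator actually consumes the weight columns is not shown in this commit.

```python
def parse_dst(text):
    """Parse `value: w1, w2, ...` lines, skipping blank lines and `--` comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("--"):
            continue
        # Split on the last colon so a value that itself contains ':' stays intact.
        value, _, weights = line.rpartition(":")
        entries.append((value.strip(), [int(w) for w in weights.split(",")]))
    return entries


def active_values(entries, column):
    """Return the values whose weight in the given column is non-zero."""
    return [value for value, weights in entries if weights[column] > 0]


# A three-line excerpt of the file above, used only for illustration.
SAMPLE = """\
------
-- return_reasons
------
Package was damaged: 1, 0, 0, 0, 0, 0
reason 36: 1, 1, 0, 0, 0, 0
reason 71: 1, 1, 1, 1, 1, 1
"""

entries = parse_dst(SAMPLE)
print(active_values(entries, 0))  # all three values carry weight in column 0
print(active_values(entries, 1))  # only "reason 36" and "reason 71"
```

Read this way, each successive weight column enables a smaller tail of the list: every entry is active in the first column, while only `reason 71` through `reason 75` are active in the last.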

tpcdsgen/main.py

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+def main():
+    print("Hello from tpcdsgen!")
+
+
+if __name__ == "__main__":
+    main()

tpcdsgen/pyproject.toml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+[project]
+name = "tpcdsgen"
+version = "0.1.0"
+description = "Add your description here"
+readme = "README.md"
+requires-python = ">=3.14"
+dependencies = [
+    "datafusion>=53.0.0",
+    "pyarrow>=24.0.0",
+]

tpcdsgen/scripts/README.md

Lines changed: 2 additions & 11 deletions
@@ -161,7 +161,7 @@ Compares Rust-generated output for a single table against the Java reference fix
 [INFO] Comparing outputs...
 [INFO] Java fixture: 6 rows, 4.0K
 [INFO] Rust output: 6 rows, 4.0K
-[SUCCESS] ✓ call_center: Outputs match exactly (6 rows)
+[SUCCESS] ✓ call_center: MD5 match (6 rows, cc9aabc63eb8603bd7330b6735ed0961)
 [INFO] =========================================
 ```
 
@@ -210,7 +210,7 @@ Runs comparison tests for all tables that have been ported to Rust. This is the
 
 [INFO] Testing: call_center
 ...
-[SUCCESS] ✓ call_center: Outputs match exactly (6 rows)
+[SUCCESS] ✓ call_center: MD5 match (6 rows, cc9aabc63eb8603bd7330b6735ed0961)
 ...
 
 [INFO] =========================================
@@ -324,15 +324,6 @@ These scripts are designed to be CI-friendly:
 
 Exit codes make it easy to fail CI on mismatches.
 
----
-
-## TODOs
-
-- [ ] Support multiple scale factors (scale-10, scale-100)
-- [ ] MD5 hash validation (faster than full diff for large tables)
-
----
-
 ## Notes
 
 - **Fixtures are gitignored** - They're generated artifacts, not source code
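
Note on the README change above: the success message now reports an MD5 digest instead of "Outputs match exactly", and the removed TODO describes hash validation as faster than a full diff for large tables. The comparison scripts themselves are not part of this excerpt, so the following is only a Python sketch of that kind of check. `md5_of_file` and `outputs_match` are hypothetical names, and the file names in the demo are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large tables never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(java_fixture, rust_output):
    """Return (matched, java_digest) for two generated table files."""
    java_md5 = md5_of_file(java_fixture)
    return java_md5 == md5_of_file(rust_output), java_md5


if __name__ == "__main__":
    # Demo with two identical throwaway files standing in for real table output.
    with tempfile.TemporaryDirectory() as tmp:
        java = Path(tmp) / "java_call_center.dat"
        rust = Path(tmp) / "rust_call_center.dat"
        java.write_bytes(b"1|AAAAAAAABAAAAAAA|\n")
        rust.write_bytes(b"1|AAAAAAAABAAAAAAA|\n")
        matched, digest = outputs_match(java, rust)
        status = "MD5 match" if matched else "MD5 mismatch"
        print(f"call_center: {status} ({digest})")
```

A digest comparison only says whether the files differ, not where. On a mismatch a line-level diff is still needed to locate the first diverging row.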

tpcdsgen/scripts/benchmark.sh

Lines changed: 2 additions & 5 deletions
@@ -1,12 +1,12 @@
-#!/bin/bash
+#!/usr/bin/env bash
 #
 # TPC-DS Benchmark Script
 # Measures generation time for all tables at scale factors 1, 10, and 100
 #
 # Usage: ./scripts/benchmark.sh [--no-output] [--json] [--scales "1 10 100"]
 #
 
-set -e
+set -euo pipefail
 
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 PROJECT_DIR="$(dirname "$SCRIPT_DIR")"
@@ -84,9 +84,6 @@ else
     CLEANUP_DIR=false
 fi
 
-# Results file for comparison
-RESULTS_FILE="$PROJECT_DIR/benchmark_results_$(date +%Y%m%d_%H%M%S).txt"
-
 echo "=========================================="
 echo "TPC-DS Rust Generator Benchmark"
 echo "=========================================="

tpcdsgen/scripts/bootstrap-java.sh

Lines changed: 2 additions & 2 deletions
@@ -109,7 +109,7 @@ find_java_jar() {
     local jar_pattern="$JAVA_DIR/target/tpcds-*-jar-with-dependencies.jar"
     local jar_file
 
-    jar_file=$(ls $jar_pattern 2>/dev/null | head -1)
+    jar_file=$(find "$JAVA_DIR/target" -name "tpcds-*-jar-with-dependencies.jar" 2>/dev/null | head -1)
 
     if [[ -z "$jar_file" ]]; then
         return 1
@@ -132,7 +132,7 @@ clone_java_repo() {
     if [[ -d "$JAVA_DIR/.git" ]]; then
         log_info "Existing git repository found, pulling latest changes..."
         cd "$JAVA_DIR"
-        git pull origin master || log_warn "Failed to pull latest changes"
+        git pull || log_warn "Failed to pull latest changes"
         cd - >/dev/null
         return 0
     else
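
Note on the `find_java_jar` change above: replacing `ls $jar_pattern` with `find` avoids relying on unquoted glob expansion. The two are not identical, though: `find` also descends into subdirectories of `target/`, and it returns entries in directory order rather than sorted. For comparison, here is a Python sketch of the same lookup. It matches only directly inside the directory and sorts the results, which is an assumption about the intended behaviour and not something the commit states. The function name mirrors the shell function, and the jar file name in the demo is made up.

```python
import tempfile
from pathlib import Path

JAR_GLOB = "tpcds-*-jar-with-dependencies.jar"


def find_java_jar(target_dir):
    """Return the first matching jar path as a string, or None if there is none."""
    # Path.glob without '**' stays inside target_dir; sorted() makes the pick stable.
    matches = sorted(Path(target_dir).glob(JAR_GLOB))
    return str(matches[0]) if matches else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(find_java_jar(tmp))  # no jar built yet, so this prints None
        (Path(tmp) / "tpcds-0.0.0-jar-with-dependencies.jar").touch()
        print(find_java_jar(tmp))
```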
