Merged
67 changes: 63 additions & 4 deletions .github/workflows/tpcdsgen-conformance.yml
@@ -3,17 +3,23 @@ name: TPC-DS Conformance
on:
push:
branches: [ main, master ]
paths:
- 'tpcdsgen/**'
- '.github/**'
pull_request:
branches: [ main, master ]
paths:
- 'tpcdsgen/**'
- '.github/**'

env:
CARGO_TERM_COLOR: always
RUST_BACKTRACE: 1

jobs:
# Conformance testing against Java implementation
# Conformance testing against the Java / Trino reference implementation.
conformance-tests:
name: Conformance Tests
name: Conformance Tests (Java)
runs-on: ubuntu-latest

steps:
@@ -45,7 +51,7 @@ jobs:
- name: Bootstrap Java TPC-DS implementation
run: |
cd tpcdsgen
./scripts/bootstrap-java.sh
./scripts/bootstrap-trino.sh

- name: Build Rust table generators
run: |
@@ -65,7 +71,60 @@ jobs:
if: failure() # Upload fixtures if tests fail for debugging
uses: actions/upload-artifact@v7
with:
name: test-fixtures
name: test-fixtures-trino
path: tpcdsgen/tests/fixtures/
retention-days: 7

# Conformance testing against the C dsdgen reference implementation.
#
# Reference data is pre-generated and lives in
# https://github.com/alamb/tpcds-data (branch sf1).
# `generate-fixtures.sh --compat c` clones it with --depth 1 and extracts
# into tpcdsgen/tests/fixtures/scale-1-c/. Rust is then run in
# --compat c mode and the .dat output is compared byte-for-byte (MD5/diff).
conformance-tests-c:
name: Conformance Tests (C dsdgen)
runs-on: ubuntu-latest

steps:
- name: Checkout repository
uses: actions/checkout@v6

- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@stable

- name: Cache Rust dependencies
uses: actions/cache@v5
with:
path: |
~/.cargo/bin/
~/.cargo/registry/index/
~/.cargo/registry/cache/
~/.cargo/git/db/
target/
key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
restore-keys: |
${{ runner.os }}-cargo-

- name: Download C dsdgen reference data
run: |
cd tpcdsgen
./scripts/generate-fixtures.sh --compat c --scale 1

- name: Build Rust table generators
run: |
cargo build --release -p tpcdsgen

- name: Run conformance tests (Rust --compat c vs C dsdgen)
run: |
cd tpcdsgen
./scripts/test-all-tables.sh --compat c

- name: Upload test fixtures as artifacts
if: failure()
uses: actions/upload-artifact@v7
with:
name: test-fixtures-c
path: tpcdsgen/tests/fixtures/
retention-days: 7

3 changes: 2 additions & 1 deletion .gitignore
@@ -3,4 +3,5 @@ target/
__old/
Cargo.lock
.idea
.venv/
.venv/
tpcds/
Empty file added .gitmodules
Empty file.
70 changes: 28 additions & 42 deletions tpcdsgen/README.md
@@ -29,75 +29,61 @@ Fixtures are pre-generated TPC-DS data files used for conformance testing.

```
tests/fixtures/
├── java/ # Java reference implementation output
│ ├── scale-1/ # 25 tables, ~1.2GB
│ └── scale-10/ # 25 tables, ~11GB
└── rust/ # Rust implementation output
├── scale-1/ # 25 tables, ~1.2GB
└── scale-10/ # 25 tables, ~11GB
├── scale-1-trino/ # Java reference fixtures (`--compat trino`)
├── scale-1-c/ # C dsdgen reference fixtures (`--compat c`)
└── scale-10-trino/ # higher scale factors as needed
```

### Generating Java Fixtures

Requires the Java TPC-DS implementation to be built:
### Conformance Testing

```bash
# Build Java implementation (if not already built)
cd ../tpcds && mvn clean package -DskipTests && cd -

# Generate Java fixtures for scale 1
java -jar ../tpcds/target/tpcds-1.5-SNAPSHOT-jar-with-dependencies.jar \
--scale 1 \
--directory tests/fixtures/java/scale-1 \
--overwrite

# Generate Java fixtures for scale 10
java -jar ../tpcds/target/tpcds-1.5-SNAPSHOT-jar-with-dependencies.jar \
--scale 10 \
--directory tests/fixtures/java/scale-10 \
--overwrite
```
`tpcdsgen` ships with two conformance suites, both implemented as shell
scripts that do byte-for-byte (MD5) comparison of `.dat` output. See
[scripts/README.md](scripts/README.md) for full details.

### Generating Rust Fixtures
**vs. Java / Trino reference (default, `--compat trino`):**

```bash
# Build Rust implementation
cargo build --release
# One-time: clone & build the Java TPC-DS implementation.
./scripts/bootstrap-trino.sh

# Generate Rust fixtures for scale 1
./target/release/tpcdsgen --scale 1 --directory tests/fixtures/rust/scale-1
# Generate Java reference fixtures into tests/fixtures/scale-N-trino/.
./scripts/generate-fixtures.sh

# Generate Rust fixtures for scale 10
./target/release/tpcdsgen --scale 10 --directory tests/fixtures/rust/scale-10
# Compare Rust output byte-for-byte against the Java fixtures.
./scripts/test-all-tables.sh --scale 1
```

### Conformance Testing

To verify Rust output matches Java byte-for-byte:
**vs. C dsdgen reference (`--compat c`):**

```bash
# Run conformance tests at scale 1
./scripts/test-all-tables.sh --scale 1
# One-time: download pre-generated C dsdgen data from
# https://github.com/alamb/tpcds-data into tests/fixtures/scale-N-c/.
./scripts/generate-fixtures.sh --compat c --scale 1

# Run conformance tests at scale 10
./scripts/test-all-tables.sh --scale 10
# Compare Rust --compat c output byte-for-byte against the C fixtures.
./scripts/test-all-tables.sh --compat c --scale 1
```

See [HASHES.md](HASHES.md) for the canonical MD5 hashes.
Both suites also support comparing a single table:

```bash
./scripts/compare-table.sh reason # vs. Java
./scripts/compare-table.sh reason --compat c # vs. C dsdgen
```
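
As a rough illustration of what a byte-for-byte comparison of `.dat` output involves, the sketch below checks every reference file against the generated file of the same name. It is not taken from `test-all-tables.sh` or `compare-table.sh`: the function name `compare_dat_dirs` and both directory arguments are hypothetical, and it uses `cmp` rather than MD5 hashing. Two files that `cmp` reports as identical necessarily have the same MD5 hash.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository's scripts.
# Compares every .dat file in a reference directory against the file of the
# same name in an output directory, byte for byte.
# Usage: compare_dat_dirs REFERENCE_DIR OUTPUT_DIR
compare_dat_dirs() {
  local ref_dir="$1" out_dir="$2" failed=0 ref name
  for ref in "$ref_dir"/*.dat; do
    [ -e "$ref" ] || continue          # the glob matched nothing
    name="$(basename "$ref")"
    if [ ! -f "$out_dir/$name" ]; then
      echo "$name: MISSING"
      failed=1
    elif cmp -s "$ref" "$out_dir/$name"; then
      echo "$name: OK"
    else
      echo "$name: MISMATCH"
      failed=1
    fi
  done
  return "$failed"                     # non-zero if any table is missing or differs
}
```

For example, `compare_dat_dirs tests/fixtures/scale-1-c /path/to/rust/output` (the second path is a placeholder) would print one status line per table and return non-zero if any table is missing or differs.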

### Verifying Fixtures with MD5SUMS

Each fixture directory contains an `MD5SUMS` file for verification.

**On Linux:**
```bash
cd tests/fixtures/java/scale-1
cd tests/fixtures/scale-1-trino
md5sum -c MD5SUMS
```

**On macOS:**
```bash
cd tests/fixtures/java/scale-1
cd tests/fixtures/scale-1-trino
while read hash file; do
[[ $(md5 -q "$file") == "$hash" ]] && echo "$file: OK" || echo "$file: FAILED"
done < MD5SUMS
82 changes: 82 additions & 0 deletions tpcdsgen/data/return_reasons_c.dst
@@ -0,0 +1,82 @@
------
-- return_reasons
------
-- values weights
-- -----------------------
-- 1. reason 1-6. not sure... none are ever used
------
Package was damaged: 1, 0, 0, 0, 0, 0
Stopped working: 1, 0, 0, 0, 0, 0
Did not get it on time: 1, 0, 0, 0, 0, 0
Not the product that was ordred: 1, 0, 0, 0, 0, 0
Parts missing: 1, 0, 0, 0, 0, 0
Does not work with a product that I have: 1, 0, 0, 0, 0, 0
Gift exchange: 1, 0, 0, 0, 0, 0
Did not like the color: 1, 0, 0, 0, 0, 0
Did not like the model: 1, 0, 0, 0, 0, 0
Did not like the make: 1, 0, 0, 0, 0, 0
Did not like the warranty: 1, 0, 0, 0, 0, 0
No service location in my area: 1, 0, 0, 0, 0, 0
Found a better price in a store: 1, 0, 0, 0, 0, 0
Found a better extended warranty in a store: 1, 0, 0, 0, 0, 0
Not working any more: 1, 0, 0, 0, 0, 0
Did not fit: 1, 0, 0, 0, 0, 0
Wrong size: 1, 0, 0, 0, 0, 0
Lost my job: 1, 0, 0, 0, 0, 0
unauthoized purchase: 1, 0, 0, 0, 0, 0
duplicate purchase: 1, 0, 0, 0, 0, 0
its is a boy: 1, 0, 0, 0, 0, 0
it is a girl: 1, 0, 0, 0, 0, 0
reason 23: 1, 0, 0, 0, 0, 0
reason 24: 1, 0, 0, 0, 0, 0
reason 25: 1, 0, 0, 0, 0, 0
reason 26: 1, 0, 0, 0, 0, 0
reason 27: 1, 0, 0, 0, 0, 0
reason 28: 1, 0, 0, 0, 0, 0
reason 29: 1, 0, 0, 0, 0, 0
reason 30: 1, 0, 0, 0, 0, 0
reason 31: 1, 0, 0, 0, 0, 0
reason 32: 1, 0, 0, 0, 0, 0
reason 33: 1, 0, 0, 0, 0, 0
reason 34: 1, 0, 0, 0, 0, 0
reason 35: 1, 0, 0, 0, 0, 0
reason 36: 1, 1, 0, 0, 0, 0
reason 37: 1, 1, 0, 0, 0, 0
reason 38: 1, 1, 0, 0, 0, 0
reason 39: 1, 1, 0, 0, 0, 0
reason 40: 1, 1, 0, 0, 0, 0
reason 41: 1, 1, 0, 0, 0, 0
reason 42: 1, 1, 0, 0, 0, 0
reason 43: 1, 1, 0, 0, 0, 0
reason 44: 1, 1, 0, 0, 0, 0
reason 45: 1, 1, 0, 0, 0, 0
reason 46: 1, 1, 1, 0, 0, 0
reason 47: 1, 1, 1, 0, 0, 0
reason 48: 1, 1, 1, 0, 0, 0
reason 49: 1, 1, 1, 0, 0, 0
reason 50: 1, 1, 1, 0, 0, 0
reason 51: 1, 1, 1, 0, 0, 0
reason 52: 1, 1, 1, 0, 0, 0
reason 53: 1, 1, 1, 0, 0, 0
reason 54: 1, 1, 1, 0, 0, 0
reason 55: 1, 1, 1, 0, 0, 0
reason 56: 1, 1, 1, 1, 0, 0
reason 57: 1, 1, 1, 1, 0, 0
reason 58: 1, 1, 1, 1, 0, 0
reason 59: 1, 1, 1, 1, 0, 0
reason 60: 1, 1, 1, 1, 0, 0
reason 61: 1, 1, 1, 1, 0, 0
reason 62: 1, 1, 1, 1, 0, 0
reason 63: 1, 1, 1, 1, 0, 0
reason 64: 1, 1, 1, 1, 0, 0
reason 65: 1, 1, 1, 1, 0, 0
reason 66: 1, 1, 1, 1, 1, 0
reason 67: 1, 1, 1, 1, 1, 0
reason 68: 1, 1, 1, 1, 1, 0
reason 69: 1, 1, 1, 1, 1, 0
reason 70: 1, 1, 1, 1, 1, 0
reason 71: 1, 1, 1, 1, 1, 1
reason 72: 1, 1, 1, 1, 1, 1
reason 73: 1, 1, 1, 1, 1, 1
reason 74: 1, 1, 1, 1, 1, 1
reason 75: 1, 1, 1, 1, 1, 1
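
Each data line above appears to follow a `value: w1, w2, ...` layout, with lines starting with `--` acting as comments. Assuming that layout, and assuming the last colon on a line separates the value from its weights, the hypothetical helper below lists the values whose weight is non-zero in a given weight set. It is only a sketch for inspecting the file, not the parser that `tpcdsgen` uses.

```bash
#!/usr/bin/env bash
# Hypothetical helper for inspecting a .dst file; not part of tpcdsgen.
# Usage: dst_values_in_weight_set FILE SET_INDEX   (SET_INDEX is 1-based)
dst_values_in_weight_set() {
  local file="$1" set_index="$2"
  awk -v col="$set_index" '
    /^--/ || /^[ \t]*$/ { next }            # skip comment and blank lines
    {
      # Assumption: the last colon separates the value from its weights.
      if (!match($0, /:[^:]*$/)) next
      value = substr($0, 1, RSTART - 1)
      weights = substr($0, RSTART + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", value)    # trim surrounding whitespace
      n = split(weights, w, ",")
      if (col <= n && w[col] + 0 > 0) print value
    }
  ' "$file"
}
```

Run against the lines shown above with set index 6, this should print `reason 71` through `reason 75`, the only rows whose sixth weight is 1.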