65 changes: 62 additions & 3 deletions .github/workflows/tpcdsgen-conformance.yml
@@ -3,17 +3,23 @@ name: TPC-DS Conformance
 on:
   push:
     branches: [ main, master ]
+    paths:
+      - 'tpcdsgen/**'
+      - '.github/**'
   pull_request:
     branches: [ main, master ]
+    paths:
+      - 'tpcdsgen/**'
+      - '.github/**'

 env:
   CARGO_TERM_COLOR: always
   RUST_BACKTRACE: 1

 jobs:
-  # Conformance testing against Java implementation
+  # Conformance testing against the Java / Trino reference implementation.
   conformance-tests:
-    name: Conformance Tests
+    name: Conformance Tests (Java)
     runs-on: ubuntu-latest

     steps:
@@ -65,7 +71,60 @@ jobs:
         if: failure() # Upload fixtures if tests fail for debugging
         uses: actions/upload-artifact@v7
         with:
-          name: test-fixtures
+          name: test-fixtures-java
           path: tpcdsgen/tests/fixtures/
           retention-days: 7

+  # Conformance testing against the C dsdgen reference implementation.
+  #
+  # Reference data is pre-generated and lives in
+  # https://github.com/alamb/tpcds-data (branch sf1).
+  # `generate-fixtures.sh --compat c` clones it with --depth 1 and extracts
+  # into tpcdsgen/tests/fixtures/scale-1-c/. Rust is then run in
+  # --compat c mode and the .dat output is compared byte-for-byte (MD5/diff).
+  conformance-tests-c:
+    name: Conformance Tests (C dsdgen)
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+
+      - name: Install Rust toolchain
+        uses: dtolnay/rust-toolchain@stable
+
+      - name: Cache Rust dependencies
+        uses: actions/cache@v5
+        with:
+          path: |
+            ~/.cargo/bin/
+            ~/.cargo/registry/index/
+            ~/.cargo/registry/cache/
+            ~/.cargo/git/db/
+            target/
+          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
+          restore-keys: |
+            ${{ runner.os }}-cargo-
+
+      - name: Download C dsdgen reference data
+        run: |
+          cd tpcdsgen
+          ./scripts/generate-fixtures.sh --compat c --scale 1
+
+      - name: Build Rust table generators
+        run: |
+          cargo build --release -p tpcdsgen
+
+      - name: Run conformance tests (Rust --compat c vs C dsdgen)
+        run: |
+          cd tpcdsgen
+          ./scripts/test-all-tables.sh --compat c
+
+      - name: Upload test fixtures as artifacts
+        if: failure()
+        uses: actions/upload-artifact@v7
+        with:
+          name: test-fixtures-c
+          path: tpcdsgen/tests/fixtures/
+          retention-days: 7
+
3 changes: 0 additions & 3 deletions tpcdsgen/.gitignore
@@ -8,9 +8,6 @@
 # Test fixtures (generated).
 #/tests/fixtures/

-# Python cache.
-scripts/__pycache__/
-
 # Stuff I need to remember
 NEXT_STEPS.md
 ISSUES.md
1 change: 0 additions & 1 deletion tpcdsgen/.python-version

This file was deleted.

70 changes: 28 additions & 42 deletions tpcdsgen/README.md
@@ -29,75 +29,61 @@ Fixtures are pre-generated TPC-DS data files used for conformance testing.

 ```
 tests/fixtures/
-├── java/ # Java reference implementation output
-│   ├── scale-1/ # 25 tables, ~1.2GB
-│   └── scale-10/ # 25 tables, ~11GB
-└── rust/ # Rust implementation output
-    ├── scale-1/ # 25 tables, ~1.2GB
-    └── scale-10/ # 25 tables, ~11GB
+├── scale-1-java/ # Java reference fixtures (`--compat trino`)
+├── scale-1-c/ # C dsdgen reference fixtures (`--compat c`)
+└── scale-10-java/ # higher scale factors as needed
 ```

-### Generating Java Fixtures
-
-Requires the Java TPC-DS implementation to be built:
+### Conformance Testing

-```bash
-# Build Java implementation (if not already built)
-cd ../tpcds && mvn clean package -DskipTests && cd -
-
-# Generate Java fixtures for scale 1
-java -jar ../tpcds/target/tpcds-1.5-SNAPSHOT-jar-with-dependencies.jar \
-  --scale 1 \
-  --directory tests/fixtures/java/scale-1 \
-  --overwrite
-
-# Generate Java fixtures for scale 10
-java -jar ../tpcds/target/tpcds-1.5-SNAPSHOT-jar-with-dependencies.jar \
-  --scale 10 \
-  --directory tests/fixtures/java/scale-10 \
-  --overwrite
-```
+`tpcdsgen` ships with two conformance suites, both implemented as shell
+scripts that do byte-for-byte (MD5) comparison of `.dat` output. See
+[scripts/README.md](scripts/README.md) for full details.

-### Generating Rust Fixtures
+**vs. Java / Trino reference (default, `--compat trino`):**

 ```bash
-# Build Rust implementation
-cargo build --release
+# One-time: clone & build the Java TPC-DS implementation.
+./scripts/bootstrap-java.sh

-# Generate Rust fixtures for scale 1
-./target/release/tpcdsgen --scale 1 --directory tests/fixtures/rust/scale-1
+# Generate Java reference fixtures into tests/fixtures/scale-N-java/.
+./scripts/generate-fixtures.sh

-# Generate Rust fixtures for scale 10
-./target/release/tpcdsgen --scale 10 --directory tests/fixtures/rust/scale-10
+# Compare Rust output byte-for-byte against the Java fixtures.
+./scripts/test-all-tables.sh --scale 1
 ```

-### Conformance Testing
-
-To verify Rust output matches Java byte-for-byte:
+**vs. C dsdgen reference (`--compat c`):**

 ```bash
-# Run conformance tests at scale 1
-./scripts/test-all-tables.sh --scale 1
+# One-time: download pre-generated C dsdgen data from
+# https://github.com/alamb/tpcds-data into tests/fixtures/scale-N-c/.
+./scripts/generate-fixtures.sh --compat c --scale 1

-# Run conformance tests at scale 10
-./scripts/test-all-tables.sh --scale 10
+# Compare Rust --compat c output byte-for-byte against the C fixtures.
+./scripts/test-all-tables.sh --compat c --scale 1
 ```

-See [HASHES.md](HASHES.md) for the canonical MD5 hashes.
+Both suites also support comparing a single table:
+
+```bash
+./scripts/compare-table.sh reason # vs. Java
+./scripts/compare-table.sh reason --compat c # vs. C dsdgen
+```

 ### Verifying Fixtures with MD5SUMS

 Each fixture directory contains an `MD5SUMS` file for verification.

 **On Linux:**
 ```bash
-cd tests/fixtures/java/scale-1
+cd tests/fixtures/scale-1-java
 md5sum -c MD5SUMS
 ```

 **On macOS:**
 ```bash
-cd tests/fixtures/java/scale-1
+cd tests/fixtures/scale-1-java
 while read hash file; do
   [[ $(md5 -q "$file") == "$hash" ]] && echo "$file: OK" || echo "$file: FAILED"
 done < MD5SUMS
6 changes: 0 additions & 6 deletions tpcdsgen/main.py

This file was deleted.

10 changes: 0 additions & 10 deletions tpcdsgen/pyproject.toml

This file was deleted.
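
For reviewers who want a feel for what the "byte-for-byte (MD5)" comparison in the README changes amounts to, here is a minimal sketch. It is not the contents of `test-all-tables.sh` or `compare-table.sh`, which this diff does not show. The function name `compare_dirs` and its two directory arguments are hypothetical, and the sketch assumes GNU `md5sum` is available (on macOS, `md5 -q` would be the substitute, as in the README).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the PR: compare every .dat file in a
# reference directory against the same-named file in an output directory.
# Assumes GNU md5sum (Linux); names and arguments are illustrative only.
compare_dirs() {
  local expected_dir="$1" actual_dir="$2" status=0
  local expected_file name actual_file

  for expected_file in "$expected_dir"/*.dat; do
    # Skip the literal pattern when the directory has no .dat files.
    [ -e "$expected_file" ] || continue

    name=$(basename "$expected_file")
    actual_file="$actual_dir/$name"

    if [ ! -f "$actual_file" ]; then
      echo "$name: MISSING"
      status=1
      continue
    fi

    # Reading from stdin keeps the file name out of md5sum's output,
    # so the two strings differ only if the file contents differ.
    if [ "$(md5sum < "$expected_file")" = "$(md5sum < "$actual_file")" ]; then
      echo "$name: OK"
    else
      echo "$name: FAILED"
      status=1
    fi
  done

  return "$status"
}
```

Called as `compare_dirs tests/fixtures/scale-1-c /path/to/rust-output` (the second path is a placeholder), it prints one status line per table and returns non-zero on any mismatch or missing file, which is enough to fail a CI step.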
