This directory contains scripts for testing the Rust TPC-DS implementation against two reference implementations:
- Java / Trino (default, `--compat trino`): the Java port of `dsdgen` used by Trino. The Rust port was originally derived from this and is expected to be byte-for-byte identical.
- C `dsdgen` (`--compat c`): the original TPC-supplied reference implementation. The `--compat c` mode corrects bugs in the Java port to match the C reference (see BUGS.md and the parent README).

Both conformance suites validate byte-for-byte identical output via
MD5/diff comparison.
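
As a rough illustration of the MD5 half of that comparison, the sketch below checks one generated file against a checksum list. It is not taken from the scripts in this directory: `md5_matches` is a hypothetical helper, and it assumes that `MD5SUMS` uses the standard `md5sum` layout (`<hash>  <file name>`) and that GNU `md5sum` is on the `PATH`.

```shell
# md5_matches FILE MD5SUMS   (hypothetical helper, not part of these scripts)
# Exit status: 0 = hash matches, 1 = hash differs, 2 = no entry for FILE.
# Assumes MD5SUMS lines look like "<hash>  <name>" as written by md5sum.
md5_matches() {
  file="$1"
  sums="$2"
  name=$(basename "$file")
  # Look up the expected hash by file name; a leading "*" (binary-mode
  # marker in md5sum output) is stripped before comparing names.
  expected=$(awk -v n="$name" \
    '{ f = $2; sub(/^\*/, "", f) } f == n { print $1; exit }' "$sums")
  [ -n "$expected" ] || return 2
  # md5sum prints "<hash>  <name>"; keep only the hash.
  actual=$(md5sum "$file" | awk '{ print $1 }')
  [ "$actual" = "$expected" ]
}
```

For example, `md5_matches /tmp/out/reason.dat tests/fixtures/scale-1-trino/MD5SUMS && echo match`, where `/tmp/out` stands in for wherever the Rust output was written.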
tpcdsgen/
├── tests/
│   └── fixtures/
│       ├── scale-1-trino/           # Java reference (`--compat trino`)
│       │   ├── MD5SUMS              # checked into git, used for output comparisons
│       │   ├── call_center.dat      # *.dat files are gitignored; generated by
│       │   ├── warehouse.dat        # generate-fixtures.sh (only needed for --full)
│       │   └── ... (all 25 tables)
│       └── scale-1-c/               # C dsdgen reference (`--compat c`)
│           ├── MD5SUMS
│           ├── call_center.dat
│           ├── warehouse.dat
│           └── ... (all 25 tables)
└── scripts/
    ├── bootstrap-trino.sh           # Clone + build the Java TPC-DS impl
    ├── generate-fixtures.sh         # Generate/download reference fixtures
    │                                # (Java via --compat trino; C via --compat c)
    ├── compare-table.sh             # Compare one table
    ├── test-all-tables.sh           # Compare all ported tables
    ├── clean-fixtures.sh            # Clean fixtures
    └── README.md                    # This file

# Default (MD5-only):
./scripts/test-all-tables.sh

If that fails, re-run byte-for-byte against the `.dat` fixtures. This needs a one-time Java bootstrap and fixture generation:

./scripts/bootstrap-trino.sh # first time only
./scripts/generate-fixtures.sh # produces .dat files
./scripts/test-all-tables.sh --full

The C reference data is pre-generated and published in alamb/tpcds-data, one branch per scale factor (`sf1`, `sf2`, ...). For the default MD5-only path the checked-in `MD5SUMS` is enough; only `--full` needs the actual data, which `generate-fixtures.sh --compat c` clones with `--depth 1` and extracts into `tests/fixtures/scale-N-c/`.

# Default (MD5-only): no download needed.
./scripts/test-all-tables.sh --compat c
# Byte-for-byte (--full): download the C reference data first.
./scripts/generate-fixtures.sh --compat c # sf1
./scripts/generate-fixtures.sh --compat c --scale 2 # sf2
./scripts/test-all-tables.sh --compat c --full
# Or compare a single table.
./scripts/compare-table.sh reason --compat c # MD5-only
./scripts/compare-table.sh reason --compat c --full # byte-for-byte

Each script is self-documenting: open it and read the header comment for full usage, flags, environment variables, output, and exit codes. The table below is just a roadmap.

| Script | Purpose |
|---|---|
| `bootstrap-trino.sh` | Clone and build the Java / Trino reference implementation into `../tpcds/`. Run once before Java conformance. |
| `generate-fixtures.sh` | Populate `tests/fixtures/scale-N-{trino,c}/` with reference data. `--compat trino` (default) runs the Java impl; `--compat c` downloads pre-generated C `dsdgen` data from alamb/tpcds-data. |
| `compare-table.sh` | Compare one table's Rust output against the selected reference. Default: MD5-only against `MD5SUMS`. `--full`: byte-for-byte against the `.dat` fixture (MD5 + diff). |
| `test-all-tables.sh` | Run the full conformance suite for one compat mode (the main CI entry point). Default: MD5-only. `--full`: byte-for-byte. Honors per-mode skip lists at the top of the script. |
| `clean-fixtures.sh` | Remove all generated fixtures under `tests/fixtures/`. |

Run any script with --help to print its usage block.
./scripts/compare-table.sh <table> # one table, vs. Trino
./scripts/test-all-tables.sh # all tables, vs. Trino
./scripts/test-all-tables.sh --compat c # all tables, vs. C dsdgen

No reference data download needed: the comparison reads `MD5SUMS` straight from the repo.

Use when an MD5 mismatch needs a row-level diff.
# Java reference: generate fixtures, then compare.
./scripts/generate-fixtures.sh # one-time
./scripts/test-all-tables.sh --full
# C dsdgen reference: download fixtures, then compare.
./scripts/generate-fixtures.sh --compat c # one-time
./scripts/test-all-tables.sh --compat c --full

./scripts/clean-fixtures.sh --yes # remove all fixtures

- MD5-only (default): just a Cargo-built `tpcdsgen` binary at `target/debug/tpcdsgen` or `target/release/tpcdsgen`. No Java, no C reference data, no fixture download.
- `--full`, Java: Maven-built TPC-DS JAR at `../tpcds/target/tpcds-*-jar-with-dependencies.jar` (`bootstrap-trino.sh` handles this).
- `--full`, C dsdgen reference: `git`, `tar`, `bzip2` for `generate-fixtures.sh --compat c`. No C compiler required; the data is pre-generated.
- Disk space (`--full`): ~1 GB for SF1 Java fixtures; ~2.4 GB for SF1 C fixtures.

Problem: Java JAR not found
cd ../tpcds
mvn clean package

Problem: Rust binary not found
cargo build --release

Problem: Fixture not found (Java path)
./scripts/generate-fixtures.sh X

Problem: Fixture not found (C path)
./scripts/generate-fixtures.sh --compat c --scale N

Problem: Tables don't match
- Check that the right compat mode is selected (`--compat trino` vs `--compat c`).
- Re-run with `--full` to get a row-level diff (downloads the reference fixtures).
- Verify both sides use the same seed (the Rust generator is deterministic).
- Use the `diff` output to find the first difference and debug the specific row/column.

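
When the full diff is too long to read, it can help to locate just the first differing row. The following is a minimal sketch of one way to do that, not something the scripts here provide: `first_diff_line` is a hypothetical helper that walks both files in lockstep with `awk`, so it does not need to hold a multi-gigabyte fixture in memory.

```shell
# first_diff_line EXPECTED ACTUAL   (hypothetical helper)
# Prints the 1-based number of the first line that differs between the
# two files, or prints nothing when they have identical lines.
first_diff_line() {
  awk -v other="$2" '
    {
      # Read the matching line from ACTUAL; running out of lines there
      # counts as a difference at the current line.
      if ((getline theirs < other) <= 0) { print FNR; found = 1; exit }
      if ($0 != theirs)                  { print FNR; found = 1; exit }
    }
    END {
      # EXPECTED ended without a mismatch: any extra line in ACTUAL is
      # the first difference.
      if (!found && (getline theirs < other) > 0) print NR + 1
    }
  ' "$1"
}
```

With the line number in hand, `sed -n '<N>p' <file>` on each side prints the two rows for a column-by-column look.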
These scripts are designed to be CI-friendly. The default (MD5-only) path skips the slow reference-data step entirely:
# Java conformance (MD5-only)
- run: ./scripts/test-all-tables.sh --quiet
# C dsdgen conformance (MD5-only)
- run: ./scripts/test-all-tables.sh --compat c --quiet

If a job needs a row-level diff on failure, add `--full` (and the matching fixture step):

# Java conformance (--full)
- run: ./scripts/bootstrap-trino.sh
- run: ./scripts/generate-fixtures.sh --quiet
- run: ./scripts/test-all-tables.sh --full --quiet
# C dsdgen conformance (--full)
- run: ./scripts/generate-fixtures.sh --compat c
- run: ./scripts/test-all-tables.sh --compat c --full --quiet

Exit codes make it easy to fail CI on mismatches.

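
Outside a CI system that already stops on the first failing step, the same exit codes can be aggregated by hand. The sketch below is an assumption-laden illustration: `run_all` is a hypothetical wrapper, and it relies only on each command exiting non-zero on a mismatch (the exact exit code values are documented in each script's header, not here).

```shell
# run_all CMD...   (hypothetical wrapper, not part of these scripts)
# Runs every command string, keeps going after a failure so that all
# suites report, and returns non-zero if any of them failed.
run_all() {
  failed=0
  for cmd in "$@"; do
    if ! sh -c "$cmd"; then
      echo "FAILED: $cmd" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `run_all "./scripts/test-all-tables.sh --quiet" "./scripts/test-all-tables.sh --compat c --quiet"` runs both MD5-only suites and fails if either one reports a mismatch.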
- Fixtures are gitignored - They're generated artifacts, not source code
- Deterministic output - Same seed always produces same data
- Byte-for-byte equality - Not just row count, complete binary match
- Bug compatibility - Maintains same quirks as Java/C versions (e.g., leap year bug)