This crate provides the core data generator logic for TPC-DS.
```shell
# Build the generator
cargo build --release

# Generate all tables at scale factor 1 (default)
./target/release/tpcdsgen

# Generate all tables at scale factor 10
./target/release/tpcdsgen --scale 10

# Generate a specific table
./target/release/tpcdsgen --table store_sales --scale 10

# Generate into a specific directory
./target/release/tpcdsgen --scale 10 --directory /path/to/output
```

Fixtures are pre-generated TPC-DS data files used for conformance testing.
```
tests/fixtures/
├── scale-1-trino/    # Java reference fixtures (--compat trino)
├── scale-1-c/        # C dsdgen reference fixtures (--compat c)
└── scale-10-trino/   # higher scale factors as needed
```
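Directory names follow a `scale-<N>-<compat>` pattern. As a small illustration of that convention, the hypothetical `fixture_dir` helper below (not part of the repository's scripts) resolves the path for a scale factor and compat mode, defaulting to `trino` like the CLI:

```shell
# Hypothetical helper (not shipped in scripts/): print the fixture directory
# for a scale factor and compat mode, following the scale-<N>-<compat> layout.
fixture_dir() {
  local scale="$1"
  local compat="${2:-trino}"   # trino is the default compat mode
  if [[ -z "$scale" ]]; then
    echo "usage: fixture_dir SCALE [trino|c]" >&2
    return 1
  fi
  case "$compat" in
    trino|c) echo "tests/fixtures/scale-${scale}-${compat}" ;;
    *) echo "unknown compat mode: $compat" >&2; return 1 ;;
  esac
}
```

For example, `ls "$(fixture_dir 1 c)"` lists the C dsdgen fixtures at scale factor 1.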
tpcdsgen ships with two conformance suites. Both are shell scripts that compare `.dat` output byte-for-byte using MD5 checksums. See `scripts/README.md` for full details.
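Conceptually, both suites reduce to one check: a generated `.dat` file conforms when its MD5 digest equals that of the reference file. The sketch below illustrates the idea with two hypothetical functions, `file_md5` and `compare_dat_dirs`; they are not the repository's scripts, whose actual logic may differ. It assumes either `md5sum` or macOS `md5` is on the `PATH`, and it skips `dbgen_version.dat` because that file contains timestamps.

```shell
# Print the MD5 digest of a file using md5sum (Linux) or md5 (macOS).
file_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | awk '{print $1}'
  else
    md5 -q "$1"
  fi
}

# Hypothetical sketch: compare each .dat file in a reference directory with
# the same-named file in an output directory. Returns non-zero if any file
# is missing or differs.
compare_dat_dirs() {
  local ref_dir="$1" out_dir="$2" status=0 ref name
  for ref in "$ref_dir"/*.dat; do
    [[ -e "$ref" ]] || continue          # glob matched nothing
    name="$(basename "$ref")"
    if [[ "$name" == "dbgen_version.dat" ]]; then
      continue                           # contains timestamps; differs between runs
    fi
    if [[ ! -f "$out_dir/$name" ]]; then
      echo "$name: MISSING"; status=1
    elif [[ "$(file_md5 "$ref")" == "$(file_md5 "$out_dir/$name")" ]]; then
      echo "$name: OK"
    else
      echo "$name: DIFFERS"; status=1
    fi
  done
  return "$status"
}
```

For example, `compare_dat_dirs tests/fixtures/scale-1-trino /path/to/output` checks a directory produced with `--directory`. Files that exist only in the output directory are ignored, since the reference directory drives the loop.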
Against the Java / Trino reference (default, `--compat trino`):

```shell
# One-time: clone & build the Java TPC-DS implementation.
./scripts/bootstrap-trino.sh

# Generate Java reference fixtures into tests/fixtures/scale-N-trino/.
./scripts/generate-fixtures.sh

# Compare Rust output byte-for-byte against the Java fixtures.
./scripts/test-all-tables.sh --scale 1
```

Against the C dsdgen reference (`--compat c`):
```shell
# One-time: download pre-generated C dsdgen data from
# https://github.com/alamb/tpcds-data into tests/fixtures/scale-N-c/.
./scripts/generate-fixtures.sh --compat c --scale 1

# Compare Rust --compat c output byte-for-byte against the C fixtures.
./scripts/test-all-tables.sh --compat c --scale 1
```

Both suites also support comparing a single table:
```shell
./scripts/compare-table.sh reason              # vs. Java
./scripts/compare-table.sh reason --compat c   # vs. C dsdgen
```

Each fixture directory contains an `MD5SUMS` file for verification.
On Linux:

```shell
cd tests/fixtures/scale-1-trino
md5sum -c MD5SUMS
```

On macOS:
```shell
cd tests/fixtures/scale-1-trino
# Check each "<hash>  <file>" entry with md5 -q instead of md5sum -c
while read -r hash file; do
  if [[ "$(md5 -q "$file")" == "$hash" ]]; then echo "$file: OK"; else echo "$file: FAILED"; fi
done < MD5SUMS
```

The TPC-DS reference implementation contains several bugs that must be replicated for benchmark compliance. These bugs originated in the C implementation and were faithfully reproduced in the Java port. The Rust implementation replicates them as well, so that its output is byte-for-byte identical to the reference implementation.
See `BUGS.md` for the documented bugs; more will be added as they are identified.
These are the canonical MD5 hashes of TPC-DS data generated by the Java reference implementation. In the default `--compat trino` mode, the Rust implementation must produce byte-for-byte identical output.
Scale factor 1, generated with `java -jar tpcds-*.jar --scale 1`:
| Table | MD5 Hash |
|---|---|
| call_center.dat | cc9aabc63eb8603bd7330b6735ed0961 |
| catalog_page.dat | 0bbac1b8bdcf8ce2d5f0034980ee0196 |
| catalog_returns.dat | 8460b5abd6b6ceaf6107f217b016fb23 |
| catalog_sales.dat | 51a0bc401b4b64d94736634b54068240 |
| customer.dat | 3672ffdefac3cf00413ecef71a753636 |
| customer_address.dat | abac2e3925ab9bf66cec3b527a0468ed |
| customer_demographics.dat | 8831872c6d56ea9d4f24701f2feaef48 |
| date_dim.dat | f3e77714328dcc57302777e72fd7747c |
| dbgen_version.dat | a430da74c2e44926c53deb74e35b23f1 * |
| household_demographics.dat | dccf2ff17c5e420021fbf92bf9a0a5ec |
| income_band.dat | db8e8012be51ef81cf215774bec95533 |
| inventory.dat | cfefc8724693ec9149f1d5b345fcecc2 |
| item.dat | bebbcfd1acecdea16a5a3feb5e4deb96 |
| promotion.dat | acb42558d0dc5e0ab6df5a664c1629cf |
| reason.dat | 57fe9b8688095bd345cc846ec4400be0 |
| ship_mode.dat | 791d16af982a67ad170a6b6527e25a35 |
| store.dat | 80082d03e1b01340e19db3187d8edbd6 |
| store_returns.dat | 9009d804c02ee839e0b2ecd5fb4ae03f |
| store_sales.dat | f003b3810e042d6dd47f48506616d88d |
| time_dim.dat | a68339c5720d25380b53f6e0f2f72333 |
| warehouse.dat | f56789e8b724b989d74e213e0686052f |
| web_page.dat | 6feef91675c336d6f25e55ebbdf8c13c |
| web_returns.dat | e45390d32d1698fef71f05f474a4d748 |
| web_sales.dat | 15f9d835727f3a39a096c346f56e51f7 |
| web_site.dat | de5fb00a80673cb44b4b508da75d4bcf |
Scale factor 10, generated with `java -jar tpcds-*.jar --scale 10`:
| Table | MD5 Hash |
|---|---|
| call_center.dat | 235909679f4d125e769aa38eb16e9098 |
| catalog_page.dat | a5daa0d93ecde8bd9f6ed79cd3b63916 |
| catalog_returns.dat | 982a8b96fa0d9487015cd137136c8f68 |
| catalog_sales.dat | 97d5351b430d6c15e3906518315f0787 |
| customer.dat | 486a030a55d468ef15ff2ff01583e6dc |
| customer_address.dat | 860602fea368111009ef08b167e1e299 |
| customer_demographics.dat | 8831872c6d56ea9d4f24701f2feaef48 |
| date_dim.dat | f3e77714328dcc57302777e72fd7747c |
| dbgen_version.dat | 8553e926c33f4ad84e4d58fcfd20c48c * |
| household_demographics.dat | dccf2ff17c5e420021fbf92bf9a0a5ec |
| income_band.dat | db8e8012be51ef81cf215774bec95533 |
| inventory.dat | 4ad3640917c6567038f081bbe2cf0e3e |
| item.dat | bff29691c74ae66eb2dcc3af686fb2ba |
| promotion.dat | b8e8a7741f64edc5d09fdb0453c86705 |
| reason.dat | a1fdcd35ca0eddd0d5f37b0e5c2fddb3 |
| ship_mode.dat | 791d16af982a67ad170a6b6527e25a35 |
| store.dat | 430a01467a2d55d0e9a1bebad4f1c44b |
| store_returns.dat | 4ba001a6066db20066cd198242f92ca1 |
| store_sales.dat | ecff92350fa0466e9b9407a1b5ad4020 |
| time_dim.dat | a68339c5720d25380b53f6e0f2f72333 |
| warehouse.dat | e0c56fe622774d09c9dec42029881ad5 |
| web_page.dat | e55695fdb2b86f96cf46e2a55b6f3748 |
| web_returns.dat | ac0197593d3f4cc3bb46c8ad7e6cd735 |
| web_sales.dat | 4da375300bcb0ce8785e1f100fb72efe |
| web_site.dat | 4669d52e36cd112af10e137e5d8d7697 |
\* `dbgen_version.dat` contains timestamps, so its hash will differ between runs.
To verify that the Rust implementation matches:

```shell
# Verify at scale factor 1
./scripts/test-all-tables.sh --scale 1

# Verify at scale factor 10
./scripts/test-all-tables.sh --scale 10
```
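To check an output directory against the hash tables above without the fixture scripts, the table rows can be converted into the `<hash>  <file>` format that `md5sum -c` reads. A minimal sketch, assuming the rows for one scale factor have been copied into a local file (the name `hashes-sf1.md` is hypothetical), using a hypothetical `table_to_md5sums` function that drops `dbgen_version.dat` because its hash varies between runs:

```shell
# Hypothetical helper: turn "| file.dat | <md5> |" table rows read from stdin
# into "<md5>  file.dat" lines in md5sum -c format. Header and separator rows
# are skipped, as is dbgen_version.dat.
table_to_md5sums() {
  awk -F'|' '
    {
      file = $2; hash = $3
      gsub(/[ \t*]/, "", file)   # trim whitespace
      gsub(/[ \t*]/, "", hash)   # trim whitespace and the * footnote marker
      if (file ~ /\.dat$/ && file != "dbgen_version.dat" &&
          hash ~ /^[0-9a-f]+$/ && length(hash) == 32)
        printf "%s  %s\n", hash, file
    }'
}
```

For example, run `table_to_md5sums < hashes-sf1.md > /tmp/expected.md5`, then `md5sum -c /tmp/expected.md5` from inside the output directory.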