perf(tpcdsgen): make conformance suite MD5-only by default, add --full
The conformance scripts always required the full reference `.dat`
fixtures, which meant every local run and every CI job had to either
build the Java implementation and generate ~1 GB of Trino data, or
download ~2.4 GB of pre-generated C `dsdgen` data, just to confirm
output whose expected hash is already known. For the common
"everything matches" case this is pure overhead.
The expected MD5 hashes already live in
`tests/fixtures/scale-N-{trino,c}/MD5SUMS`, which are checked in to the
repo. Switch the scripts to use those by default:
- `compare-table.sh` and `test-all-tables.sh` now run in MD5-only mode
  by default: generate the Rust table to a tempfile, MD5 it, and look
  the expected hash up in `MD5SUMS`. No `.dat` fixture needed.
- On mismatch the scripts print both hashes and suggest re-running with
  `--full` to see a row-level diff.
- `--full` preserves the old byte-for-byte behavior: require the
  fixture, compute MD5s, diff on mismatch. Use it to debug a failing
  MD5 check.
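
For illustration only, the MD5-only check described above might look
roughly like the sketch below. This is not the code from
`compare-table.sh`: the function names `expected_md5` and `check_md5`
are hypothetical, the manifest is assumed to use the `md5sum` output
format (`<hash>  <name>` per line), and `md5sum` from GNU coreutils is
assumed to be on the PATH.

```shell
# Hypothetical helper: look up the expected hash for a file name in an
# md5sum-style manifest. The real MD5SUMS layout may differ.
expected_md5() {
  local manifest="$1" name="$2"
  # Strip the optional "*" that md5sum emits in binary mode.
  awk -v n="$name" '{ f = $2; sub(/^\*/, "", f); if (f == n) { print $1; exit } }' "$manifest"
}

# Hypothetical helper: compare a generated file against its manifest entry.
# Returns 0 on match, 1 on mismatch, 2 if the manifest has no entry.
check_md5() {
  local manifest="$1" name="$2" generated="$3"
  local expected actual
  expected="$(expected_md5 "$manifest" "$name")"
  if [ -z "$expected" ]; then
    echo "no MD5SUMS entry for $name" >&2
    return 2
  fi
  actual="$(md5sum "$generated" | awk '{ print $1 }')"
  if [ "$expected" = "$actual" ]; then
    echo "OK $name"
    return 0
  fi
  # On mismatch, print both hashes and point at the row-level diff mode.
  echo "MISMATCH $name: expected $expected, got $actual"
  echo "re-run with --full to see a row-level diff"
  return 1
}
```

A caller would generate the table to a tempfile and then run something
like `check_md5 tests/fixtures/scale-N-c/MD5SUMS "$table.dat" "$tmpfile"`,
where `$table` and `$tmpfile` are placeholders; no `.dat` fixture is
read at any point.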
Update the script READMEs and `tpcdsgen/README.md` to document the new
default and the `--full` escape hatch.
CI is unchanged in this commit: it still calls `generate-fixtures.sh`
and then `test-all-tables.sh` without `--full`, which is harmless (the
fixture download is now wasted, but tests still pass). A follow-up can
drop the fixture step from CI to actually realize the speedup.
Refs #265.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>