Bump Arrow and Parquet to 59.0.0 #278
YAAAAS! I am testing this out locally
My numbers are about the same as yours (basically no difference):
andrewlamb@Andrews-MacBook-Pro-3:~/Downloads$ hyperfine --runs 5 --prepare "rm -rf out" "./tpchgen-cli-58.3 parquet --scale-factor=10 --tables=lineitem --parts=10 --output-dir out"
andrewlamb@Andrews-MacBook-Pro-3:~/Downloads$ hyperfine --runs 5 --prepare "rm -rf out" "./tpchgen-cli-60 parquet --scale-factor=10 --tables=lineitem --parts=10 --output-dir out"
Thanks for the review @alamb!
Thanks @kevinjqliu
Summary
This bumps the Arrow and Parquet dependencies from 58 to 59.
The only test updates are the Parquet row-group byte-size snapshots. Those numbers come from Parquet's physical metadata, and Parquet 59 changed the writer's page batching for variable-width columns in apache/arrow-rs#9972. That shifts a few encoded byte totals for string/comment columns, but the generated data still round-trips correctly.
Validation
cargo check --workspace --all-targets
cargo fmt --all -- --check
cargo test -p tpchgen-cli --test cli_integration
cargo test --workspace
cargo clippy -p tpchgen-arrow -p tpchgen-cli --all-targets -- -D warnings
git diff --check
Benchmark
I also reran the lineitem Parquet benchmark shape from the previous Arrow bump with hyperfine, using the current parquet subcommand and fresh output dirs. I ran it twice with the command order reversed, so the result is less sensitive to ordering/warmup noise. To reproduce locally, build the two release binaries and point these variables at them:
The baseline binary resolved Arrow/Parquet to 58.3.0, and the upgraded binary resolved them to 59.0.0.
Combined across both orderings, 58.3.0 averaged 32.597s and 59.0.0 averaged 32.955s.
Output size was effectively unchanged too: 27,146,187,982 bytes for 58.3.0 vs. 27,146,169,702 bytes for 59.0.0.
So I would read this as no material performance change on my loaded machine.
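As a quick sanity check on the benchmark numbers, here is a minimal Rust sketch that computes the absolute and relative differences between the two versions. The `percent_change` helper is hypothetical, written only for this illustration; the inputs are the combined averages and output sizes reported in this PR, nothing else is measured.

```rust
/// Hypothetical helper: relative change from `baseline` to `candidate`, in percent.
/// Positive means the candidate is larger (for wall time, that means slower).
fn percent_change(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // Combined mean wall time (seconds) across both hyperfine orderings.
    let (old_secs, new_secs) = (32.597_f64, 32.955_f64);
    // Total lineitem Parquet output size (bytes) for 58.3.0 and 59.0.0.
    let (old_bytes, new_bytes) = (27_146_187_982_i64, 27_146_169_702_i64);

    println!(
        "time: {:+.3}s ({:+.2}%)",
        new_secs - old_secs,
        percent_change(old_secs, new_secs)
    );
    println!(
        "size: {:+} bytes ({:+.5}%)",
        new_bytes - old_bytes,
        percent_change(old_bytes as f64, new_bytes as f64)
    );
}
```

With these inputs it reports a wall-time difference of about +0.358s (roughly +1.1%) and an output size difference of -18,280 bytes (well under 0.001%), which is in line with reading the result as noise on a loaded machine.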