Environment
Delta-rs version: git latest
Binding: Rust
Environment:
- Cloud provider: AWS
- OS: Linux
- Other: Writing Parquet 2.0 files with ZSTD
Bug
What happened:
When compacting a table, delta-rs appears to consistently undershoot the target file size: instead of one file at the target size, it writes one file of ~98MB and a second file of roughly 1MB.
What you expected to happen:
It should write a single ~100MB file rather than splitting the output into a 98MB file and a ~1.8MB file.
How to reproduce it:
Working on a repro right now.
More details:
My guess is that this occurs only with Parquet V2, since usage of V2 in the wild is rare and the naming suggests this code path is being hit: https://github.com/delta-io/delta-rs/blob/main/crates/core/src/operations/write/writer.rs#L478-L486. It's entirely possible that this is an upstream parquet-rs issue.
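
To make the suspected failure mode concrete, here is a toy model. It is not delta-rs or parquet code. It assumes the writer rolls over to a new file as soon as an estimated size reaches the target, and that the estimate runs slightly above the final encoded size. The function name `simulate_file_sizes` and the `overestimate` factor are hypothetical and exist only for illustration.

```rust
/// Toy model of a size-based file rollover (hypothetical, not delta-rs code).
///
/// `total` is the amount of final on-disk data to write, `target` is the
/// target file size, and `batch` is how much data arrives per write call
/// (all in the same unit, e.g. MB). `overestimate` is an assumed factor by
/// which the in-progress size estimate exceeds the real encoded size.
/// Returns the actual size of each file produced.
fn simulate_file_sizes(total: u64, target: u64, batch: u64, overestimate: f64) -> Vec<u64> {
    let mut files = Vec::new();
    let mut current: u64 = 0; // actual bytes written to the current file
    let mut remaining = total;

    while remaining > 0 {
        let chunk = batch.min(remaining);
        current += chunk;
        remaining -= chunk;

        // The rollover decision is made on the estimate, not the actual size.
        let estimated = current as f64 * overestimate;
        if estimated >= target as f64 {
            files.push(current);
            current = 0;
        }
    }

    // Whatever is left over becomes a small trailing file.
    if current > 0 {
        files.push(current);
    }
    files
}
```

In this model, `simulate_file_sizes(100, 100, 1, 1.03)` closes the first file at 98 and leaves a trailing file of 2, while an exact estimate (`overestimate = 1.0`) yields a single file of 100. A 3% overestimate is an arbitrary value chosen to resemble the split described above; the real cause may be different.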
