Describe the bug
Training zli on a larger (180 MB) CSV file with a custom separator ultimately failed with the following error message:
src/openzl/compress/encode_frameheader.c:617: Assertion `ZL_WC_size(&out) <= hsBound' failed where:
lhs = (unsigned long) 820
rhs = (unsigned long) 688
Abort trap: 6
To Reproduce
Steps to reproduce the behavior:
- ./zli train -p csv --profile-arg ^ -o test zlc --max-file-size-mb 2000 large_test_data.csv
Expected behavior
Training should complete without errors, since zli is able to compress large_test_data.csv fine without training first.
Screenshots and charts
Full output:
Picked 1 samples out of 1 samples with total size 188877818
Benchmarking untrained compressor...
src/openzl/codecs/dispatch_string/encode_dispatch_string_binding.c:77: EI_dispatch_string: splitting 114293019 strings into 135 outputs
src/openzl/codecs/dispatch_string/encode_dispatch_string_binding.c:77: EI_dispatch_string: splitting 114293019 strings into 135 outputs
src/openzl/codecs/dispatch_string/encode_dispatch_string_binding.c:77: EI_dispatch_string: splitting 114293019 strings into 135 outputs
src/openzl/codecs/dispatch_string/encode_dispatch_string_binding.c:77: EI_dispatch_string: splitting 114293019 strings into 135 outputs
src/openzl/codecs/dispatch_string/encode_dispatch_string_binding.c:77: EI_dispatch_string: splitting 114293019 strings into 135 outputs
src/openzl/codecs/dispatch_string/encode_dispatch_string_binding.c:77: EI_dispatch_string: splitting 114293019 strings into 135 outputs
src/openzl/codecs/dispatch_string/encode_dispatch_string_binding.c:77: EI_dispatch_string: splitting 114293019 strings into 135 outputs
src/openzl/codecs/dispatch_string/encode_dispatch_string_binding.c:77: EI_dispatch_string: splitting 114293019 strings into 135 outputs
src/openzl/codecs/dispatch_string/encode_dispatch_string_binding.c:77: EI_dispatch_string: splitting 114293019 strings into 135 outputs
src/openzl/codecs/dispatch_string/encode_dispatch_string_binding.c:77: EI_dispatch_string: splitting 114293019 strings into 135 outputs
src/openzl/codecs/dispatch_string/encode_dispatch_string_binding.c:77: EI_dispatch_string: splitting 114293019 strings into 135 outputs
1 files: 188877818 -> 10849222 (17.41), 145.48 MB/s 370.85 MB/s
src/openzl/codecs/dispatch_string/encode_dispatch_string_binding.c:77: EI_dispatch_string: splitting 114293019 strings into 135 outputs
Selected greedy trainer by default since no trainer was specified
src/openzl/compress/encode_frameheader.c:617: Assertion `ZL_WC_size(&out) <= hsBound' failed where:
lhs = (unsigned long) 820
rhs = (unsigned long) 688
Abort trap: 6
Desktop (please complete the following information):
- OS: macOS 15.7.4 (24G517) on a Mac mini M4 with 16 GB RAM
- Compiler: g++ --version reports
Apple clang version 17.0.0 (clang-1700.0.13.5)
Target: arm64-apple-darwin24.6.0
Thread model: posix
- Build system: Makefile
Built from source on Sat 7th March @ 00:30 CET:
git clone --depth 1 -b release https://github.com/facebook/openzl.git
cd openzl/
make -j 10 zli