Data generated by this script:
import random
rd = random.Random()
rd.seed(0)
HIGH_ENTROPY = bytes(rd.randint(0, 200) for _ in range(10_000_000)) * 10
with open("med.bin", "wb") as f:
    f.write(HIGH_ENTROPY)
gzip -1: 100000000 -> 96526120
zstd -1: 100000000 -> 100002299
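As a rough reference point: the generator draws bytes uniformly from 0..200 (`randint` is inclusive), so a coder that only exploits the byte histogram could at best shrink the data to about log2(201)/8 ≈ 0.956 of its size, roughly 95.6 million bytes here. Huffman literal coding is that kind of coder. The sketch below checks that order-0 bound against the first 1,000,000 bytes of the generated data. It models only the byte histogram and deliberately ignores the 10x repetition; the helper name `order0_entropy` is illustrative.

```python
import math
import random
from collections import Counter


def order0_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram (order-0 model), in bits per byte."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


# Same generator as the script above, but only its first 1,000,000 outputs,
# which are exactly the first 1,000,000 bytes of med.bin.
rd = random.Random()
rd.seed(0)
sample = bytes(rd.randint(0, 200) for _ in range(1_000_000))

bits = order0_entropy(sample)
ideal_ratio = math.log2(201) / 8  # 201 equiprobable symbols: 0..200

print(f"empirical order-0 entropy: {bits:.3f} bits/byte")
print(f"ideal order-0 size for 100 MB: ~{ideal_ratio * 100_000_000:,.0f} bytes")
```

Both gzip -1 and zstd without the heuristics land within about 1% of this bound, so the data is clearly compressible.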
If I remove these heuristics in zstd/lib/compress/huf_compress.c (line 1297 and line 1303 at commit b7b7edb):
We get:
zstd -1: 100000000 -> 96449637
Zstd should do a better job of determining compressibility, so that it doesn't lose out in cases like this one.
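A sketch of why this data may trip the heuristics, under an assumption worth verifying against b7b7edb: the checks at those lines give up on Huffman coding when the most frequent byte appears at most `(srcSize >> 7) + 4` times. The function names below are hypothetical. To keep things simple, the sketch also treats one full 128 KiB block as the literal buffer.

```python
import math
import random
from collections import Counter

BLOCK_SIZE = 128 * 1024  # zstd's maximum block size (128 KiB)


def looks_incompressible(block: bytes) -> bool:
    """Hypothetical re-creation of the ASSUMED heuristic: give up if the
    most common byte appears at most (n >> 7) + 4 times."""
    largest = max(Counter(block).values())
    return largest <= (len(block) >> 7) + 4


def order0_saving(block: bytes) -> float:
    """Fraction of the block an ideal order-0 entropy coder could save."""
    n = len(block)
    bits = -sum(c / n * math.log2(c / n) for c in Counter(block).values())
    return 1 - bits / 8


# First 128 KiB of the data from the issue's script.
rd = random.Random()
rd.seed(0)
block = bytes(rd.randint(0, 200) for _ in range(BLOCK_SIZE))

print("heuristic says incompressible:", looks_incompressible(block))
print(f"ideal order-0 saving: {order0_saving(block):.1%}")
```

With 201 roughly equiprobable symbols, each byte value appears about 131072 / 201 ≈ 652 times. That is well under the 1028 threshold, so the check would fire even though about 4% could still be saved. More generally, for a flat distribution over k symbols the largest count is near n/k. The check can therefore fire once k is somewhat above 128, while an ideal coder would still save up to 1 - log2(k)/8.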