Description
Is your feature request related to a problem? Please describe.
I have a tar archive containing periodic full backups of the same database. Each backup is very similar to the previous one, so each additional backup added to the same tar archive is orders of magnitude smaller after compression than the same file compressed standalone. Unfortunately, appending a new backup today (so that compression can exploit the redundancy between files) entails decompressing the full tar file, appending the new backup to it, and then compressing the whole tar file again, because no compressor I am aware of implements a way to persist (or even just reconstruct) the compression state of an existing stream.
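For reference, the workaround described above could look roughly like the sketch below. It is only an illustration: append_backup is a hypothetical helper name, the archive layout is assumed, and --long=27 is just one way to ask zstd to look for matches across a larger window. The cost is the point here: every append rewrites the entire archive.

```shell
#!/bin/sh
# Sketch of the decompress + append + recompress workaround.
# append_backup is a hypothetical helper, not part of zstd or tar.
# Usage: append_backup ARCHIVE.tar.zst NEW_BACKUP_FILE
append_backup() {
    archive=$1
    new_file=$2
    new_dir=$(dirname "$new_file")
    new_name=$(basename "$new_file")
    tmp_tar=$(mktemp) || return 1

    if [ -f "$archive" ]; then
        # Decompress the whole existing archive into a temporary tar file,
        # then append the new backup to the uncompressed tar.
        zstd -q -d -c "$archive" > "$tmp_tar" &&
            tar -rf "$tmp_tar" -C "$new_dir" "$new_name" ||
            { rm -f "$tmp_tar"; return 1; }
    else
        # First backup: create the tar from scratch.
        tar -cf "$tmp_tar" -C "$new_dir" "$new_name" ||
            { rm -f "$tmp_tar"; return 1; }
    fi

    # Recompress everything. Write to a side file first so a failed run
    # does not clobber the existing archive.
    zstd -q -f --long=27 "$tmp_tar" -o "$archive.new" &&
        mv -f "$archive.new" "$archive"
    status=$?
    rm -f "$tmp_tar" "$archive.new"
    return $status
}
```

Calling append_backup backups.tar.zst backup-today.sql once per backup keeps a single archive, at the price of decompressing and recompressing all previous backups each time.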
Describe the solution you'd like
I would like a way to append data to an existing zstd stream making use of the state of the whole stream, so that the new data can be compressed efficiently by exploiting redundancies with the data already present in the compressed stream.
The persistent compression state could very well be larger than the compressed stream: this is acceptable.
The persistent compression state does not need to be publicly documented, nor stable across versions or platforms. If the persisted compression state is invalid/corrupt, it should be ignored.
This could take the form of a --state STATE_FILE switch that could be used as follows:
# persist the compression state in STATE_FILE
zstd --state STATE_FILE -o OUTPUT_FILE INPUT_FILE
# append data to OUTPUT_FILE using the state from STATE_FILE, persist the final compression state in STATE_FILE
zstd --state STATE_FILE --append -o OUTPUT_FILE INPUT_FILE
Ideally, it should also be possible to reconstruct (and persist) the compression state of an existing stream.
If a stream consists of multiple independent sections (e.g. because the stream is rsyncable, or because a section was appended without making use of the persistent compression state), the persistent state would only cover the section since the last state reset.
Describe alternatives you've considered
There are alternatives in the specific scenario I described above (e.g. incremental backups, a diff-like tool before compression, or decompress+append+recompress), but they are not always practical or applicable in this or other scenarios.
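As one concrete instance of the diff-like alternative, zstd itself can compress a file against a reference file with --patch-from (available in zstd 1.4.5 and later, as far as I know). The sketch below is a minimal illustration, not a replacement for the requested feature: the helper names are hypothetical, and each backup ends up as a separate file that can only be restored if the previous backup is available.

```shell
#!/bin/sh
# Sketch: store each new backup as a zstd patch against the previous one.
# compress_against_previous and restore_from_previous are hypothetical
# helper names; --patch-from is an existing zstd option.

# Usage: compress_against_previous PREVIOUS_BACKUP NEW_BACKUP OUTPUT.zst
compress_against_previous() {
    zstd -q -f --patch-from="$1" "$2" -o "$3"
}

# Usage: restore_from_previous PREVIOUS_BACKUP PATCH.zst RESTORED_BACKUP
restore_from_previous() {
    zstd -q -f -d --patch-from="$1" "$2" -o "$3"
}
```

This produces a chain of dependent files rather than one appendable stream, which is one reason it is not always applicable. For very large backups, the zstd manual should be checked for the window-size options (such as --long) that may be needed on both sides.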
Additional context