Description
Hi,
I am updating some code using zstd from 1.3.4 to 1.3.8, and encountered an issue when updating client code that seems related to the changes in dictionary training.
The use case is training a dictionary on some payload type in a server, sending the dictionary to a client, and then using the dictionary to encode messages of this type from server to client. In one degenerate test case, it looks like some data that could be used to "successfully" train a dictionary in 1.3.4 now yields an "Error (generic)" in 1.3.8. I suspect that the training data is too small (most messages are a few bytes). I tried looking for guidance on how to train dictionaries or what restrictions apply, and found an upper bound in #1288 (comment) but not much around lower bounds.
(A reproduction example I am looking at is training a dictionary on consecutive integers from 0 to 10_000_000 represented as strings, which returns successfully in 1.3.4, but not successfully in 1.3.8.)
I am wondering if it could make sense to expose some form of dummy dictionary in the library, to cater for cases where a dictionary cannot be successfully trained in a programmatic way; such a dictionary would revert the behavior of e.g. ZSTD_compress_usingCDict to that of ZSTD_compressCCtx. In the example described above, I would like the server/client communication to gracefully fall back to not using a dictionary if the server does not manage to build said dictionary.