Return the estimated effectiveness of dictionaries in the trainer #1506

Open
@braibant

Description

Hi,

I am updating some code using zstd from 1.3.4 to 1.3.8, and encountered an issue when updating client code that seems related to the changes in dictionary training.

The use case is training a dictionary on some payload type in a server, sending the dictionary to a client, and then using the dictionary to encode messages of this type from server to client. In one degenerate test case, it looks like data that could be used to "successfully" train a dictionary in 1.3.4 now yields an "Error (generic)" in 1.3.8. I suspect that the training data is too small (most messages are a few bytes). I tried looking for guidance on how to train dictionaries and what restrictions apply, and found an upper bound in #1288 (comment), but not much about lower bounds.

(A reproduction example I am looking at is training a dictionary on the consecutive integers from 0 to 10_000_000 represented as strings, which returns successfully in 1.3.4 but not in 1.3.8.)
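Roughly, the reproduction looks like the sketch below against the stable ZDICT API (ZDICT_trainFromBuffer / ZDICT_isError). It assumes each integer's decimal string is one training sample and picks an arbitrary ~110 KB dictionary target; both are guesses at the setup rather than the exact code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zdict.h>

int main(void)
{
    const unsigned nbSamples = 10000000;   /* the integers 0 .. 9_999_999 as strings */
    size_t* sampleSizes = malloc(nbSamples * sizeof *sampleSizes);

    /* First pass: measure each sample ("0", "1", ..., "9999999"). */
    char tmp[16];
    size_t total = 0;
    for (unsigned i = 0; i < nbSamples; i++) {
        sampleSizes[i] = (size_t)snprintf(tmp, sizeof tmp, "%u", i);
        total += sampleSizes[i];
    }

    /* Second pass: lay the samples out back to back, as ZDICT expects. */
    char* samples = malloc(total);
    size_t off = 0;
    for (unsigned i = 0; i < nbSamples; i++) {
        int n = snprintf(tmp, sizeof tmp, "%u", i);
        memcpy(samples + off, tmp, (size_t)n);
        off += (size_t)n;
    }

    size_t dictCapacity = 112640;          /* ~110 KB target, an illustrative choice */
    void*  dictBuffer   = malloc(dictCapacity);
    size_t dictSize = ZDICT_trainFromBuffer(dictBuffer, dictCapacity,
                                            samples, sampleSizes, nbSamples);
    if (ZDICT_isError(dictSize)) {
        /* Observed behavior: this path is not taken on 1.3.4, but is on 1.3.8. */
        fprintf(stderr, "training failed: %s\n", ZDICT_getErrorName(dictSize));
        return 1;
    }
    printf("trained a %zu-byte dictionary\n", dictSize);

    free(dictBuffer);
    free(samples);
    free(sampleSizes);
    return 0;
}
```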

I am wondering whether it could make sense to expose some form of dummy dictionary in the library, to cater programmatically for cases where a dictionary cannot be trained successfully; using it would revert the behavior of e.g. ZSTD_compress_usingCDict to that of ZSTD_compressCCtx. In the example described above, I would like the server/client communication to gracefully fall back to not using a dictionary if the server does not manage to build one.
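Concretely, the fallback I have in mind looks something like the sketch below; compress_message is a hypothetical application-side helper, not a zstd API.

```c
#include <zstd.h>

/* Hypothetical application-side helper: compress one server->client message,
 * falling back to plain compression when no dictionary could be trained. */
static size_t compress_message(ZSTD_CCtx* cctx,
                               void* dst, size_t dstCapacity,
                               const void* src, size_t srcSize,
                               const ZSTD_CDict* cdict, /* NULL if training failed */
                               int compressionLevel)
{
    if (cdict != NULL) {
        /* Dictionary path: the compression level is the one baked into the CDict. */
        return ZSTD_compress_usingCDict(cctx, dst, dstCapacity, src, srcSize, cdict);
    }
    /* Fallback path: behave exactly as if no dictionary existed. */
    return ZSTD_compressCCtx(cctx, dst, dstCapacity, src, srcSize, compressionLevel);
}
```

The question is whether the library itself could offer an equivalent "dummy" dictionary, so that the dictionary/no-dictionary distinction does not have to leak into the application protocol.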
