Description
Is your feature request related to a problem? Please describe.
For my thesis, I want to collect some sets of data (think of huge lists of matrices, where each entry is a FreeAssociativeAlgebraElem). In many of the observed cases, the resulting mrdi file exceeds several gigabytes. However, when running gzip (with default settings), the size shrinks dramatically. In one particular example that I want to mention here, the file size goes down from 3.2G to 65M, a factor of ~50. Running gzip in this case takes about 20 seconds, which is negligible compared to the time required to produce the data and move it around.
Furthermore, I regularly fill my disk quota on our compute servers with such uncompressed files.
Describe the solution you'd like
Some way to let Oscar.Serialization produce and read gzipped files, without having to manually handle uncompressed files.
Describe alternatives you've considered
- Leave it to the user to attach a `CodecZlib.GzipCompressorStream` to the opened file and call `save` with the resulting `io` object.
- In addition to `save` and `load`, provide functions `save_compressed` and `load_compressed` that behave essentially identically, but insert the `CodecZlib.GzipCompressorStream` layer when opening files.
- Add a `GzipSerializer` that is created as e.g. `GzipSerializer(JSONSerializer())` and, when invoked in `(de)serializer_open`, wraps the `io` object in a `CodecZlib.GzipCompressorStream`.
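To make the second option concrete, here is a rough sketch of what `save_compressed`/`load_compressed` could look like. Note that `save_compressed` and `load_compressed` are names proposed above, not an existing API, and this assumes `save`/`load` accept an `IO` object as their first argument:

```julia
# Sketch only: save_compressed/load_compressed are hypothetical names from
# this proposal; save/load are assumed to accept an IO object.
using CodecZlib

function save_compressed(path::String, obj; kwargs...)
    open(path, "w") do io
        stream = GzipCompressorStream(io)
        try
            save(stream, obj; kwargs...)
        finally
            close(stream)  # flushes remaining data and writes the gzip trailer
        end
    end
end

function load_compressed(path::String; kwargs...)
    open(path, "r") do io
        stream = GzipDecompressorStream(io)
        try
            load(stream; kwargs...)
        finally
            close(stream)
        end
    end
end
```

The main subtlety is that the compressor stream must be closed (not just the underlying file) so the gzip trailer gets written.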
Orthogonal to the above options, one could leave it to the deserializer to detect whether a given file is compressed (either by file name extension or by the magic bytes `1f 8b`) and in that case automatically decompress it.
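The magic-byte detection could be sketched roughly like this (assuming we are allowed to peek at and rewind the underlying stream; `open_maybe_gzipped` is a hypothetical helper name):

```julia
# Sketch: detect gzip via the magic bytes 1f 8b and transparently wrap the
# stream; falls back to the plain file handle otherwise.
using CodecZlib

function open_maybe_gzipped(path::String)
    io = open(path, "r")
    magic = read(io, 2)
    seekstart(io)  # rewind so the deserializer sees the full content
    if magic == UInt8[0x1f, 0x8b]
        return GzipDecompressorStream(io)
    end
    return io
end
```

Checking the magic bytes is more robust than relying on the file name, since users may rename files; on the other hand it requires the stream to be seekable, which rules out e.g. reading from a pipe.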
I am happy to implement this myself, but I wanted to collect some opinions on the different options before starting further work.
Pinging people who might have an opinion (@antonydellavecchia @benlorenz @fingolfin), but comments from everybody else are very welcome too.