Allow (de)serialization of gzip'ed files #5664

@lgoettgens

Is your feature request related to a problem? Please describe.
For my thesis, I want to collect some sets of data (think of huge lists of matrices, where each entry is a FreeAssociativeAlgebraElem). In many of the observed cases, the resulting mrdi file grows to several GB. However, running gzip (with default settings) reduces the size dramatically. In one particular example, the file size goes down from 3.2 GB to 65 MB, a factor of ~50. Running gzip in this case takes about 20 seconds, which is negligible compared to the time required to produce the data and move it around.
Furthermore, I regularly fill my disk quota on our compute servers with such uncompressed files.

Describe the solution you'd like
Some way to let Oscar.Serialization produce and read gzipped files, without having to manually handle uncompressed files.

Describe alternatives you've considered

  1. Leave it to the user to attach a CodecZlib.GzipCompressorStream to the opened file, and call save with the resulting io object.
  2. In addition to save and load, provide functions save_compressed and load_compressed that behave essentially identically, but insert CodecZlib.GzipCompressorStream (or CodecZlib.GzipDecompressorStream when reading) as an intermediate layer when opening files.
  3. Add a GzipSerializer that is created as e.g. GzipSerializer(JSONSerializer()) and that, when called in (de)serializer_open, wraps the io object in a CodecZlib.GzipCompressorStream (or a CodecZlib.GzipDecompressorStream when reading).
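
To make the first alternative concrete, here is a minimal sketch of wrapping a file handle in the CodecZlib streams by hand. The helper names `with_gzip_writer` and `with_gzip_reader` are hypothetical and not part of Oscar, and plain `write`/`read` on a stand-in string are used so the snippet runs without Oscar. With Oscar loaded, `save(io, obj)` and `load(io)` would presumably take their place.

```julia
using CodecZlib  # provides GzipCompressorStream / GzipDecompressorStream

# Hypothetical helper (not part of Oscar): run `f` on a stream that
# gzip-compresses everything written to it before it reaches `path`.
function with_gzip_writer(f, path::AbstractString)
    stream = GzipCompressorStream(open(path, "w"))
    try
        return f(stream)
    finally
        close(stream)  # finishes the gzip stream and closes the underlying file
    end
end

# Hypothetical counterpart for reading: `f` sees the decompressed bytes.
function with_gzip_reader(f, path::AbstractString)
    stream = GzipDecompressorStream(open(path, "r"))
    try
        return f(stream)
    finally
        close(stream)
    end
end

path = tempname() * ".mrdi.gz"
# Stand-in for serialized data; highly repetitive, like the files described above.
payload = "[" * join(fill("\"x\"", 1000), ",") * "]"

# Assumption: with Oscar, `save(io, obj)` / `load(io)` would replace `write` / `read`.
with_gzip_writer(io -> write(io, payload), path)
roundtrip = with_gzip_reader(io -> read(io, String), path)
```

Alternatives 2 and 3 would essentially hide this wrapping behind the serialization API instead of leaving it to the caller.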

Orthogonal to the above options, one could leave it to the deserializer to detect whether a given file is compressed (either by the file name extension or by the magic bytes 1f 8b) and, if so, decompress it automatically.
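
The magic-byte check could look roughly like the following sketch. The name `is_gzipped` is hypothetical, and the code assumes the stream supports `mark`/`reset` (as `IOStream` and `IOBuffer` do), so that peeking at the header does not consume it.

```julia
# The first two bytes of every gzip stream.
const GZIP_MAGIC = UInt8[0x1f, 0x8b]

# Hypothetical helper: peek at the first two bytes of `io` and restore the
# read position afterwards, so the caller can still read from the start.
function is_gzipped(io::IO)
    mark(io)
    header = read(io, 2)  # returns fewer than 2 bytes if the stream is shorter
    reset(io)
    return header == GZIP_MAGIC
end

# Convenience method for file paths.
is_gzipped(path::AbstractString) = open(is_gzipped, path)
```

A deserializer could call this on the opened file and only wrap it in a CodecZlib.GzipDecompressorStream when the check succeeds. Checking the bytes rather than the file name also covers compressed files that were renamed.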

I am happy to implement this myself, but I wanted to collect some opinions on the different options before starting further work.

Pinging people who might have an opinion (@antonydellavecchia @benlorenz @fingolfin), but everybody else is welcome to comment as well.
