
Data compression as an option #58

@sylvlecl

  • Do you want to request a feature or report a bug?

Feature.

  • What is the current behavior?

Binary data stored in AFS is compressed and decompressed automatically by several components.

The Cassandra-based implementation:

  • automatically gzips chunks of data on write
  • automatically gunzips chunks of data on read

The remote implementation:

  • on write, automatically gzips data on the client side
  • on write, automatically gunzips data on the server side
  • on read, automatically gzips data on the server side
  • on read, automatically gunzips data on the client side

When the data being read or written is already compressed, those steps are unnecessary and can hurt performance (and possibly memory usage).

  • What is the expected behavior?

If those components could be configured to skip compression, performance could improve (to be measured).
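To make the request concrete, here is a minimal sketch of what such an option could look like. `BlobCodec` and its `compressionEnabled` flag are hypothetical names chosen for illustration; they are not part of the existing AFS API, and the real components may need the switch in a different place (for example per storage backend or per request).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical codec (not an AFS class): gzip can be switched off by configuration. */
final class BlobCodec {

    private final boolean compressionEnabled;

    BlobCodec(boolean compressionEnabled) {
        this.compressionEnabled = compressionEnabled;
    }

    /** Gzips the payload, or returns it untouched when compression is disabled. */
    byte[] encode(byte[] payload) {
        if (!compressionEnabled) {
            return payload;
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the gzip stream writes the trailer, so read the buffer only afterwards.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Gunzips the stored bytes, or returns them untouched when compression is disabled. */
    byte[] decode(byte[] stored) {
        if (!compressionEnabled) {
            return stored;
        }
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(stored))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With `new BlobCodec(false)`, a caller that already holds compressed data would pass it through unchanged, while `new BlobCodec(true)` keeps today's automatic gzip/gunzip behavior. Both sides of a connection would have to agree on the setting.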

  • What is the motivation / use case for changing the behavior?

Performance optimization in a typical setup with a client connected to an AFS server, which itself relies on a Cassandra implementation of AFS.

In this kind of setup, data blobs are unnecessarily decompressed and recompressed on the server side whenever they are written or read.

Some benchmarking with JMH shows that compressing a large XIIDM case (100 MB) takes around 2 s on my laptop CPU.
At around 50 cases received per hour, that amounts to 1-2 minutes of CPU time per hour spent on compression alone.
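
As a quick check of that estimate, taking the two rough figures above at face value and counting a single compression pass per case:

$$50\ \text{cases/h} \times 2\ \text{s/case} = 100\ \text{s/h} \approx 1.7\ \text{min/h}$$

Any additional pass over the same data (decompression, or a second compression further down the chain) would come on top of this and has not been measured here.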
