Multi-threaded compression? #5341

Open
@khdlr

Description

What I need help with / What I was wondering
I need to build a large dataset of imagery that has > 3 channels (multi-spectral satellite imagery), so I'm relying on the tfds.features.Tensor feature connector. As writing data uncompressed is highly inefficient, I'm using tfds.features.Encoding.ZLIB for compression.

However, this compression step becomes the bottleneck in my dataset building process because it is single-threaded, causing my dataset build to take longer than a month.

What I've tried so far
I read through the docs and also checked the tf.io namespace for possible workarounds.

It would be nice if...

  • Is there any way of speeding up the encoding/compression of the examples by using multiple cores?
  • Are there plans to support a faster compression method than ZLIB for generic Tensor features?
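
To make the first question concrete, here is a minimal sketch of what multi-core compression could look like outside of TFDS, using only the Python standard library. It assumes each example is already serialized to bytes, and it relies on CPython's zlib module releasing the GIL while compressing, so worker threads can use several cores. The helper names compress_example and compress_many are hypothetical and not part of TFDS, and the sketch does not show how such a step would hook into tfds.features.Tensor.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_example(raw: bytes, level: int = 6) -> bytes:
    """Compress one serialized example with zlib (hypothetical helper)."""
    return zlib.compress(raw, level)


def compress_many(raw_examples, max_workers=None):
    """Compress byte strings in worker threads, preserving input order.

    Assumption: the payloads are large enough that the time spent inside
    zlib dominates the per-task scheduling overhead.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(compress_example, raw_examples))


if __name__ == "__main__":
    # Stand-in payloads; real examples would be serialized image tensors.
    fake_examples = [bytes([i]) * 1_000_000 for i in range(8)]
    compressed = compress_many(fake_examples, max_workers=4)
    # Round-trip check: every payload decompresses to its original bytes.
    assert all(
        zlib.decompress(c) == r for c, r in zip(compressed, fake_examples)
    )
```

Threads are used here rather than processes to avoid copying large payloads between processes; whether this helps in practice depends on the payload size and on where the compression call sits inside the dataset builder.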
