[s3] ContentEncoding is disregarded #743

@goranvinterhalter

Problem description

I believe this is the same issue as #422, but for S3.

Certain libraries, such as django_s3_storage, use ContentEncoding (https://github.com/etianen/django-s3-storage/blob/master/django_s3_storage/storage.py#L330) to indicate on-the-fly compression/decompression.

smart_open does not support this, so I have to check for the presence of ContentEncoding manually when reading such files. The S3 documentation specifies:

ContentEncoding (string) -- Specifies what content encodings have been applied to the object and thus what decoding mechanisms must be applied to obtain the media-type referenced by the Content-Type header field.

Is this something that can/will be implemented at some point?
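
For illustration, the manual check described above could look roughly like the sketch below. It is not smart_open code: `read_decoded` is a hypothetical helper name, it assumes a boto3 `get_object`-style response (a dict with a `Body` that has `.read()` and an optional `ContentEncoding` key), and it handles only gzip. The response here is faked with an in-memory buffer so the snippet runs without S3 access.

```python
import gzip
import io


def read_decoded(response):
    """Return the payload of a get_object-style response, undoing gzip if declared.

    `response` is assumed to be a dict shaped like a boto3 get_object result:
    response["Body"] supports .read(), and "ContentEncoding" may be absent.
    """
    raw = response["Body"].read()
    encoding = (response.get("ContentEncoding") or "").strip().lower()
    if encoding == "gzip":
        # The object was stored gzip-compressed; decode it to get the real content.
        return gzip.decompress(raw)
    # No (or an unhandled) content encoding: return the bytes unchanged.
    return raw


# Stand-in for a real S3 response, built in memory for demonstration only.
fake_response = {
    "Body": io.BytesIO(gzip.compress(b"hello, world\n")),
    "ContentEncoding": "gzip",
}
print(read_decoded(fake_response))
```

Something equivalent inside smart_open's S3 reader would presumably remove the need for this per-call check, but that is an assumption about where the fix belongs.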

Steps/code to reproduce the problem

It's hard to give precise steps, but simply put: upload a file with a .txt extension whose content has been gzipped and whose ContentEncoding value is "gzip". Reading it with smart_open should decompress it automatically, but it does not.

Versions

Linux-4.14.296-222.539.amzn2.x86_64-x86_64-with-glibc2.2.5
Python 3.7.10 (default, Jun  3 2021, 00:02:01)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)]
smart_open 6.2.0

Labels

help wanted
