[http] Adding custom Accept-Encoding header breaks read() #713

@grubberr

Description

Hello,

import smart_open

url = "https://fonts.googleapis.com/css?family=Montserrat"
headers = {"Accept-encoding": "deflate, gzip"}

result = smart_open.open(url, transport_params={"headers": headers}, mode="rb")
buff = result.read()
print(len(buff))

result = smart_open.open(url, transport_params={"headers": headers}, mode="rb")
buff = result.read(2)
buff += result.read()
print(len(buff))

196
209

196 bytes - gzip compressed result
209 bytes - uncompressed result

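This happens because the two calls take different code paths:
in the first case the library calls self.response.raw.read(), which returns the body exactly as the server sent it, so it is still gzip-compressed;
in the second case the library uses self.response.iter_content, which returns the body already decompressed by the requests library.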
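
One possible caller-side way to make the two results agree is to decode the raw bytes according to the response's Content-Encoding. The sketch below only illustrates that idea with the standard library: decode_body is a hypothetical helper (it is not part of smart_open or requests), and the payload is a stand-in, not the actual response from the URL above.

```python
import gzip
import zlib


def decode_body(raw: bytes, content_encoding: str) -> bytes:
    """Hypothetical helper: undo the Content-Encoding applied to a raw HTTP body."""
    encoding = content_encoding.strip().lower()
    if encoding in ("gzip", "x-gzip"):
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        return zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    if encoding == "deflate":
        try:
            # Standard "deflate" is a zlib-wrapped stream.
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send a raw DEFLATE stream without the zlib wrapper.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # "identity", empty, or unknown encodings: return the bytes unchanged.
    return raw


# Stand-in payload (an assumption for illustration only).
plain = b"@font-face { font-family: 'Montserrat'; }\n" * 5
compressed = gzip.compress(plain)

# The raw.read() path would hand back something like `compressed`;
# the iter_content path would hand back something like `plain`.
assert decode_body(compressed, "gzip") == plain
print(len(compressed), len(plain))
```

With requests itself, response.raw.read(decode_content=True) asks urllib3 to perform this decoding while reading the raw stream.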

Versions

print(platform.platform())
Linux-5.14.0-1047-oem-x86_64-with-glibc2.31
print("Python", sys.version)
Python 3.9.11 (main, Aug  9 2022, 09:22:28) 
[GCC 9.4.0]
print("smart_open", smart_open.__version__)
smart_open 6.0.0

Checklist

Before you create the issue, please make sure you have:

  • Described the problem clearly
  • Provided a minimal reproducible example, including any required data
  • Provided the version numbers of the relevant software
