Support content encodings other than utf-8 #67

Open
@michaelwooley

Description

The content attribute is always encoded as "utf-8" (source).

However, we've noticed that https://google.com/ (as of 9/29) now returns content encoded as 'ISO-8859-1'.

This breaks tests that make a real HTTP request, play back the recorded result, and then compare the content attributes of the two responses.

If you're curious, here is a demo of what happens in a (non-mocked) google.com result:

import requests

url = "https://google.com/"
res = requests.get(url)

print(res.headers['Content-Type']) # 'text/html; charset=ISO-8859-1'
print(res.encoding)  # 'ISO-8859-1'

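# Decoding the raw bytes as UTF-8 is the step that fails.
try:
    res.content.decode("utf8")
except UnicodeDecodeError as e:
    print(e)

# Decoding with the encoding reported by the response works.
content = res.content.decode(res.encoding)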
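
One possible direction, as a rough sketch only: pick the codec from the response's Content-Type header and fall back to UTF-8 when no usable charset is declared. The helper name `decode_body` and its parameters are hypothetical and not part of this library or of requests; it uses only the standard library so it can be tried without a network call.

```python
import codecs
from email.message import Message


def decode_body(raw, content_type=None, default="utf-8"):
    """Decode raw bytes using the charset named in a Content-Type header.

    Hypothetical helper for illustration; falls back to `default` when the
    header is missing, has no charset, or names an unknown codec.
    """
    charset = None
    if content_type:
        # Let the stdlib parse the header parameters (charset, etc.).
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()

    if charset:
        try:
            codecs.lookup(charset)
        except LookupError:
            # Unknown codec name in the header: ignore it.
            charset = None

    # errors="replace" keeps the sketch from raising on malformed bytes.
    return raw.decode(charset or default, errors="replace")


latin1_body = "café".encode("iso-8859-1")
text = decode_body(latin1_body, "text/html; charset=ISO-8859-1")
```

Whether undecodable bytes should be replaced or should raise is a design choice left open here; a test suite comparing recorded and live content may prefer the stricter behaviour.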
