On redirects, middle URL with ø char gets parsed wrongly - leading to a 404 #10047

@Alekky09

Describe the bug

Hello,

When I fetch https://cornelius-k.dk/synsproeve/ with aiohttp, the request is redirected several times and ends in a 404 when aiohttp requests https://cornelius-k.dk/synspr\udcf8ve at the end of the chain.

It looks like the Location header is decoded incorrectly. The raw value, which I found in Response._raw_headers, is b'https://cornelius-k.dk/synspr\xf8ve'.
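To illustrate where the \udcf8 could come from, here is a small standalone sketch using only the standard library. It assumes the header bytes are decoded as UTF-8 with the surrogateescape error handler, which produces exactly the string seen above. Whether this is the exact code path in aiohttp 3.11.7 is an assumption, and the variable names are only illustrative.

```python
# Raw Location value as reported in Response._raw_headers.
raw_location = b"https://cornelius-k.dk/synspr\xf8ve"

# Assumption: decoding as UTF-8 with "surrogateescape". The byte 0xF8 is not
# valid UTF-8, so it is mapped to the lone surrogate U+DCF8.
as_surrogate = raw_location.decode("utf-8", "surrogateescape")

# The same bytes read as Latin-1 (ISO-8859-1) give the intended character.
as_latin1 = raw_location.decode("latin-1")

# ascii() is used so that printing never fails on the lone surrogate.
print(ascii(as_surrogate))
print(ascii(as_latin1))
```

The surrogate form is lossless (encoding it back with the same error handler returns the original bytes), but it is not a usable URL path, which would explain the 404.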

To Reproduce

Code block:

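import asyncio

import aiohttp

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'
}

async def fetch_url(url):
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url) as response:
            # Print every intermediate response in the redirect chain.
            for i in response.history:
                print(i.url)
                print(i._headers)
                print(i._raw_headers)
            # Return the final status and URL, as in the output shown under Logs/tracebacks.
            return response.status, response.url

# asyncio.run() lets this run as a plain script instead of relying on top-level await.
print(asyncio.run(fetch_url("https://cornelius-k.dk/synsproeve/")))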

The final URL in the redirect chain is https://cornelius-k.dk/synspr\udcf8ve (the byte 0xF8 decoded to a lone surrogate) instead of https://cornelius-k.dk/synsprøve, so the request returns a 404.

Expected behavior

The URLs in the redirect chain are parsed correctly and the correct final URL is fetched.
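As a rough idea of what "correct" could look like on the client side, the sketch below repairs such a Location value by hand. This is not aiohttp's behavior or a proposed patch: repair_location is a hypothetical helper, and falling back to Latin-1 for bytes that are not valid UTF-8 is an assumption about what the server meant.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def repair_location(value: str) -> str:
    """Hypothetical helper: turn a surrogate-escaped Location into a usable URL."""
    # Recover the original header bytes from the surrogate-escaped string.
    raw = value.encode("utf-8", "surrogateescape")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: bytes that are not valid UTF-8 were meant as Latin-1.
        text = raw.decode("latin-1")
    parts = urlsplit(text)
    # Percent-encode non-ASCII path characters as UTF-8; keep "/" and
    # existing "%" escapes untouched.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(repair_location("https://cornelius-k.dk/synspr\udcf8ve"))
```

With the value from this report, the helper yields https://cornelius-k.dk/synspr%C3%B8ve, and a Location that is already percent-encoded passes through unchanged. It could be combined with allow_redirects=False to follow the chain manually until the parsing is fixed.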

Logs/tracebacks

Output of the code block:

https://cornelius-k.dk/synsproeve/
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Tue, 26 Nov 2024 16:02:17 GMT', 'Content-Type': 'text/html', 'Content-Length': '162', 'd-cache': 'from-cache', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'x-content-type-options': 'nosniff', 'strict-transport-security': 'max-age=31536000; preload', 'x-frame-options': 'SAMEORIGIN', 'content-security-policy': "frame-ancestors 'self'", 'Location': 'https://cornelius-k.dk/synsproeve', 'd-geo': 'US')>
((b'server', b'nginx'), (b'date', b'Tue, 26 Nov 2024 16:02:17 GMT'), (b'content-type', b'text/html'), (b'content-length', b'162'), (b'd-cache', b'from-cache'), (b'cache-control', b'no-cache, no-store, must-revalidate'), (b'expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'x-content-type-options', b'nosniff'), (b'strict-transport-security', b'max-age=31536000; preload'), (b'x-frame-options', b'SAMEORIGIN'), (b'content-security-policy', b"frame-ancestors 'self'"), (b'location', b'https://cornelius-k.dk/synsproeve'), (b'd-geo', b'US'))
https://cornelius-k.dk/synsproeve
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Tue, 26 Nov 2024 16:02:18 GMT', 'Content-Type': 'text/html', 'Content-Length': '162', 'Location': 'http://cornelius-k.dk/synspr%C3%B8ve', 'd-cache': 'from-cache', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'x-content-type-options': 'nosniff', 'strict-transport-security': 'max-age=31536000; preload', 'x-frame-options': 'SAMEORIGIN', 'content-security-policy': "frame-ancestors 'self'", 'd-geo': 'US')>
((b'server', b'nginx'), (b'date', b'Tue, 26 Nov 2024 16:02:18 GMT'), (b'content-type', b'text/html'), (b'content-length', b'162'), (b'location', b'http://cornelius-k.dk/synspr%C3%B8ve'), (b'd-cache', b'from-cache'), (b'cache-control', b'no-cache, no-store, must-revalidate'), (b'expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'x-content-type-options', b'nosniff'), (b'strict-transport-security', b'max-age=31536000; preload'), (b'x-frame-options', b'SAMEORIGIN'), (b'content-security-policy', b"frame-ancestors 'self'"), (b'd-geo', b'US'))
http://cornelius-k.dk/synspr%C3%B8ve
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Tue, 26 Nov 2024 16:02:18 GMT', 'Content-Length': '0', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Security-Policy': "frame-ancestors 'self'", 'Location': 'https://cornelius-k.dk/synspr\udcf8ve', 'D-Geo': 'US')>
((b'Server', b'nginx'), (b'Date', b'Tue, 26 Nov 2024 16:02:18 GMT'), (b'Content-Length', b'0'), (b'Connection', b'keep-alive'), (b'Cache-Control', b'no-cache, no-store, must-revalidate'), (b'Expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'X-Content-Type-Options', b'nosniff'), (b'X-Frame-Options', b'SAMEORIGIN'), (b'Content-Security-Policy', b"frame-ancestors 'self'"), (b'Location', b'https://cornelius-k.dk/synspr\xf8ve'), (b'D-Geo', b'US'))
(404, URL('https://cornelius-k.dk/synspr�ve'))

Python Version

3.9.20

aiohttp Version

3.11.7

multidict Version

6.1.0

propcache Version

0.2.0

yarl Version

1.17.1

OS

macOS

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
