Inconsistent handling of special characters in DecodedURL #164

Open
@deains

Description

Working with DecodedURL objects is problematic if the URL contains special characters. Because attributes like url.path return the decoded value rather than the raw one, passing them back into replace() gives a URL that no longer compares equal to the original:

>>> from hyperlink import parse
>>> url = parse('https://www.example.com/game/Saints-Row:-The-Third')
>>> url == url.replace()
True
>>> url == url.replace(host=url.host)
True
>>> url == url.replace(path=url.path)
False
>>> url.path == url.replace(path=url.path).path
True

In the docs there's this line about encoding:

DecodedURL automatically handles encoding and decoding all its components, such that all inputs and outputs are in a maximally-decoded state.

This does not seem to be the case: if DecodedURL were doing this, these equalities would always be True, unless I'm misunderstanding something (entirely possible 🙃).
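One possible explanation, offered only as a guess, is that a decode-then-re-encode round trip is not guaranteed to reproduce the original raw text. The sketch below shows that general effect with the standard library's `urllib.parse`, not with hyperlink itself; whether hyperlink re-encodes `:` in this way internally is an assumption, not something confirmed here.

```python
from urllib.parse import quote, unquote

# Stdlib-only analogy; this does NOT call hyperlink.
raw = 'Saints-Row:-The-Third'

decoded = unquote(raw)               # no percent-escapes present, so unchanged
reencoded = quote(decoded, safe='')  # strict re-encoding escapes the reserved ':'

print(reencoded)                           # Saints-Row%3A-The-Third
print(raw == reencoded)                    # False: the raw forms differ
print(unquote(raw) == unquote(reencoded))  # True: the decoded forms agree
```

This mirrors the symptom above: the decoded paths match while the URLs as a whole do not.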

Working with parse('…', decoded=False) instead seems to solve this problem, although I'm not sure if that's the "right" solution to this.

This situation was very confusing to me, especially since normalize() didn't seem to have any effect at all (although reading the docs for it, I'm not actually sure if it should or not).

I'm unsure if this behaviour is a bug, or it's just me not using the library the right way. Either way, I'd appreciate your input!
