Inconsistent handling of special characters in DecodedURL #164
Description
Working with DecodedURL objects is problematic if the URL contains special characters. Because attributes like url.path do not return the raw value, you get some weird behaviour using replace:
>>> from hyperlink import parse
>>> url = parse('https://www.example.com/game/Saints-Row:-The-Third')
>>> url == url.replace()
True
>>> url == url.replace(host=url.host)
True
>>> url == url.replace(path=url.path)
False
>>> url.path == url.replace(path=url.path).path
True
In the docs there's this line about encoding:
DecodedURL automatically handles encoding and decoding all its components, such that all inputs and outputs are in a maximally-decoded state.
This does not seem to be the case: if DecodedURL were doing this, these equalities would always be True, unless I'm misunderstanding something (entirely possible 🙃).
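To illustrate what I suspect is going on, here is a minimal sketch using only the standard library's urllib.parse, not hyperlink itself. It assumes (I have not confirmed this in hyperlink's source) that a decoded path segment gets strictly re-encoded when it is passed back in, so the ':' comes out percent-encoded even though the original URL had it raw. The variable names are just for illustration.

```python
from urllib.parse import quote, unquote

# The path segment exactly as it appears in the original URL text.
raw_segment = "Saints-Row:-The-Third"

# Decoding changes nothing here: the segment has no percent-escapes.
decoded = unquote(raw_segment)

# Strict re-encoding (safe="" is an assumption standing in for whatever
# the library does) percent-encodes the reserved ':' character.
reencoded = quote(decoded, safe="")

# The decoded values still agree, which would explain why the
# url.path comparison is True ...
same_decoded = unquote(reencoded) == decoded

# ... but the encoded text no longer matches the original, which would
# explain why a comparison based on the encoded form is False.
same_encoded = reencoded == raw_segment

print(reencoded, same_decoded, same_encoded)
```

If something like this happens inside replace, the two URLs would have the same decoded path but different encoded representations, which matches the True/False pattern above.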
Working with parse('…', decoded=False) instead seems to solve this problem, although I'm not sure if that's the "right" solution to this.
This situation was very confusing to me, especially since normalize() didn't seem to have any effect at all (although reading the docs for it, I'm not actually sure if it should or not).
I'm unsure if this behaviour is a bug, or it's just me not using the library the right way. Either way, I'd appreciate your input!