Problem
url_decode / path_decode treat any % followed by two characters as a hex escape and decode it with hex_to_byte(). That has two failure modes:
- Historical bug: a–z / A–Z were accepted as hex “digits”, so e.g. g was mapped to nibble 16 (g - 'a' + 10), which is not a valid hex nibble (must be 0–15). That produced wrong decoded bytes (e.g. %2g interpreted as a deliberate encoding instead of garbage).
- After narrowing to a–f: Non-hex characters (e.g. z) fall through to c - '0', which is still not a hex nibble; it is an arbitrary value (e.g. 'z' - '0' = 74). So invalid input still decodes to wrong bytes, just differently.
Neither matches the URL Standard.
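To make the arithmetic concrete, here is a minimal sketch of the two failure modes. The helpers nibble_historical, nibble_narrowed and combine are hypothetical reconstructions written for illustration; they are not the actual hex_to_byte() source, and the (hi << 4) | lo combination step is an assumption about how the two nibbles are joined. ASCII is assumed.

```c
/* Hypothetical reconstruction of the historical behavior:
 * every letter a-z / A-Z is treated as a hex "digit". */
static int nibble_historical(char c) {
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;  /* 'g' -> 16, out of range */
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return c - '0';
}

/* Hypothetical reconstruction after narrowing to a-f / A-F:
 * anything else still falls through to c - '0' without a digit check. */
static int nibble_narrowed(char c) {
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return c - '0';                                 /* 'z' -> 74 in ASCII */
}

/* Assumed combination step: high nibble shifted left by 4, OR-ed with low. */
static unsigned char combine(int hi, int lo) {
    return (unsigned char)((hi << 4) | lo);
}
```

Under these assumptions, %2g would historically combine nibbles 2 and 16 into byte 0x30 (the character 0), which looks like a valid decode even though the input was never a valid escape.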
Expected behavior
WHATWG URL — Percent-encoded bytes: a percent-encoded byte is % followed by two ASCII hex digits. If that is not the case, append % only and continue (see the spec’s example: %25%s%1G → %%s%1G).
Suggested fix
- Only decode when the next two code points are ASCII hex digits.
- Otherwise emit a literal % (and do not consume the following characters as hex).
- Trailing % or %X with fewer than two following bytes should not fail the whole decode; treat like the spec (literal %).
Regression tests should lock the WHATWG example and cases like %2g → literal %2g (not a bogus byte).
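The rules above can be sketched as follows. This is an illustrative decoder and not the project's actual url_decode / path_decode: the names hex_nibble and percent_decode, the length-based signature and the caller-provided output buffer are all assumptions made for the example.

```c
#include <stddef.h>

/* Returns 0-15 for an ASCII hex digit, or -1 for any other byte. */
static int hex_nibble(unsigned char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical decoder: reads in[0..len) and writes to out, which must
 * hold at least len bytes (output is never longer than input).
 * Returns the number of bytes written; out is not NUL-terminated. */
static size_t percent_decode(const char *in, size_t len, char *out) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        /* Only attempt a decode when two more bytes follow the '%'. */
        if (in[i] == '%' && len - i > 2) {
            int hi = hex_nibble((unsigned char)in[i + 1]);
            int lo = hex_nibble((unsigned char)in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out[n++] = (char)((hi << 4) | lo);
                i += 2;  /* consume the two hex digits */
                continue;
            }
        }
        /* Invalid or truncated escape: emit the byte literally and do not
         * consume what follows, so it is processed on the next iteration. */
        out[n++] = in[i];
    }
    return n;
}
```

With this sketch, %25%s%1G decodes to %%s%1G, %2g stays %2g, and a trailing % or %4 is copied through unchanged instead of failing the decode.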