Non-numeric port parsing issue #180

Open
@kenballus

Description

The port number in the following URL is clearly malformed, but Hyperlink parses it without raising an error:

>>> hyperlink.URL.from_text("http://example.com: -໑_1\v").port
-11

This happens because ports are parsed with Python's built-in int(), which is far more permissive than URL syntax allows. This leads to the following unintuitive consequences:

  • Whitespace, including all of ' ', '\t', '\v', '\r', and '\n' (plus a number of Unicode whitespace characters), is stripped from either side of the port number.
  • '-' or '+' can appear just before the first digit in the port number.
  • '_' can appear between digits in the port number.
  • Some Unicode digits, such as '໑', can appear in port numbers.

All of this violates both the RFC (RFC 3986) and the WHATWG URL Standard, which allow only ASCII digits in the port.
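
For illustration, here is a minimal sketch of what a stricter check could look like. This is not Hyperlink's code: parse_port_strict is a hypothetical helper name, and rejecting an empty port and anything above 65535 are assumptions on my part (the 16-bit limit follows the WHATWG standard; RFC 3986 itself sets no upper bound). The first few assertions just confirm the int() behaviours listed above.

```python
import re

# int() accepts every relaxed form described in this issue.
assert int(" -໑_1\v") == -11   # whitespace, sign, Unicode digit, underscore
assert int("1_0") == 10        # underscore between digits
assert int("+80") == 80        # leading sign
assert int("໑") == 1           # Lao digit one (U+0ED1)

# Hypothetical helper, not part of hyperlink.
# [0-9] matches ASCII digits only, unlike \d, which also matches Unicode digits.
_PORT_RE = re.compile(r"[0-9]+")


def parse_port_strict(text):
    """Return the port as an int, or raise ValueError if it is malformed."""
    # fullmatch() rejects leading/trailing whitespace, signs, and underscores.
    if _PORT_RE.fullmatch(text) is None:
        raise ValueError("invalid port: %r" % (text,))
    port = int(text)
    # Assumption: enforce the 16-bit range used by the WHATWG URL Standard.
    if port > 65535:
        raise ValueError("port out of range: %r" % (text,))
    return port
```

With a check like this, parse_port_strict("8080") returns 8080, while the port string from the URL above (" -໑_1\v") raises ValueError instead of silently becoming -11.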
