url_authority result shape is inconsistent across URL grammars #382

Description

@fhightower

In ioc_finder/ioc_grammars.py, four grammars expose a url_authority named result but with two different shapes:

| grammar | parse_string(...).url_authority type |
| --- | --- |
| url | str |
| scheme_less_url | str |
| url_complete | ParseResults (with url_host inside) |
| scheme_less_url_complete | ParseResults (with url_host inside) |

Reproduced on main:

>>> from ioc_finder import ioc_grammars
>>> ioc_grammars.url.parse_string("https://example.com/path").url_authority
'example.com'                                  # str
>>> ioc_grammars.url_complete.parse_string("https://example.com/path").url_authority
ParseResults(['example.com'], {'url_host': ['example.com']})   # ParseResults

Consumers have to special-case the shape, e.g. on the #369 branch tests/test_urls.py::test_parse_url_helper_handles_scheme_ful_and_scheme_less does:

authority = parsed.url_authority
if not isinstance(authority, str):
    authority = authority[0]

That isinstance branch makes the consumer paper over an inconsistency in the grammar API. The asymmetry predates #369 and was not introduced by it, but #369's new _parse_url four-grammar cascade widens the surface through which it leaks, which makes the inconsistency more visible to downstream callers.
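
Until the grammars agree, a caller could isolate the special case in one place. The following is a minimal sketch, not part of ioc_finder: the helper name authority_as_str is hypothetical, and it assumes (as in the reproduction above) that the non-str shape supports integer indexing and holds the authority text as its first token.

```python
from typing import Any


def authority_as_str(authority: Any) -> str:
    """Return the authority text for either shape of url_authority.

    Hypothetical helper: accepts a plain str, or an indexable token
    container (such as a pyparsing ParseResults) whose first token is
    assumed to be the authority text.
    """
    if isinstance(authority, str):
        # The url / scheme_less_url shape: already a string.
        return authority
    # The url_complete / scheme_less_url_complete shape: take the first token.
    return authority[0]
```

A caller would then write authority = authority_as_str(parsed.url_authority) instead of repeating the isinstance check at each call site. This only hides the asymmetry; it does not remove it from the grammars.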

Suggested fix

Normalize all four grammars to return url_authority as a ParseResults (with url_host and, where applicable, url_userinfo inside), or as a str everywhere, whichever matches the design intent. Whichever direction is chosen, the named result should mean the same thing across grammars so that callers can rely on a single shape.
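
Whichever shape is chosen, a regression check could guard against the grammars drifting apart again. The sketch below is an assumption about how such a check might look, not existing test code: authority_shapes and is_consistent are hypothetical names, and the stand-in parsers at the bottom only imitate the two shapes so that the block runs without ioc_finder installed.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Tuple

# Each case maps a grammar name to (parse callable, sample input).
Cases = Dict[str, Tuple[Callable[[str], Any], str]]


def authority_shapes(cases: Cases) -> Dict[str, str]:
    """Map each grammar name to the type name of its url_authority result."""
    return {
        name: type(parse(sample).url_authority).__name__
        for name, (parse, sample) in cases.items()
    }


def is_consistent(shapes: Dict[str, str]) -> bool:
    """True when every grammar reports the same url_authority type."""
    return len(set(shapes.values())) <= 1


def stand_in(value: Any) -> Callable[[str], Any]:
    """Build a fake parser (illustration only) returning a fixed url_authority."""
    return lambda text: SimpleNamespace(url_authority=value)


# Stand-ins imitating today's behaviour: one str shape, one container shape.
demo_cases: Cases = {
    "url": (stand_in("example.com"), "https://example.com/path"),
    "url_complete": (stand_in(["example.com"]), "https://example.com/path"),
}
demo_shapes = authority_shapes(demo_cases)
```

In a real test, the stand-ins would be replaced by the four grammars' parse_string methods from ioc_grammars, each paired with an input that grammar accepts, and the test would assert is_consistent(authority_shapes(cases)).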
