Simplify route declaration in url2purl #53

Open
@tdruez

Following https://github.com/package-url/packageurl-python/pull/51/files/c1d41a8930b0b89dfc3774b4e18d89de5089e593..7877bb50102482468bdb9b32476d5a6151dc368e#r508692262

Working with regex syntax is always hard, and it should not be necessary for most of the simple routes.
For example, the common pattern '[^/]+' for a path segment should be abstracted away, which would improve readability and make new routes easier to add.

We could re-use some ideas from Django's recent URL routing system, which replaces the old regex-based one: https://docs.djangoproject.com/en/3.1/topics/http/urls/#url-dispatcher

This system abstracts the regex complexity into "converters". For example, r'^articles/(?P<year>[0-9]{4})/$' becomes articles/<yyyy:year>/, where yyyy is a custom four-digit-year converter like the one shown in the Django docs.

Using a current url2purl example:

  • pattern = r"https?://raw.githubusercontent.com/(?P<namespace>[^/]+)/(?P<name>[^/]+)/(?P<version>[^/]+)/(?P<subpath>.*)$"

Could become:

  • route = "https://raw.githubusercontent.com/<str:namespace>/<str:name>/<str:version>/<path:subpath>"

Much easier to write and to read.
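
To make the idea concrete without depending on Django, here is a minimal sketch of such a translation using only the standard library. The `CONVERTERS` table and the `route_to_regex` function are hypothetical names for illustration, not existing url2purl code, and only the `str` and `path` converters are covered.

```python
import re

# Hypothetical converter table: converter name -> regex fragment.
CONVERTERS = {
    "str": r"[^/]+",   # a single path segment
    "path": r".+",     # the remainder of the URL, slashes included
}

# Matches "<converter:name>" placeholders in a route.
_PLACEHOLDER = re.compile(
    r"<(?P<converter>[a-z]+):(?P<name>[A-Za-z_][A-Za-z0-9_]*)>"
)


def route_to_regex(route):
    """Translate a route with <converter:name> placeholders into an anchored regex."""
    parts = ["^"]
    position = 0
    for match in _PLACEHOLDER.finditer(route):
        # Escape the literal text that sits between two placeholders.
        parts.append(re.escape(route[position:match.start()]))
        # An unknown converter name raises KeyError here.
        fragment = CONVERTERS[match.group("converter")]
        parts.append("(?P<{}>{})".format(match.group("name"), fragment))
        position = match.end()
    # Escape whatever literal text follows the last placeholder.
    parts.append(re.escape(route[position:]))
    parts.append("$")
    return "".join(parts)


route = "https://raw.githubusercontent.com/<str:namespace>/<str:name>/<str:version>/<path:subpath>"
url = "https://raw.githubusercontent.com/LeZuse/flex-sdk/master/frameworks/projects/mx/src/mx/containers/accordionClasses/AccordionHeader.as"
fields = re.match(route_to_regex(route), url).groupdict()
```

Adding a new converter in this sketch is a single entry in the table, which is the readability gain being proposed.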


Playing around with Django's _route_to_regex:

import re
from django.urls.resolvers import _route_to_regex

route = "https://raw.githubusercontent.com/<str:namespace>/<str:name>/<str:version>/<path:subpath>"
pattern = _route_to_regex(route, is_endpoint=True)[0]
# -> "^https\\:\\/\\/raw\\.githubusercontent\\.com\\/(?P<namespace>[^/]+)\\/(?P<name>[^/]+)\\/(?P<version>[^/]+)\\/(?P<subpath>.+)$"

url = "https://raw.githubusercontent.com/LeZuse/flex-sdk/master/frameworks/projects/mx/src/mx/containers/accordionClasses/AccordionHeader.as"
re.compile(pattern, re.VERBOSE).match(url).groupdict()
# -> {'namespace': 'LeZuse', 'name': 'flex-sdk', 'version': 'master', 'subpath': 'frameworks/projects/mx/src/mx/containers/accordionClasses/AccordionHeader.as'}

We could add custom converters for the specific needs of purl: https://docs.djangoproject.com/en/3.1/topics/http/urls/#registering-custom-path-converters
Some parts, such as the (http|https) prefix, will need dedicated support, because the protocol and domain sections are not part of Django's route system:

from django.urls.resolvers import _route_to_regex
from django.urls.converters import register_converter
from django.urls.converters import StringConverter

class ProtocolConverter(StringConverter):
    regex = '(http|https|ftp)'

register_converter(ProtocolConverter, 'protocol')

route = "<protocol:protocol>://raw.githubusercontent.com/<str:namespace>/<str:name>/<str:version>/<path:subpath>"
_route_to_regex(route, is_endpoint=True)[0]
# -> '^(?P<protocol>(http|https|ftp))\\:\\/\\/raw\\.githubusercontent\\.com\\/(?P<namespace>[^/]+)\\/(?P<name>[^/]+)\\/(?P<version>[^/]+)\\/(?P<subpath>.+)$'
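
One side effect of this approach: because the converter produces a named group, the protocol ends up in groupdict() alongside the purl fields. The sketch below uses only `re` and the pattern from the output above to show the extra key being removed. Whether url2purl would discard or keep the protocol is an assumption made here for illustration.

```python
import re

# Pattern copied from the _route_to_regex output above, written as a raw string.
pattern = (
    r"^(?P<protocol>(http|https|ftp))\:\/\/raw\.githubusercontent\.com\/"
    r"(?P<namespace>[^/]+)\/(?P<name>[^/]+)\/(?P<version>[^/]+)\/(?P<subpath>.+)$"
)

url = "http://raw.githubusercontent.com/LeZuse/flex-sdk/master/frameworks/projects/mx/src/mx/containers/accordionClasses/AccordionHeader.as"

fields = re.match(pattern, url).groupdict()

# The protocol is captured like any other named group, so it is removed
# here to leave only the purl-related fields.
protocol = fields.pop("protocol")
```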
