[FR][Packaging] Reference implementation in Python for the glob pattern expansion specified as attachment to PEP 639 #4299

Open
@abravalheri

Reference: https://discuss.python.org/t/pep-639-round-3-improving-license-clarity-with-better-package-metadata/53020/174

Could we please document somewhere a reference implementation in Python of the glob part that complies with the mandatory requirements of the PEP (maybe as an attachment to the PEP, or somewhere in the PyPA docs)?

The original intention ("let's document whatever the stdlib's glob does, so that we can implement it in other languages") was generally agreed upon in the Discourse thread. However, the final text departs significantly from that intention: it requires many more validations, which Python's stdlib does not implement itself.

Setuptools received a contribution similar to the following: "Validate license-files glob patterns" (pypa/setuptools PR #4841, thanks @cdce8p):

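import os
import re
from glob import glob


def find_pattern(pattern: str) -> list[str]:
    """
    Validate a license-files glob pattern, then expand it relative to the
    current working directory.

    >>> find_pattern("/LICENSE.MIT")  # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: Pattern '/LICENSE.MIT' should be relative...
    >>> find_pattern("../LICENSE.MIT")  # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: Pattern '../LICENSE.MIT' cannot contain '..'...
    >>> find_pattern("LICEN{CSE*")  # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: Pattern 'LICEN{CSE*' contains invalid characters...
    """
    # Parent directory indicators are rejected anywhere in the pattern.
    if ".." in pattern:
        raise ValueError(f"Pattern {pattern!r} cannot contain '..'")
    # Reject absolute paths: a leading '/' or os.sep, or a Windows drive like C:\
    if pattern.startswith((os.sep, "/")) or ":\\" in pattern:
        raise ValueError(
            f"Pattern {pattern!r} should be relative and must not start with '/'"
        )
    # Accept only word characters, '-', '.', '/' and the glob characters
    # '*', '?', '[' and ']'. fullmatch (unlike match with '$') also rejects
    # a pattern that ends with a newline.
    if re.fullmatch(r'[\w\-\.\/\*\?\[\]]+', pattern) is None:
        raise ValueError(
            f"Pattern {pattern!r} contains invalid characters. "
            "https://packaging.python.org/en/latest/specifications/pyproject-toml/#license-files"
        )
    # Each pattern must match at least one path on disk.
    found = glob(pattern, recursive=True)
    if not found:
        raise ValueError(f"Pattern {pattern!r} did not match any files.")
    return found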

Is this enough, complete, and correct? At first glance, judging by the text of the PEP, I would say yes, but I would like a second opinion.
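One point a second opinion might check: the character-set regex accepts `[` and `]` anywhere, so an unbalanced bracket (`LICENSE[`) or a class containing glob characters (`LICENSE[*]`) passes, although, if I read the PEP correctly, `[]` ranges may only contain verbatim-matched characters (alphanumerics, `_`, `-`, `.`). Below is a minimal sketch of a stricter check that could replace the regex line, assuming that reading; the helper name `has_valid_glob_syntax` is hypothetical and not part of setuptools, and the `..` and absolute-path checks would stay as they are.

```python
import re

# Hypothetical stricter syntax check (not from setuptools PR #4841).
# The pattern must be a sequence of tokens, where each token is either
#   - one allowed character outside brackets: word chars, '-', '.', '/', '*', '?'
#   - a complete, non-empty [...] class containing only word chars, '.' or '-'
# Unbalanced brackets and classes such as [*] or [!a] therefore fail.
_TOKENS = re.compile(r"(?:[\w\-./*?]|\[[\w.\-]+\])+")


def has_valid_glob_syntax(pattern: str) -> bool:
    """Return True if the pattern fits this reading of the PEP 639 glob syntax."""
    return _TOKENS.fullmatch(pattern) is not None


for candidate in ["LICENSE*", "LICEN[CS]E*", "LICENSE[*]", "LICENSE["]:
    print(candidate, has_valid_glob_syntax(candidate))
```

Note that, as in the original regex, `\w` also matches non-ASCII letters and digits in Python 3; whether "alphanumeric" in the PEP means ASCII only is a separate question.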

/cc @befeleme
