Skip to content

Make the boundary between public and private in the API clearer #76

@jameslamb

Description

@jameslamb

Details

The ongoing work on rapids-build-backend relies on using rapids-dependency-file-generator as an importable module, not a CLI, like this

from rapids_dependency_file_generator.cli import generate_matrix
from rapids_dependency_file_generator.constants import default_pyproject_dir
from rapids_dependency_file_generator.rapids_dependency_file_generator import (
    get_requested_output_types,
    make_dependency_files,
)

ref: https://github.com/rapidsai/rapids-build-backend/pull/17/files#r1542068869

Given that, I think some work should be done to make the boundary between public and private parts of the API here clearer.

Approach

  • use __all__ entries in every module of the library
    • to limit which things should be imported with * imports
  • use an __all__ entry in the top-level __init__.py
    • to define what is importable directly from the rapids_dependency_file_generator module without needing to know about submodules
    • to limit what is imported by from rapids_dependency_file_generator import *
  • prefix all internal-only functions, classes, methods, and other objects with _

Benefits of this work

Clarifies the boundary between public and private, reducing the need for tight coordination between rapids-build-backend and rapids-dependency-file-generator for some types of changes.

Makes it easier for tools to help enforce those boundaries:

Other notes

Consider the following.

only '__version__' is importable from the top-level namespace today (click me)
pip install .
python -c "from rapids_dependency_file_generator import *; print(dir())"
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '__version__']

Only __version__ is exported from the top-level module

That means that things like rapids-build-backend that want to import other things have imported coupled to the exact file layout in this project, like this.

from rapids_dependency_file_generator.rapids_dependency_file_generator import (
    get_requested_output_types,
    make_dependency_files,
)

In my opinion, it'd be better to re-export that stuff from the main module, so that rapids-build-backend could do this

from rapids_dependency_file_generator import (
    get_request_output_types,
    make_dependency_files
)

And so we could freely re-arrange the modules inside this library without breaking rapids-build-backend.

Exposing submodule import paths is really helpful in projects with large APIs like sklearn or scipy, but for this small library I think it'd be better to push downstream consumers to just import from the top-level namespace.

submodules are re-exporting all their imports (click me)

Consider the following

pip install .
python -c "from rapids_dependency_file_generator.cli import *; print(dir())"
['Output', '__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'argparse', 'default_dependency_file_path', 'delete_existing_files', 'generate_matrix', 'load_config_from_file', 'main', 'make_dependency_files', 'os', 'validate_args', 'version', 'warnings']

It's technically possible right now to do something like this:

from rapids_dependency_file_generator.cli import argparse

That should be discouraged, via an __all__ entry in src/rapids_dependency_file_generator/cli.py (and all the other submodules).

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions