Public API #85

Open
nstarman wants to merge 1 commit into astropy:main from nstarman:public-api

Conversation

@nstarman
Member

@nstarman nstarman commented Apr 28, 2023

Up for discussion!

Very much a work in progress. Hopefully refined by discussion at the upcoming conference.

Member

@astrofrog astrofrog left a comment

Just a few comments/questions so far - one of the main things that bothers me in general and that I don't think there is a solution to is that I can do e.g.:

from astropy.cosmology.realizations import ScienceState

Obviously this isn't public API, and ScienceState isn't in __all__, but there's no real way to prevent users from relying on it, and we can't rename all imports in a module to include a _ prefix. So I think that we should probably also make it so that our API docs are also an authoritative source of public API and ensure that it's consistent with the other rules. We should also make sure that all public APIs are documented in the docs (we don't check that this is the case right now).

Comment thread APE_public.rst Outdated
2. All modules must have an ``__all__`` attribute, even if it is empty. The
``__all__`` attribute defines the public and private interface of the module
in which it is defined. Anything in ``__all__`` is public, including
underscore-prefixed symbols. Anything not in ``__all__`` is private in that
Member

although I think we should probably disallow underscore-prefixed symbols in __all__?

Member Author

@nstarman nstarman May 2, 2023

I agree that practically there should not, but I strongly think __all__ should be definitive, so if we have an underscore-prefixed object that we want to make public the mandatory steps are (in this order):

  1. put it in __all__.
  2. update the docs to reflect __all__.
  3. super strongly encouraged to remove the underscore prefix (updating steps 1 & 2).

This is adopting the Scipy disambiguation of PEP 8, adding primarily the mandate that empty __all__ be included in modules with no Public API.
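
A minimal sketch of how the rules above could be checked mechanically. The helper name `check_all` and the `demo` module are hypothetical and exist only for illustration; nothing here is part of astropy or of the draft APE.

```python
import types


def check_all(module):
    """Return a list of human-readable problems with ``module.__all__``."""
    exported = getattr(module, "__all__", None)
    if exported is None:
        # Draft rule: every module defines __all__, even if it is empty.
        return [f"{module.__name__}: missing __all__"]

    problems = []
    for name in exported:
        if not hasattr(module, name):
            problems.append(f"{module.__name__}: {name!r} is in __all__ but undefined")
        elif name.startswith("_"):
            # Allowed, because __all__ is definitive, but strongly discouraged.
            problems.append(f"{module.__name__}: {name!r} is public but underscore-prefixed")
    return problems


# Hypothetical module used only for demonstration.
demo = types.ModuleType("demo")
demo.Foo = type("Foo", (), {})
demo._helper = lambda: None
demo.__all__ = ["Foo", "_helper", "Missing"]

report = check_all(demo)
for line in report:
    print(line)
```

The underscore case is reported rather than rejected, which matches the "definitive but strongly encouraged to rename" ordering of the steps.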

Comment thread APE_public.rst Outdated
- Clearly state if a documented object is actually private.
3. **Add prefixes**: 1. Add prefixes to all modules that are not public. 2. Add
prefixes to all classes, functions, and attributes that are not public.
:note:`I'm less enthusiastic on this point.`
Member

I'm in favor of this as long as - as described above - we don't need to add underscore to symbols that are already in an underscore module for instance (so e.g. nothing in astropy.io.fits._tiled_compression needs to have an underscore prefix).

Member Author

I am fine with that. I have reservations about going all in on underscore prefixing everything. Not needing to underscore symbols in private modules (which still have an __all__) SGTM.

Member Author

Done.

Member Author

@astrofrog, resolve?

@astrofrog
Member

astrofrog commented Apr 28, 2023

After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what __all__ is, so when we communicate with a user, should we tell them that public API is anything in the API docs, or e.g. anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere?

@saimn

saimn commented Apr 29, 2023

The problem with __all__ is that it only affects import * (and for a long time it was its only meaning, the pep8 section about public/private interface was added later, python/peps@7dba60c). So it doesn't prevent importing from a module, and autocompletion doesn't use it.
So the only way to enforce public/private API is by renaming private modules with an underscore. That's what Scipy did.
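
To illustrate the point about `import *`, here is a small sketch. The module name `demo_mod` and its classes are made up for the example; the module is built in memory so the snippet is self-contained.

```python
import sys
import types

# Build a throwaway module named "demo_mod" (hypothetical) and register it so
# that ordinary import statements can find it.
demo_mod = types.ModuleType("demo_mod")
exec(
    "__all__ = ['Foo']\n"
    "class Foo: ...\n"
    "class Hidden: ...\n",
    demo_mod.__dict__,
)
sys.modules["demo_mod"] = demo_mod

# A star-import honours __all__ ...
star_namespace = {}
exec("from demo_mod import *", star_namespace)
star_names = {name for name in star_namespace if not name.startswith("__")}
print(star_names)

# ... but an explicit import of a name outside __all__ still succeeds.
from demo_mod import Hidden

print(Hidden)
```

So `__all__` on its own documents intent; it does not stop anyone from importing `Hidden`.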

@saimn

saimn commented Apr 29, 2023

What do you propose for submodules that currently define __all__ but from which people should not import directly, e.g. astropy.convolution.core (and many others). This is the scheme that is mostly used currently in Astropy.

@astrofrog
Member

astrofrog commented Apr 29, 2023

Maybe in that case .core should technically be private? (._core)

@nstarman
Member Author

nstarman commented Apr 30, 2023

Maybe in that case .core should technically be private? (._core)

👍. That is the suggestion of PEP 8 -- and that .core should also have a blank __all__ = [].

@nstarman
Member Author

nstarman commented Apr 30, 2023

What do you propose for submodules that currently define __all__ but from which people should not import directly, e.g. astropy.convolution.core (and many others). This is the scheme that is mostly used currently in Astropy.

So this would be one of the largest changes to Astropy. PEP8 and thus Scipy and static typing all say that __all__ refers to what is public in that module. With this in mind, doing

# __init__.py
# No __all__ is defined
from .core import *
# core.py
__all__ = ["Foo"]

class Foo: ...

means that Foo is public in astropy.convolution.core and private in astropy.convolution. This is contrary to what Astropy intends, where we are saying __all__ in astropy.convolution.core means that it is actually private in astropy.convolution.core and public in astropy.convolution. This is confusing for many reasons. If we were to adopt PEP8 (as suggested in this draft APE) then the previous example would look like

# __init__.py
__all__ = ["Foo"]  # Foo is public in this module, even if it is defined elsewhere.
from .core import Foo
# core.py
__all__ = []  # nothing is public in this module. Please look elsewhere.

class Foo: ...

Essentially we need to move the contents of various __all__ to where the code is actually public, leaving behind empty __all__ to indicate where no code is public.
Caveat: private modules defining a non-empty __all__ is fine. This enables *-imports in the public modules.
Thanks @astrofrog for the clarification, which is now detailed in the APE.

Update

The better option is to rename core.py to _core.py and add an __all__ to __init__ like so.

# __init__.py
from . import _core, ...
from ._core import *
...

__all__ = [] + _core.__all__ + ...
# _core.py (formerly core.py)
__all__ = ["Foo"]

class Foo: ...

This still supports * imports if you want them and retains 100% unambiguity about what is public and where.
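
A runnable sketch of the layout in the update above, written to a temporary directory so it can be tried in isolation. The package name `demo_pkg` and the `Helper` class are hypothetical, and the elided `...` entries from the snippet are simply left out.

```python
import importlib
import pathlib
import sys
import tempfile
import textwrap

# Write a tiny hypothetical package "demo_pkg" to a temporary directory.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "_core.py").write_text(textwrap.dedent('''
    __all__ = ["Foo"]

    class Foo: ...

    class Helper: ...  # not exported
'''))
(pkg / "__init__.py").write_text(textwrap.dedent('''
    from . import _core
    from ._core import *

    __all__ = [] + _core.__all__
'''))

sys.path.insert(0, str(root))
demo_pkg = importlib.import_module("demo_pkg")

# What the package declares as public, and what a user sees without
# underscore-prefixed names (roughly what tab completion would offer).
public_names = sorted(demo_pkg.__all__)
visible = [name for name in dir(demo_pkg) if not name.startswith("_")]
print(public_names, visible)
```

In this sketch the declared and the visible names coincide, which is the property the renaming is meant to give.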

@nstarman
Member Author

nstarman commented Apr 30, 2023

After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what __all__ is, so when we communicate with a user, should we tell them that public API is anything in the API docs, or e.g. anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere?

I think Yes. To make sure we're on the same page, I think communicating it this way to users should be the logical consequence of the deeper rules:

  • That __all__ is authoritative
  • That we follow PEP 8 public versus internal interfaces, e.g. with underscore prefixes
  • That the docs are up-to-date.

Having all these rules means that a user can only get to public symbols through other public symbols -- that "anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere" and "anything in the API docs" (unless explicitly stated) is public. It's not the source or our definition of public API, it's the consequence.
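
The user-facing consequence ("no _ prefixes anywhere") can be written as a one-line check. This is only a sketch of that heuristic, with a hypothetical helper name; it is not the definition of public API, which would still come from `__all__`.

```python
def is_public_path(dotted_path):
    """Return True if no component of ``dotted_path`` starts with an underscore."""
    return not any(part.startswith("_") for part in dotted_path.split("."))


# Example paths; the underscore-prefixed one is the hypothetical renamed module.
print(is_public_path("astropy.units.Quantity"))
print(is_public_path("astropy.units._quantity.Quantity"))
```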

@nstarman
Member Author

nstarman commented Apr 30, 2023

The problem with __all__ is that it only affects import * (and for a long time it was its only meaning, the pep8 section about public/private interface was added later, python/peps@7dba60c). So it doesn't prevent importing from a module, and autocompletion doesn't use it.
So the only way to enforce public/private API is by renaming private modules with an underscore. That's what Scipy did.

I agree, __all__ is not enough to prevent autocomplete, though some autocomplete does use __all__: see https://ipython.readthedocs.io/en/stable/config/options/terminal.html#configtrait-IPCompleter.limit_to__all__.
I also agree we should rename modules with an underscore, as part of adhering to PEP 8.
It should be noted that adding underscores does not actually "enforce" public/private API as Python does not have true language-level features for public vs internal interfaces. Like __all__, single underscores are convention and Scipy says that __all__ takes precedence over underscores. In this APE I propose that we adopt a Scipy-like rule set where __all__ takes precedence over underscores and we use both according to PEP 8.

@saimn

saimn commented May 1, 2023

@astrofrog - Maybe in that case .core should technically be private? (._core)

If we want to control strictly what's public and private yes, basically all submodules should be private (renamed with underscore) and public API exported in the subpackages' __init__.py. That's what Scipy did.

@nstarman - So this would be one of the largest changes to Astropy.

With this solution yes, this would require a lot of changes, and may be painful to do. So I don't like this solution.
But you also seem to agree with renaming with underscores, though I think those are two different solutions.

To summarize:

  • Option 1: use __all__ = [] in submodules, and import explicitly public functions/classes in subpackages' __init__.py and list those in __all__.
  • Option 2: rename submodules with underscore, keep their list of public functions/classes in __all__ (a lot of them already have it) and just change the import in subpackages' __init__.py (from .module import * -> from ._module import *). That's what Scipy did. [1]

I don't like option 1 because it moves the list of exported functions from the module itself to another place, and requires a lot of changes which can be prone to errors. Option 2 is more reasonable.

Then as you say, the underscore prefix is also just a convention, but that's the closest thing to a private scope in Python's land. And autocomplete respects it, so when users browse the functions in their shell they will see only public API. As a user I never checked the content of __all__ of a package, I use autocompletion and the docs.

[1] They also kept module.py with deprecation warnings because there is a lot of code using import from e.g. scipy.ndimage.morphology instead of scipy.ndimage. We may want to do that in specific cases, but I don't think we would need to do that in a systematic way.

@pllim pllim changed the title Public API APE 22: Public API May 2, 2023
@eteq
Member

eteq commented May 2, 2023

(Some of this developed from discussion at the coordination meeting (including @nstarman, @saimn , @astrofrog , @tepickering, @pllim, @nden, @WilliamJamieson), although I don't think I can say that all of my points above are consensus of those folks, it's some mix of that and just straight up my own opinion.)

I very much agree with @astrofrog's point here:

After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what __all__ is, so when we communicate with a user, should we tell them that public API is anything in the API docs, or e.g. anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere?

I think the answer up until this point (in an uncodified way) has been that whatever the docs say is the public API. So I think it makes sense to codify that as the "true answer". @nstarman's point was that if we follow the rules here, it's the same between those, and that it's only an aberration if these are different. But I was/am concerned about the inevitable state when something isn't working right. So I think what we settled on is that we start by saying as of when this APE is accepted, the docs are the "true" public API, but this APE presents a plan to get to a state where the rules highlighted in this APE lead naturally to the docs just reflecting the same thing as these rules.

I still personally think we should have it true that the final source of authority is the documentation, because that's more user-facing of a contract, as @astrofrog says. But I think if we say that's the reality now, and we might re-visit it after this APE's plan is implemented, that's a reasonable compromise.

Two more opinions to offer:

  • A consequence of this is that modules like astropy/quantity/quantity.py become astropy/quantity/_quantity.py. I think leading underscore module names are ugly. That's subjective, but still.
  • Another consequence is that the actual public API __all__s are in different files than the thing-to-be-made-public. E.g., Quantity would be in astropy/quantity/__init__.py instead of astropy/quantity/_quantity.py. I don't like that because it means a small change in one file requires one to understand the full API structure to know which __all__ to add it to. I'm not sure that's annoying enough to justify changing anything, but it's a complaint I want to register and think about how we might get around it.

And one question:

  • Does this apply to coordinated packages in addition to the core? In principle it should, but that might be signing up for a lot more work because they are probably more of a mess than the core...

@pllim
Member

pllim commented May 3, 2023

Does this apply to coordinated packages

I would say not. Even for things like removing astropy-helpers, it was a tedious campaign with writing up a transition guide and opening some PRs downstream. For things like this that would break API, it is a non-starter.

@nstarman nstarman requested a review from astrofrog July 31, 2023 00:44
@nstarman
Member Author

@astrofrog @eteq @saimn @pllim, I've updated this APE based on the discussions we had at the conference and in this thread. LMK what you think!

@nstarman
Member Author

nstarman commented Aug 1, 2023

What do you propose for submodules that currently define __all__ but from which people should not import directly, e.g. astropy.convolution.core (and many others). This is the scheme that is mostly used currently in Astropy.

@saimn I've been working on this over at astropy.cosmology. We've successfully transitioned .utils -> ._utils, io -> ._io, and I'm working on the rest. The code is clearer from a user's perspective since there's only one obvious place to import things from and all the hidden modules aren't tab-completion discoverable.

@nstarman
Member Author

nstarman commented Aug 1, 2023

  • Option 2: rename submodules with underscore, keep their list of public functions/classes in __all__ (a lot of them already have it) and just change the import in subpackages' __init__.py (from .module import * -> from ._module import *). That's what Scipy did. [1]

I also like Option 2 a lot. It works because _module is not made public in __init__, so even though _module defines an __all__ and makes its contents locally / contextually public, that is within a private module. It's impossible to publicly navigate to the contents of _module, only what is exported to __init__.

With this as the template, the example from #85 (comment) becomes

# __init__.py
__all__ = ["Foo"]  # Foo is public in this module, even if it is defined in `_core`, which is private.
from ._core import Foo
# _core.py
__all__ = ["Foo"]  # Foo is "public" in this module, but this module is private.

class Foo: ...

Another consequence is that the actual public API __all__s are in different files than the thing-to-be-made-public. E.g., Quantity would be in astropy/quantity/__init__.py instead of astropy/quantity/_quantity.py. I don't like that because it means a small change in one file requires one to understand the full API structure to know which __all__ to add it to. I'm not sure that's annoying enough to justify changing anything, but it's a complaint I want to register and think about how we might get around it.

@eteq, I believe @astrofrog's comment largely answers this question.

@nstarman
Member Author

nstarman commented Aug 1, 2023

So I think what we settled on is that we start by saying as of when this APE is accepted, the docs are the "true" public API, but this APE presents a plan to get to a state where the rules highlighted in this APE lead naturally to the docs just reflecting the same thing as these rules.

@eteq, I agree.

But I was/am concerned about the inevitable state when something isn't working right.

I added a section on a pre-commit CI check. It actually looks to be fairly simple to check that a public module has corresponding documentation since we have docs/api that collects our documented objects. I believe we can go further and make a two-way check to also check that something in docs/api is also in __all__.
Given all this, we can make it 🤞 impossible for the docs to not reflect the public API as defined in the code.
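
A rough sketch of what the two-way comparison could look like. The function name `compare_api` is hypothetical, and the list of documented names is passed in by hand here; collecting it from docs/api is not attempted in this example.

```python
import types


def compare_api(module, documented_names):
    """Compare ``module.__all__`` with the names listed in the docs.

    Returns ``(undocumented, stale)``: names that are public but not
    documented, and names that are documented but not public.
    """
    public = set(getattr(module, "__all__", ()))
    documented = set(documented_names)
    return sorted(public - documented), sorted(documented - public)


# Hypothetical inputs: in practice ``documented_names`` would be gathered from
# the built API docs rather than written out by hand.
demo = types.ModuleType("demo")
demo.__all__ = ["Foo", "Bar"]
undocumented, stale = compare_api(demo, ["Foo", "Baz"])
print(undocumented, stale)
```

A CI job could fail whenever either list is non-empty, which gives the check in both directions.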

Member Author

@nstarman nstarman left a comment

some comments.

Comment thread APE_public.rst
not have a uniform and systematic approach to communicating what is public vs
internal, then we cannot expect users and especially their tools to know what is
public vs internal.

Member Author

From astropy/astropy#15169, I recall another source of issues that has bitten astropy in the past: I/O registration. When deep sub-modules look public, e.g. astropy.cosmology.flrw.lambdacdm.LambdaCDM then that is generally used as the class pathway in I/O rather than the better astropy.cosmology.LambdaCDM. When the class is moved it breaks the I/O and needs a whole backwards-compatibility logic to support files using the previous path. Cleaning up public vs internal, e.g. to astropy.cosmology._flrw.lambdacdm.LambdaCDM would make this an obviously bad way to serialize the class and code authors would get it "right" (astropy.cosmology.LambdaCDM) the first time. Clear public API makes for more stable code.

probably deserves a few sentences.

Contributor

This is slightly orthogonal to the issue here: really, one needs to set __module__ to avoid this - registration just followed that, not a choice made by coders. Note that numpy in fact does this - which partially is annoying, since one no longer knows where the function is defined.

Member Author

@nstarman nstarman Aug 14, 2023

For me the pain point was YAML serialization because I used f"{class.__module__}.{class.__qualname__}", which looks reasonable as code, but led to the problems described above. If, when making the tests, I had noticed a private path I would have fixed the problem from the start.

I believe this applies more generally, if something is defined in a private module but imported to a public location then it may be imported from there again, e.g. as part of I/O. Serializing using only the public import location is much better than what I did, which seemed reasonable at the time. Setting __module__ is one way to force public locations in serialization (which I agree has issues); a path map, or str regex / replace operation are other means.
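
A small sketch of the path-map idea, assuming a hypothetical class `Foo` defined in a private module `demo_pkg._core` and re-exported from `demo_pkg`. The names `PUBLIC_MODULE` and `public_tag` are invented for the example.

```python
import types

# Hypothetical private module and class, standing in for a class defined in a
# private submodule and re-exported from its parent package.
_core = types.ModuleType("demo_pkg._core")
exec("class Foo: ...", _core.__dict__)
Foo = _core.Foo

# Naive tag: records wherever the class happens to be defined.
naive_tag = f"{Foo.__module__}.{Foo.__qualname__}"

# Path map from the defining module to the public import location.
PUBLIC_MODULE = {"demo_pkg._core": "demo_pkg"}


def public_tag(cls):
    """Return a serialization tag that uses the public import location."""
    module = PUBLIC_MODULE.get(cls.__module__, cls.__module__)
    return f"{module}.{cls.__qualname__}"


stable_tag = public_tag(Foo)
print(naive_tag, stable_tag)
```

With the map in place the private file can be moved or renamed without changing the tag written to disk; only the map needs updating.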

Contributor

Yes, that one is an issue for examples and sphinx too - I really dislike that when we made representations a module (which was definitely a good idea!), we had to make so many updates to the docs. It would also be nice to use regex for things like __construct_mixin_classes

For me, this is still separate from the APE, since this has annoyed me amply already! But I can see that coming from an environment where files with leading underscores have more meaning, you could have avoided the issues.

Member Author

Agreed, we can leave it out of this APE. I think the point I made, "If, when making the tests, I had noticed a private path I would have fixed the problem from the start.", means adopting this APE will automatically force most of these I/O issues to be fixed.

@mhvk
Contributor

mhvk commented Aug 14, 2023

Thanks for writing this! I think it is good to describe what formalizing the status quo would entail:

  1. Any module from which users can import must have an __all__ (i.e., all subpackages, and some of the sub-modules of astropy.utils [which are all documented to be only semi-public]).
  2. Any module not listed in the __all__ above is implicitly private.

Implementing this is very little work and would give consistency, without breaking anything.

A few more general points:

  1. One should not needlessly break user code, even if that code "incorrectly" imports from nominally private files. I try to heed @taldcroft's advice to really avoid that. Does being "more correct" outweigh this?
  2. For developers there is also considerable value to not changing things, because one knows by heart how to get to given files -- I open them by typing (of course) and what is the benefit of me having to retrain and add underscores that do not help tab-completion?
  3. If I think of subpackages I maintain, like astropy.time and astropy.table, essentially all files would start with underscores. In astropy.units, very little except for the unit-defining modules would be public. I find astropy/units/_quantity.py, astropy/units/_function/_logarithmic.py and astropy/units/_quantity_helpers/_function_helpers.py needlessly complicated.
  4. We need to be realistic of the work involved. Conservatively, including reviews, I'd estimate 1 full month to actually do it (excluding time for this discussion, etc.). Plus an unknown amount of time of users dealing with broken scripts/packages that used to work.
  5. While scipy and numpy may be moving (partially because they had real historical baggage), others do not seem to (e.g., pandas).

Overall, I'm fairly strongly against this. But being consistent within astropy is more important, so if the consensus is to move forward, I'll do my bit for the transition.

@nstarman
Member Author

nstarman commented Aug 19, 2023

  1. Any module from which users can import must have an __all__ (i.e., all subpackages, and some of the sub-modules of astropy.utils [which are all documented to be only semi-public]).
  2. Any module not listed in the __all__ above is implicitly private.
    Implementing this is very little work and would give consistency, without breaking anything.

@mhvk, if I understand this point correctly, this is similar to point 2 of the implementation section.

2. **Add / update** ``__all__``. The ``__all__`` in each module will be updated
   to reflect phase 1. Any modules' missing ``__all__`` will have one added.

I think where this APE differs is that it aims to be explicit everywhere and not have anything be implicitly private (or public).
For both users and maintainers IMO explicit is better than implicit.
While the transition to this explicit state is somewhat arduous, as you noted in astropy/astropy#15169 and in the time estimate below, once accomplished, this APE attempts to make remaining in that state very easy: through clear rules and CI checks. (Comments appreciated to make this APE more clear / have better CI checks.)

A few more general points:

  1. One should not needlessly break user code, even if that code "incorrectly" imports from nominally private files. I try to heed @taldcroft's advice to really avoid that. Does being "more correct" outweigh this?

This is definitely true in general. For me, the benefits outweigh the costs for 3 reasons:

  1. In the user code it's easy to switch the import from the incorrect private file to the correct public location since the public location is guaranteed to exist (it's guaranteed to exist because the user is using a public symbol, just importing it from the wrong location), e.g. units.quantity.Quantity -> units.Quantity.
  2. For select items, or for everything, we can support the nominally private files using __getattr__. See https://github.com/astropy/astropy/blob/main/astropy/cosmology/utils.py for an example. This allows for a deprecation period.
  3. Maintainers do move around private files, as is their right and prerogative. Sometimes that breaks people's workflows. That sucks, so we try to minimize the damage. What if public vs private were obvious? Then we'd never break anyone's workflow when we refactored private code (so long as they used public API). No more damage. The very point that this APE would mildly break people's code now is proof positive that we should make the change for the future.
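
The `__getattr__` approach from reason 2 can be sketched as follows, using a module-level `__getattr__` (PEP 562). This is an illustration under assumptions, not a copy of what astropy.cosmology does: the module names `demo_pkg.utils` and `demo_pkg._utils` and the `helper` function are hypothetical, and the modules are built in memory so the snippet runs on its own.

```python
import sys
import types
import warnings

# Hypothetical private module holding the real implementation.
_utils = types.ModuleType("demo_pkg._utils")
_utils.helper = lambda: "ok"
sys.modules["demo_pkg._utils"] = _utils

# Source of the legacy module "demo_pkg.utils", kept only as a shim.
shim_source = '''
import sys
import warnings

def __getattr__(name):
    private = sys.modules["demo_pkg._utils"]
    if hasattr(private, name):
        warnings.warn(
            f"demo_pkg.utils.{name} is private; this import path is deprecated.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(private, name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''
utils = types.ModuleType("demo_pkg.utils")
exec(shim_source, utils.__dict__)

# Old-style access still works, but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = utils.helper()

print(result, [str(w.message) for w in caught])
```

The shim can be deleted once the deprecation period ends, at which point only the public location remains.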
  2. For developers there is also considerable value to not changing things, because one knows by heart how to get to given files -- I open them by typing (of course) and what is the benefit of me having to retrain and add underscores that do not help tab-completion?

True. Python's PEP-8 recommendations for how to structure a module to signal public vs private API are not wonderful for people using tab-completion in a terminal-based text editor. However, the tab-completion problem for developers is actually a feature for users, because they won't see private API.
Also, while VIM, nano, emacs, etc. are great, the limitation you mention does not apply to IDEs like Sublime Text, Nova, VSCode, etc. We should aim to support many ways to develop, of course, but...

(Is there a way to set up tab-completion aliases so that the old paths might point to the new ones on a local machine? A quick Google search found https://www.gnu.org/software/bash/manual/html_node/Programmable-Completion.html, indicating such things are possible and there might be convenient tools to set this up for devs that use terminal-based editors.)

  3. If I think of subpackages I maintain, like astropy.time and astropy.table, essentially all files would start with underscores. In astropy.units, very little except for the unit-defining modules would be public. I find astropy/units/_quantity.py, astropy/units/_function/_logarithmic.py and astropy/units/_quantity_helpers/_function_helpers.py needlessly complicated.

So astropy/units/_function/_logarithmic.py would actually be astropy/units/_function/logarithmic.py (note logarithmic is not underscored).
Likewise astropy/units/_quantity_helpers/_function_helpers.py -> astropy/units/_quantity_helpers/function_helpers.py.

To the broader point, top-level modules, e.g. astropy.time with flat structures would have lots of underscores.

astropy/module/
    file1.py
    file2.py
    file3.py
    file4.py
    private_submodule/
        subfile1.py

becomes

astropy/module/
    _file1.py
    _file2.py
    _file3.py
    _file4.py
    _private_submodule/
        subfile1.py

There is an alternative, which is to group related components into sub-modules.

astropy/module/
    _file1.py
    _a_logical_grouping/
        file2.py
        file3.py
        file4.py
    _private_submodule/
        subfile1.py

Whether this happens is of course up to the maintainers of each module. Personally I like hierarchical organization as it conveys information about a file by dint of its location.

  4. We need to be realistic of the work involved. Conservatively, including reviews, I'd estimate 1 full month to actually do it (excluding time for this discussion, etc.). Plus an unknown amount of time of users dealing with broken scripts/packages that used to work.

Sounds like a reasonable estimate.

  5. While scipy and numpy may be moving (partially because they had real historical baggage), others do not seem to (e.g., pandas).

We are more closely aligned with scipy and numpy than pandas, but your point about historical baggage stands.
Arguably we have historical baggage, as you mention in point 1, since we occasionally break people's code when they are using public code but importing it from a private location.

Overall, I'm fairly strongly against this. But being consistent within astropy is more important, so if the consensus is to move forward, I'll do my bit for the transition.

Fair enough! Thanks for the open mind and valuable discussion!

@nstarman nstarman closed this Aug 19, 2023
@nstarman nstarman reopened this Aug 19, 2023
@nstarman
Member Author

nstarman commented May 31, 2024

Note: if we take cues from "upstream" packages, scipy now has underscore-prefixed names for modules.

@pllim
Member

pllim commented Jun 21, 2024

This was discussed as part of "State of APEs" at Coordination Meeting 2024. I think reactions were mixed and I cannot see any clear action items on how to move this forward (or if we should).

@astrofrog
Member

One idea I raised was that at the very least if we cannot reach consensus on changing current code, we should see if we can agree on rules for any new code?

@pllim
Member

pllim commented Jun 21, 2024

Re: #85 (comment)

For completeness, my response to that idea in the meeting was that if it is only recommendation for new code, I do not think we need an APE, but rather we can modify the dev docs.

@mhvk
Contributor

mhvk commented Jun 22, 2024

There was indeed no consensus on the underscore prefixes, in large part because, contrary to what I thought at least, things like from astropy.units.quantity import Quantity were widespread in other github repositories. Hence, changing to underscores is guaranteed to break quite a bit of downstream code, and it is not clear this is worth it.

There also seemed to be consensus that in the end the documentation should be the ultimate arbiter, since that is what users would normally see (and the mistake of documenting the quantity submodule is probably at least partially to blame for the wrong usages...). So, it remains a good idea to ensure we document what is public and private, but start incrementally, as suggested in #85 (review):

  1. We explicitly document current practice that everything under subpackages is private and add a corresponding comment in all their top level __init__.py files (making appropriate exceptions in io and utils).
  2. We add __all__ to all subpackage __init__.py files that include the public items, including public submodules of the subpackages.
  3. We slowly add __all__ to the rest of astropy, to indicate to ourselves which parts are meant to be used outside a given module.
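
As a minimal sketch of steps 2 and 3, the snippet below builds a throwaway module in memory and shows what `__all__` does and does not do. The module name `demo_subpkg` and its contents are hypothetical stand-ins, not astropy code.

```python
import sys
import types

# Hypothetical stand-in for a subpackage ``__init__.py``; not real astropy code.
SOURCE = '''
__all__ = ["Quantity"]  # the only name meant for use outside this module

class Quantity:
    """Public: listed in __all__."""

class ScienceState:
    """Importable, but not listed in __all__, so not public."""
'''

module = types.ModuleType("demo_subpkg")
exec(SOURCE, module.__dict__)
sys.modules["demo_subpkg"] = module  # make it importable by name

# A star-import only picks up the names listed in __all__.
namespace = {}
exec("from demo_subpkg import *", namespace)
star_imported = sorted(name for name in namespace if not name.startswith("__"))
print(star_imported)

# An explicit import of the unlisted name still works: __all__ is a
# declaration of intent, not an access control.
from demo_subpkg import ScienceState
```

The last import is the situation raised earlier in this thread: nothing prevents a user from relying on a name that is absent from `__all__`, which is why the documentation and `__all__` need to agree.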

Finally, I'd say there was no consensus either on new vs old code, or subpackages doing different things, with the latter having the advantage of maintainers being able to set a policy they feel is best, but the disadvantage that then there is no package-wide logic anymore at all, while currently there is (with cosmology the only exception).

@nstarman
Member Author

Hence, changing to underscores is guaranteed to break quite a bit of downstream code, and it is not clear this is worth it.

from astropy.units.quantity is private. IMO we might as well break usage of private code in one fell swoop and then not (ever?) again rather than do it piecemeal as publicly-visible-private-code is changed over the years. I would find that less disruptive.

@mhvk
Contributor

mhvk commented Jun 29, 2024

from astropy.units.quantity is private. IMO we might as well break usage of private code in one fell swoop and then not (ever?) again rather than do it piecemeal as publicly-visible-private-code is changed over the years. I would find that less disruptive.

It is private, indeed. But the feeling at the coordination meeting was that breaking people's code for code style purity is too big a price to pay. And it is not likely we would ever define Quantity in another place than astropy.units.quantity. I also think the issue may be moot sooner or later, since I do think there will eventually be a general units/quantity package that we are going to be based on (hopefully by combining our units machinery with Quantity 2.0!).

But for astropy as a whole, continuity and consistency are important too. But nothing stopping us from making it clearer what is public and not, by defining appropriate __all__ and ensuring that, unlike for units, "private" modules do not appear in the documentation unless strictly necessary, and then with a clear docstring that states why they are included.

@nstarman
Member Author

nstarman commented Jun 29, 2024

since I do think there will eventually be a general units/quantity package that we are going to be based on (hopefully by combining our units machinery with Quantity 2.0!)

🎉. That would be excellent.

But nothing stopping us from making it clearer what is public and not, by defining appropriate __all__

A great thing to do, no matter the outcome of this APE.


When first proposed, one of the counter-arguments was that "upstream" libraries haven't done this. But now both numpy and scipy have basically done this (slightly different implementations).
And they did it in one fell swoop (numpy 2, recent scipy), so that users didn't suffer multiple falls from repeated swoops.
I'm just wondering why we're different. Same problem, similar solution?

When I first wrote this APE it was to make an argument "why we should do this". Now that our upstreams have done the same thing, IMO the argument shifts to "why aren't we doing this?". Prima facie we should.

Documentation is important, but it is most certainly not how any of our upstream libraries define their public API. The point of public-facing documentation is to document what is public, not to make it public. Just as we generate documentation from docstrings (prioritizing that the code contains its own documentation), so too do Python, our upstream libraries, and most everyone else make public/private a product of the code, not something imposed on it.
And this understanding is intrinsic to how we've built tooling for Astropy, like sphinx-automodapi: it looks at __all__.
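
To illustrate the principle (this is a sketch, not sphinx-automodapi's actual implementation), a tool can derive the public names of a module from the code alone. The helper `public_names` and the two in-memory modules are hypothetical.

```python
import types


def public_names(module):
    """Return the names a tool might treat as public for ``module``.

    Uses ``__all__`` when it is defined; otherwise falls back to names
    without a leading underscore, mirroring ``from module import *``.
    """
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return sorted(declared)
    return sorted(name for name in vars(module) if not name.startswith("_"))


# Hypothetical modules built in memory for illustration.
with_all = types.ModuleType("with_all")
with_all.__all__ = ["Cosmology"]
with_all.Cosmology = object
with_all.helper = object  # defined but not declared public

without_all = types.ModuleType("without_all")
without_all.helper = object
without_all._private = object

print(public_names(with_all))
print(public_names(without_all))
```

With `__all__` present, `helper` is excluded even though it has no underscore; without `__all__`, the underscore prefix is the only signal available.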

@neutrinoceros

And they did it in one fell swoop (numpy 2, recent scipy), so that users didn't suffer multiple falls from repeated swoops.
I'm just wondering why we're different. Same problem, similar solution?

I see 3 options here
a) moving private code to private modules over one swoop
b) moving private code to private modules piecewise
c) do nothing [1]

I agree with @nstarman that a>b. I also agree with Marten that c>b. The remaining question is how a compares with c.

Since, as you guys pointed out, we may eventually have to move part of the private code from astropy.units in response to Quantity 2.0 becoming a dependency, why not use that event as a pivot to switch our strategy to (a), and keep the status quo (c) in the meantime?

Footnotes

  1. I'm only speaking about moving modules/members around here. Defining __all__ is a separate discussion and one that seems more consensual anyway.

@mhvk
Contributor

mhvk commented Jun 30, 2024

Numpy was in a different state, with, e.g., some parts of np.lib being public, while other parts were not, so there was more urgency than we have. Even so, they held off until numpy 2.0, where a lot of other stuff was broken too.

Overall, @neutrinoceros made the right list, and I guess the conclusion in the coordination meeting was that c>a at the present time. At a time when there is a larger API change (as Quantity 2.0 would be), the conclusion may well be different.

In the meantime, there's nothing stopping us from incrementally ensuring that docstrings and __all__ are all consistent and clear.
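
Such an incremental check could look roughly like the sketch below. The helper `audit_module` and the module `demo_units` are hypothetical; a real check would run over imported astropy modules instead.

```python
import types


def audit_module(module):
    """Return a list of problems with ``module``'s public declarations."""
    declared = getattr(module, "__all__", None)
    if declared is None:
        return [f"{module.__name__}: missing __all__"]

    problems = []
    for name in declared:
        if not hasattr(module, name):
            problems.append(f"{module.__name__}.{name}: listed in __all__ but undefined")
            continue
        obj = getattr(module, name)
        # Only classes and functions are expected to carry docstrings here.
        if callable(obj) and not (obj.__doc__ or "").strip():
            problems.append(f"{module.__name__}.{name}: public but has no docstring")
    return problems


# Hypothetical module source used only to exercise the check.
SOURCE = '''
__all__ = ["documented", "undocumented", "missing"]

def documented():
    """Has a docstring."""

def undocumented():
    pass
'''

demo = types.ModuleType("demo_units")
exec(SOURCE, demo.__dict__)

problems = audit_module(demo)
for problem in problems:
    print(problem)
```

A check like this can be enabled one subpackage at a time, which fits the incremental approach.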

@pllim
Member

pllim commented Jul 1, 2024

Also keep in mind that NumPy has the backing of private industry (e.g., NVidia). Astronomy does not. I have started seeing pipelines pinning numpy<2 privately just because they have larger fish to fry and no time to deal with breaking API here and there. To them, calibration accuracy and stability is way more important than whether astropy.units.quantity is private or not. We have to keep our main "customers" in mind and they are not "big money".

Member

@astrofrog astrofrog left a comment


One thing we could consider doing in any case: for common cases of misuse (such as astropy.units.quantity), we could use batchpr to search through GitHub and open PRs to fix these incorrect imports.
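
As a rough sketch of the detection half of that idea, the snippet below uses only the standard-library `ast` module to find imports from a private module path. It does not use batchpr's API; the mapping `PRIVATE_TO_PUBLIC` and the helper `find_private_imports` are hypothetical.

```python
import ast

# Assumed mapping from a private module path to the public path to suggest.
PRIVATE_TO_PUBLIC = {"astropy.units.quantity": "astropy.units"}


def find_private_imports(source, filename="<string>"):
    """Yield (line number, private module, suggested public module)."""
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.level == 0:
            # ``from astropy.units.quantity import Quantity``
            if node.module in PRIVATE_TO_PUBLIC:
                yield node.lineno, node.module, PRIVATE_TO_PUBLIC[node.module]
        elif isinstance(node, ast.Import):
            # ``import astropy.units.quantity``
            for alias in node.names:
                if alias.name in PRIVATE_TO_PUBLIC:
                    yield node.lineno, alias.name, PRIVATE_TO_PUBLIC[alias.name]


EXAMPLE = """\
import numpy as np
from astropy.units.quantity import Quantity
from astropy.units import Quantity as Q
"""

hits = list(find_private_imports(EXAMPLE))
for lineno, private, public in hits:
    print(f"line {lineno}: import from {private}; use {public} instead")
```

Parsing with `ast` rather than matching text avoids false positives from comments and strings, at the cost of skipping files that do not parse.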

It's also worth considering another option d, which is to do The Right Thing ™️ for new sub-packages and code, to at least not increase the problem.

@pllim
Member

pllim commented Dec 31, 2024

xref astropy/astropy#17589

@kbwestfall
Contributor

With the acceptance of the revisions to APE 1, we're going to start following the new APE proposal process. While all of the APE editors (for now exactly the same as the CoCo) will contribute to the process, one of the APE editors will be the point-person (chief APE editor) shepherding your team through the process.

Your chief APE editor is Erik Tollerud (@eteq).

You can review the next steps in the APE process here, but to briefly summarize them:

  • Please finalize the draft of your APE and then ping @astropy/ape-editor-team. This will prompt the APE Editorial team to read the proposed APE and provide any basic editing comments (led by your chief editor).
  • Following APE 1, this editing process will take no more than a month, but we actually expect this to go much faster for this APE.
  • Once the basic editing is finished, APE 1 provides a detailed list of the next steps. The main thing to be aware of is that the APE status will be set to Discussion, the PR will be merged, and the APE editor will send an e-mail to astropy-dev to start the discussion period. All discussion should happen on the astropy-dev list.
  • Throughout the discussion period, it is up to the APE authors to track the discussion and make updates to the APE. These changes can be committed in a series of new PRs or kept in a branch that only leads to a single PR after the discussion has reached consensus. Please keep your APE editor contact updated on your preferred approach.
  • There is no set timeframe for the discussion period, but it should be at least 2 weeks.
  • After this minimum discussion period but otherwise at their discretion, the APE authors must then notify the CoCo that the APE proposal is ready for a final decision.
  • There is no set period for the CoCo review, but we expect it will take no more than 2-3 weeks.
  • The CoCo will decide to accept, reject, or request more discussion on specific items in the proposal, which will restart the process from the start of the discussion period.

Please let us (either your chief APE editor or the full CoCo) know if you have any questions!

Copilot AI review requested due to automatic review settings January 25, 2026 21:47
@nstarman nstarman force-pushed the public-api branch 2 times, most recently from 1b3bbcf to efa7588 Compare January 25, 2026 21:48

Copilot AI left a comment


Pull request overview

This PR introduces a new APE document defining Astropy’s policy for what constitutes the public versus internal API, with an emphasis on consistency between __all__, naming conventions, and documentation, and on making the API machine-detectable for tools.

Changes:

  • Adds a full APE text describing the problem of ambiguous public/private interfaces in Python and within Astropy.
  • Proposes concrete rules around __all__, underscore prefixes, documentation requirements, and “locally public” vs. truly public symbols, including worked examples and a summary table.
  • Outlines a phased implementation plan (documentation snapshot, __all__ updates, deprecations, tooling, and naming changes) plus examples from Astropy, NumPy, and SciPy.


@nstarman nstarman force-pushed the public-api branch 4 times, most recently from 0085a63 to 2ef3062 Compare January 25, 2026 22:07
Signed-off-by: nstarman <nathanielstarkman@gmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
@nstarman
Member Author

@eteq I've updated this APE.
In particular, astropy/cosmology now implements this APE and serves as a good example. Also, numpy and scipy basically implement this APE, which they had not yet done when this APE was written.

@nstarman nstarman changed the title APE 22: Public API Public API Feb 18, 2026